SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation

Junjie Zhou; Yongping Xiong; Chinwai Chiu; Fangyu Liu; Xiangyang Gong

Transformer models have achieved promising performance in point cloud segmentation. However, most existing attention schemes apply the same feature learning paradigm to all points and overlook the enormous differences in size among scene objects. In this paper, we propose the Size-Aware Transformer (SAT), which tailors effective receptive fields to objects of different sizes. SAT achieves size-aware learning in two steps: introducing multi-scale features into each attention layer, and allowing each point to choose its attentive fields adaptively. It contains two key designs: the Multi-Granularity Attention (MGA) scheme and the Re-Attention module. MGA addresses two challenges: efficiently aggregating tokens from distant areas and preserving multi-scale features within one attention layer. Specifically, point-voxel cross attention is proposed to address the first challenge, and a shunted strategy based on standard multi-head self-attention is applied to solve the second. The Re-Attention module dynamically adjusts, for each point, the attention scores assigned to the fine- and coarse-grained features output by MGA. Extensive experimental results demonstrate that SAT achieves state-of-the-art performance on the S3DIS and ScanNetV2 datasets. SAT also achieves the most balanced per-category performance among all compared methods, which illustrates its advantage in modelling categories of different sizes. Our code and model will be released after the acceptance of this paper.
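The abstract gives no implementation details, so the following is only a rough, illustrative sketch of the two ideas it names, not the authors' method. It assumes that coarse tokens are obtained by average-pooling point features into voxels, that a point attends to those tokens with plain single-head scaled dot-product attention (learned query/key/value projections and the shunted multi-head split are omitted), and that the re-attention step is a per-point softmax gate over one fine and one coarse feature. All function names, the gate weights `w_fine` and `w_coarse`, and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def voxel_tokens(points, feats, voxel_size):
    """Average-pool point features into one token per occupied voxel
    (assumed stand-in for the coarse-grained tokens)."""
    buckets = defaultdict(list)
    for p, f in zip(points, feats):
        key = tuple(int(math.floor(c / voxel_size)) for c in p)
        buckets[key].append(f)
    return [[sum(col) / len(fs) for col in zip(*fs)] for fs in buckets.values()]


def cross_attention(query, tokens):
    """One point (query) attends to all voxel tokens; no learned projections."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, t) * scale for t in tokens])
    return [sum(w * t[i] for w, t in zip(weights, tokens))
            for i in range(len(query))]


def re_attention(fine, coarse, w_fine, w_coarse):
    """Per-point gate: softmax over two scalar scores, then a weighted sum
    of the fine- and coarse-grained features (hypothetical formulation)."""
    a_fine, a_coarse = softmax([dot(fine, w_fine), dot(coarse, w_coarse)])
    fused = [a_fine * f + a_coarse * c for f, c in zip(fine, coarse)]
    return fused, (a_fine, a_coarse)


# Toy data: three points with 2-D features; two points share a voxel.
points = [(0.1, 0.1, 0.1), (0.2, 0.1, 0.0), (1.5, 1.5, 1.5)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

tokens = voxel_tokens(points, feats, voxel_size=1.0)
fine = feats[0]  # assumption: the point's own feature stands in for the fine branch
coarse = cross_attention(fine, tokens)
fused, gate = re_attention(fine, coarse, w_fine=[0.5, 0.5], w_coarse=[0.5, 0.5])
print(tokens, coarse, fused, gate)
```

In this sketch the gate weights always sum to one per point, so a point can lean toward the fine feature (small objects) or the coarse feature (large objects); how SAT actually computes and applies these scores is specified in the paper itself, not here.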

updated: Tue Jan 17 2023 13:25:11 GMT+0000 (UTC)

published: Tue Jan 17 2023 13:25:11 GMT+0000 (UTC)

arXiv
