Full Point Encoding for Local Feature Aggregation in 3D Point Clouds

Yong He; Hongshan Yu; Zhengeng Yang; Xiaoyan Liu; Wei Sun; Ajmal Mian

Point cloud processing methods exploit local point features and global context through aggregation, but this aggregation does not explicitly model the internal correlations between local and global features. To address this problem, we propose full point encoding, which is applicable to both convolution and transformer architectures. Specifically, we propose the Full Point Convolution (FPConv) and Full Point Transformer (FPTransformer) architectures. The key idea is to adaptively learn the weights from local and global geometric connections, where the connections are established through local and global correlation functions, respectively. FPConv and FPTransformer simultaneously model the local and global geometric relationships as well as their internal correlations, demonstrating strong generalization ability and high performance. FPConv is incorporated into classical hierarchical network architectures to achieve local and global shape-aware learning. In FPTransformer, we introduce full point position encoding in self-attention that hierarchically encodes each point position in the global and local receptive fields. We also propose a shape-aware downsampling block that takes into account the local shape and the global context. Experimental comparisons with existing methods on benchmark datasets show the efficacy of FPConv and FPTransformer for semantic segmentation, object detection, classification, and normal estimation tasks. In particular, we achieve state-of-the-art semantic segmentation results of 76% mIoU on S3DIS 6-fold and 72.2% on S3DIS Area5.
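As a rough illustration of the key idea (aggregation weights that depend on both local and global geometry), the following is a minimal sketch and not the FPConv or FPTransformer formulation, which the abstract does not specify. It assumes that the local relation is the distance from a neighbor to the center point, that the global relation is the distance from a neighbor to the cloud centroid, and that two fixed scalars (the hypothetical `w_local` and `w_global`) stand in for parameters a network would learn. All function names are illustrative.

```python
import math


def knn(points, i, k):
    """Indices of the k points nearest to points[i] (the point itself included)."""
    order = sorted(range(len(points)), key=lambda j: math.dist(points[i], points[j]))
    return order[:k]


def centroid(points):
    """Mean position of the cloud, used here as a crude global reference."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(3))


def aggregate(points, feats, k=3, w_local=1.0, w_global=0.5):
    """Softmax-weighted average of neighbor features.

    Assumption: the score of a neighbor combines a local term (distance to the
    center point) and a global term (distance to the centroid). In a trained
    model these weights would be learned rather than fixed scalars.
    """
    c = centroid(points)
    dim = len(feats[0])
    out = []
    for i in range(len(points)):
        nbrs = knn(points, i, k)
        scores = []
        for j in nbrs:
            local = math.dist(points[i], points[j])  # local geometric relation
            glob = math.dist(points[j], c)           # global geometric relation
            scores.append(-(w_local * local + w_global * glob))
        m = max(scores)                              # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]              # weights sum to 1
        out.append([sum(w * feats[j][d] for w, j in zip(weights, nbrs))
                    for d in range(dim)])
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
    fts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
    for row in aggregate(pts, fts, k=3):
        print(row)
```

Because the weights are normalized per neighborhood, each output feature is a convex combination of the neighbor features; with `k=1` every point simply keeps its own feature.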

updated: Wed Mar 08 2023 09:14:17 GMT+0000 (UTC)

published: Wed Mar 08 2023 09:14:17 GMT+0000 (UTC)

arXiv
