ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution

Tuan Duc Ngo; Binh-Son Hua; Khoi Nguyen

Existing 3D instance segmentation methods are dominated by the bottom-up design: a manually fine-tuned algorithm groups points into clusters, and a refinement network follows. However, because they rely on the quality of the clusters, these methods produce unreliable results when (1) nearby objects of the same semantic class are packed together, or (2) large objects have loosely connected regions. To address these limitations, we introduce ISBNet, a novel cluster-free method that represents instances as kernels and decodes instance masks via dynamic convolution. To efficiently generate high-recall and discriminative kernels, we propose a simple strategy named Instance-aware Farthest Point Sampling to sample candidates, and we leverage a local aggregation layer inspired by PointNet++ to encode candidate features. Moreover, we show that predicting the 3D axis-aligned bounding boxes and leveraging them in the dynamic convolution further boosts performance. Our method sets new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP while retaining a fast inference time (237 ms per scene on ScanNetV2). The source code and trained models are available at https://github.com/VinAIResearch/ISBNet.
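The abstract names Instance-aware Farthest Point Sampling but does not describe how it works. For orientation only, the following is a minimal sketch of plain farthest point sampling, the standard greedy technique the name builds on. It is not the instance-aware variant and is not taken from the ISBNet code; the function name, its parameters, and the toy point set are all illustrative assumptions.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices so each new pick is the point farthest
    from everything picked so far. `points` is a list of (x, y, z) tuples.

    Plain FPS only: this does NOT implement the instance-aware variant
    named in the abstract, whose details are not given there.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] = distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Next sample: the point whose nearest selected point is farthest away.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy scene (hypothetical data): two tight pairs and one isolated point.
points = [(0, 0, 0), (0.1, 0, 0), (5, 0, 0), (5, 0.1, 0), (0, 5, 0)]
print(farthest_point_sampling(points, 3))  # prints [0, 3, 4]
```

In the toy scene the three samples land in three different groups of points. That spreading behaviour is what makes farthest point sampling a natural starting point for choosing candidates with good coverage of a scene.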

updated: Sun Mar 26 2023 15:47:15 GMT+0000 (UTC)

published: Wed Mar 01 2023 06:06:28 GMT+0000 (UTC)

arXiv
