FAC: 3D Representation Learning via Foreground Aware Feature Contrast

Kangcheng Liu; Aoran Xiao; Xiaoqin Zhang; Shijian Lu; Ling Shao

Contrastive learning has recently demonstrated great potential for unsupervised pre-training in 3D scene understanding tasks. However, most existing methods select point features as anchors at random when building contrast pairs, which biases the anchors toward the background points that often dominate 3D scenes. Object awareness and foreground-to-background discrimination are also neglected, making contrastive learning less effective. To tackle these issues, we propose a general foreground-aware feature contrast (FAC) framework to learn more effective point cloud representations in pre-training. FAC consists of two novel contrast designs that construct more effective and informative contrast pairs. The first builds positive pairs within the same foreground segment, where points tend to share the same semantics. The second prevents over-discrimination between 3D segments/objects and encourages foreground-to-background distinction at the segment level through adaptive feature learning in a Siamese correspondence network, which learns feature correlations within and across point cloud views. Visualization with point activation maps shows that our contrast pairs capture clear correspondences among foreground regions during pre-training. Quantitative experiments also show that FAC achieves superior knowledge transfer and data efficiency in various downstream 3D semantic segmentation and object detection tasks.
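The first design described in the abstract (positive pairs drawn from within the same foreground segment, rather than anchors chosen at random) can be illustrated with a small sketch. This is not the authors' loss or code: it is a generic InfoNCE-style contrast written under stated assumptions. The function name `foreground_segment_contrast`, the inputs `segment_ids` and `is_foreground`, and the `temperature` value are all hypothetical and chosen only for illustration; the Siamese correspondence network and the segment-level foreground-to-background term are not modeled here.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def foreground_segment_contrast(features, segment_ids, is_foreground, temperature=0.1):
    """Average InfoNCE-style loss over foreground anchors (illustrative only).

    Assumption: positives are the other foreground points in the anchor's
    segment; every remaining point (other segments, background) is a negative.
    """
    feats = [_normalize(f) for f in features]
    n = len(feats)
    losses = []
    for i in range(n):
        if not is_foreground[i]:
            continue  # background points are never used as anchors
        positives = [
            j for j in range(n)
            if j != i and is_foreground[j] and segment_ids[j] == segment_ids[i]
        ]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Cosine similarity to every other point, scaled by the temperature.
        logits = {j: _dot(feats[i], feats[j]) / temperature for j in range(n) if j != i}
        shift = max(logits.values())  # subtract the max for numerical stability
        denom = sum(math.exp(v - shift) for v in logits.values())
        numer = sum(math.exp(logits[j] - shift) for j in positives)
        losses.append(-math.log(numer / denom))
    return sum(losses) / len(losses) if losses else 0.0


# Toy scene: two foreground segments (ids 0 and 1) and one background point (-1).
segment_ids = [0, 0, 1, 1, -1]
is_foreground = [True, True, True, True, False]
aligned = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [-1.0, -1.0]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0], [-1.0, -1.0]]

loss_aligned = foreground_segment_contrast(aligned, segment_ids, is_foreground)
loss_shuffled = foreground_segment_contrast(shuffled, segment_ids, is_foreground)
```

In this toy setup the loss is lower when points in the same foreground segment have similar features (`aligned`) than when features are similar across segments (`shuffled`), which is the behavior a segment-aware positive-pair construction is meant to reward. Restricting anchors to foreground points is the sketch's stand-in for avoiding the background bias of random anchor selection.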

updated: Sat Mar 11 2023 11:42:01 GMT+0000 (UTC)

published: Sat Mar 11 2023 11:42:01 GMT+0000 (UTC)

arXiv
