Scatter Points in Space: 3D Detection from Multi-view Monocular Images

Jianlin Liu; Zhuofei Huang; Dihe Huang; Shang Xu; Ying Chen; Yong Liu

3D object detection from monocular image(s) is a challenging, long-standing problem in computer vision. To combine information from different viewpoints without cumbersome 2D instance tracking, recent methods tend to aggregate multi-view features by densely sampling a regular 3D grid in space, which is inefficient. In this paper, we attempt to improve multi-view feature aggregation by proposing a learnable keypoint sampling method that scatters pseudo surface points in 3D space in order to preserve data sparsity. The scattered points, augmented with multi-view geometric constraints and visual features, are then used to infer the location and shape of objects in the scene. To compensate for the limitations of a single frame and to model multi-view geometry explicitly, we further propose a surface filter module for noise suppression. Experimental results show that our method achieves significantly better 3D detection performance than previous work (an improvement of more than 0.1 AP on some ScanNet categories). The code will be publicly available.
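
As a rough illustration of what aggregating multi-view features at a sparse set of 3D points can look like, the sketch below projects each point into every view with a pinhole model and averages the features it lands on. This is not the authors' implementation: the learnable keypoint sampling and the surface filter module are not reproduced, the plain mean is a stand-in for whatever aggregation the paper uses, and every name here (`PinholeView`, `aggregate_point_features`, the translation-only camera) is a hypothetical assumption made for the example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


class PinholeView:
    """Hypothetical camera view: pinhole intrinsics plus a translation-only pose.

    Assumption for brevity: the camera looks down +z with no rotation.
    """

    def __init__(self, fx: float, fy: float, cx: float, cy: float,
                 cam_pos: Point3D, features: FeatureMap) -> None:
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.cam_pos = cam_pos
        self.features = features

    def project(self, p: Point3D) -> Optional[Tuple[int, int]]:
        """Return (row, col) of the nearest pixel, or None if the point is not visible."""
        x = p[0] - self.cam_pos[0]
        y = p[1] - self.cam_pos[1]
        z = p[2] - self.cam_pos[2]
        if z <= 0:  # behind the camera
            return None
        col = math.floor(self.fx * x / z + self.cx + 0.5)
        row = math.floor(self.fy * y / z + self.cy + 0.5)
        height, width = len(self.features), len(self.features[0])
        if 0 <= row < height and 0 <= col < width:
            return row, col
        return None  # outside the image


def aggregate_point_features(points: Sequence[Point3D],
                             views: Sequence[PinholeView]
                             ) -> List[Optional[List[float]]]:
    """Average per-view features at each 3D point; None if no view sees the point."""
    aggregated: List[Optional[List[float]]] = []
    for p in points:
        total: Optional[List[float]] = None
        seen = 0
        for view in views:
            pixel = view.project(p)
            if pixel is None:
                continue
            feat = view.features[pixel[0]][pixel[1]]
            total = list(feat) if total is None else [a + b for a, b in zip(total, feat)]
            seen += 1
        aggregated.append([a / seen for a in total] if total is not None else None)
    return aggregated
```

The cost of this loop scales with the number of points times the number of views, so a few hundred scattered points are much cheaper to process than a dense regular grid (a 40 x 40 x 16 grid, for instance, already has 25,600 cells).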

updated: Wed Aug 31 2022 09:38:05 GMT+0000 (UTC)

published: Wed Aug 31 2022 09:38:05 GMT+0000 (UTC)

arXiv
