Recursive Cross-View: Use Only 2D Detectors to Achieve 3D Object Detection without 3D Annotations

Shun Gui; Yan Luximon

Heavy reliance on 3D annotations limits the real-world application of 3D object detection. In this paper, we propose a method that requires no 3D annotations yet can predict full-oriented 3D bounding boxes. Our method, called Recursive Cross-View (RCV), uses the three-view principle to transform 3D detection into several 2D detection tasks, which consume only some 2D labels. We propose a recursive paradigm in which instance segmentation and 3D bounding box generation by Cross-View are applied recursively until convergence. Specifically, a frustum is proposed by a 2D detector, and the recursive paradigm then outputs a full-oriented 3D box, class, and score. To show that our method can be quickly applied to new tasks in real-world scenarios, we conduct three experiments: indoor 3D human detection, full-oriented 3D hand detection, and real-time detection on a real 3D sensor. RCV achieves decent performance in these experiments. Once trained, our method can be viewed as a 3D annotation tool. Consequently, we build two 3D-labeled datasets based on RCV, namely '3D_HUMAN' and 'D_HAND', which could be used to pre-train other 3D detectors. Furthermore, evaluated on the SUN RGB-D benchmark, our method achieves performance comparable to some fully 3D-supervised learning methods. RCV is the first 3D detection method that consumes no 3D labels and yields full-oriented 3D boxes on point clouds.
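
The recursive idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the trained 2D detectors are replaced by a hypothetical stand-in (`robust_box_detector`, a median and MAD rule), the boxes are axis-aligned rather than full-oriented, and the frustum proposal step is omitted. All names below are assumptions made for illustration. The sketch only shows the loop structure: project the points onto three orthogonal views, detect a 2D box in each view, merge the 2D boxes into a 3D box, keep the points inside it, and repeat until the box stops changing.

```python
from statistics import median

# Each view is a pair of axes; every axis appears in exactly two views.
VIEWS = ((0, 1), (0, 2), (1, 2))

def robust_box_detector(pts2d, k=3.0):
    """Hypothetical stand-in for a trained 2D detector.
    Returns (umin, vmin, umax, vmax) as median +/- k * MAD per axis."""
    def bounds(vals):
        m = median(vals)
        mad = median(abs(v - m) for v in vals)
        return m - k * mad, m + k * mad
    ulo, uhi = bounds([p[0] for p in pts2d])
    vlo, vhi = bounds([p[1] for p in pts2d])
    return ulo, vlo, uhi, vhi

def cross_view_box(points, detect):
    """Merge 2D boxes from three views into one axis-aligned 3D box
    (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo = [[], [], []]
    hi = [[], [], []]
    for a, b in VIEWS:
        ulo, vlo, uhi, vhi = detect([(p[a], p[b]) for p in points])
        lo[a].append(ulo); hi[a].append(uhi)
        lo[b].append(vlo); hi[b].append(vhi)
    # Average the two estimates available for each axis.
    mins = [sum(v) / len(v) for v in lo]
    maxs = [sum(v) / len(v) for v in hi]
    return tuple(mins + maxs)

def inside(p, box):
    return all(box[i] <= p[i] <= box[i + 3] for i in range(3))

def recursive_cross_view(points, detect=robust_box_detector,
                         tol=1e-6, max_iter=20):
    """Alternate point selection and box generation until convergence."""
    box = cross_view_box(points, detect)
    for _ in range(max_iter):
        kept = [p for p in points if inside(p, box)]
        if len(kept) < 2:
            break
        new_box = cross_view_box(kept, detect)
        if max(abs(a - b) for a, b in zip(box, new_box)) < tol:
            return new_box
        box = new_box
    return box

# Toy data: a 5x5x5 grid "object" in the unit cube plus three stray points.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
cluster = [(x, y, z) for x in grid for y in grid for z in grid]
outliers = [(5.0, 5.0, 5.0), (-4.0, 0.5, 0.5), (0.5, 6.0, 0.5)]
box = recursive_cross_view(cluster + outliers)
```

In this toy setting the loop settles on a box that encloses the grid and excludes the stray points. A real system would replace `robust_box_detector` with learned 2D detectors operating on rendered views and would also estimate orientation, which this sketch does not attempt.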

updated: Mon Nov 14 2022 04:51:05 GMT+0000 (UTC)

published: Mon Nov 14 2022 04:51:05 GMT+0000 (UTC)

arXiv
