Progressive Coordinate Transforms for Monocular 3D Object Detection

Li Wang; Li Zhang; Yi Zhu; Zhi Zhang; Tong He; Mu Li; Xiangyang Xue

Recognizing and localizing objects in 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, 3D object detection from only a monocular image remains a great challenge. Different alternatives exist for tackling this problem, but we find that they either rely on heavy networks to fuse RGB and depth information or are empirically ineffective at processing millions of pseudo-LiDAR points. On in-depth examination, we find that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with a confidence-aware loss is introduced to progressively refine the localization prediction. In addition, semantic image representation is exploited to compensate for the usage of patch proposals. Despite being lightweight and simple, our strategy leads to superior improvements on the KITTI and Waymo Open Dataset monocular 3D detection benchmarks. At the same time, our proposed PCT generalizes well to most coordinate-based 3D detection frameworks. The code is available at https://github.com/amazon-research/progressive-coordinate-transforms

updated: Fri Aug 13 2021 07:42:29 GMT+0000 (UTC)

published: Thu Aug 12 2021 15:22:33 GMT+0000 (UTC)

arXiv
