Attention-based Proposals Refinement for 3D Object Detection

Minh-Quan Dao; Elwan Héry; Vincent Frémont

Safe autonomous driving technology heavily depends on accurate 3D object detection, since it produces the input to safety-critical downstream tasks such as prediction and navigation. Recent advances in this field have been made by developing the refinement stage of voxel-based region proposal networks to better balance accuracy and efficiency. A popular approach among state-of-the-art frameworks is to divide proposals, or Regions of Interest (ROIs), into grids and extract a feature for each grid location before synthesizing them into the ROI feature. While achieving impressive performance, such an approach involves a number of hand-crafted components (e.g., grid sampling, set abstraction) that require expert knowledge to be tuned correctly. This paper takes a more data-driven approach to ROI feature extraction using the attention mechanism. Specifically, points inside an ROI are positionally encoded to incorporate the ROI's geometry. The resulting position encodings and the point features are transformed into the ROI feature via vector attention. Unlike the original multi-head attention, vector attention assigns different weights to different channels within a point feature, and is thus able to capture a more sophisticated relation between the pooled points and the ROI. Experiments on the KITTI validation set show that our method achieves a competitive 84.84 AP for the class Car at Moderate difficulty while having the fewest parameters among closely related methods and attaining a quasi-real-time inference speed of 15 FPS on an NVIDIA V100 GPU. The code will be released.
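The abstract does not give the exact layers, so the following is only a minimal NumPy sketch of the general idea under stated assumptions, not the authors' implementation. Assumed for illustration: ROIs are 7-DoF boxes (cx, cy, cz, l, w, h, yaw); the position encoding is a hypothetical two-layer MLP applied to box-normalized local coordinates; vector attention follows the subtraction-based form popularized by Point Transformer, with a hypothetical max-pooled ROI query; all weights are random and bias-free. The helper names (`roi_local_coords`, `vector_attention_pool`, `scalar_attention_pool`) are hypothetical. The point of the sketch is the contrast: vector attention produces an (N, C) weight matrix normalized over points separately for each channel, while standard dot-product attention produces one scalar weight per point.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def roi_local_coords(points, roi):
    """Map (N, 3) world points into the ROI's canonical frame, normalized by box size.
    Assumed ROI layout: (cx, cy, cz, l, w, h, yaw)."""
    cx, cy, cz, l, w, h, yaw = roi
    shifted = points - np.array([cx, cy, cz])
    c, s = np.cos(-yaw), np.sin(-yaw)  # rotate by -yaw to undo the box heading
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (shifted @ rot.T) / np.array([l, w, h])

def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2  # two linear layers with ReLU, no biases

def vector_attention_pool(query, feats, pos_enc, params):
    """Pool (N, C) point features into one (C,) ROI feature with per-channel weights."""
    q = query @ params["wq"]  # (C,)
    k = feats @ params["wk"]  # (N, C)
    v = feats @ params["wv"]  # (N, C)
    logits = mlp(q[None, :] - k + pos_enc, params["g1"], params["g2"])  # (N, C)
    weights = softmax(logits, axis=0)  # normalize over points, independently per channel
    return (weights * (v + pos_enc)).sum(axis=0), weights

def scalar_attention_pool(query, feats, params):
    """Standard dot-product attention: one scalar weight per point, shared by all channels."""
    q = query @ params["wq"]
    k = feats @ params["wk"]
    v = feats @ params["wv"]
    weights = softmax(k @ q / np.sqrt(q.shape[0]), axis=0)  # (N,)
    return weights @ v, weights

C, N = 16, 32  # feature channels, number of pooled points
params = {name: rng.normal(scale=0.3, size=(C, C)) for name in ["wq", "wk", "wv", "g1", "g2"]}
pe_w1 = rng.normal(scale=0.5, size=(3, C))
pe_w2 = rng.normal(scale=0.5, size=(C, C))

roi = (10.0, 2.0, -1.0, 4.0, 1.8, 1.5, np.pi / 6)  # made-up car-sized box
points = np.array(roi[:3]) + rng.uniform(-0.5, 0.5, size=(N, 3))  # points inside the box
feats = rng.normal(size=(N, C))  # stand-in for backbone point features

local = roi_local_coords(points, roi)
pos_enc = mlp(local, pe_w1, pe_w2)  # (N, C) geometry-aware position encoding
query = feats.max(axis=0)  # hypothetical ROI query
roi_feat, w_vec = vector_attention_pool(query, feats, pos_enc, params)
roi_feat_s, w_sc = scalar_attention_pool(query, feats, params)
print(roi_feat.shape, w_vec.shape, w_sc.shape)  # (16,) (32, 16) (32,)
```

Because both pooling functions sum over points after a softmax over points, the resulting ROI feature does not depend on the order of the pooled points, which is what a set-to-vector ROI pooling step needs.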

updated: Tue Jan 18 2022 15:50:31 GMT+0000 (UTC)

published: Tue Jan 18 2022 15:50:31 GMT+0000 (UTC)

arXiv