Shape Prior Non-Uniform Sampling Guided Real-time Stereo 3D Object Detection

Aqi Gao; Jiale Cao; Yanwei Pang

Pseudo-LiDAR-based 3D object detectors have gained popularity due to their high accuracy. However, these methods require dense depth supervision and run slowly. To address both issues, the recently introduced RTS3D builds an efficient 4D Feature-Consistency Embedding (FCE) space as an intermediate representation of objects, without depth supervision. The FCE space splits the entire object region into a uniform 3D grid in latent space to generate feature sampling points, which ignores the differing importance of object regions. We argue that, compared with the inner region, the outer region plays a more important role in accurate 3D detection. To encode more information from the outer region, we propose a shape prior non-uniform sampling strategy that samples densely in the outer region and sparsely in the inner region. As a result, more points are sampled from the outer region and more useful features are extracted for 3D detection. Further, to make the feature at each sampling point more discriminative, we propose a high-level semantic enhanced FCE module that exploits more contextual information and suppresses noise better. Experiments on the KITTI dataset demonstrate the effectiveness of the proposed method. Compared with the baseline RTS3D, the proposed method improves AP3d by 2.57% with almost no extra network parameters. Moreover, the proposed method outperforms state-of-the-art methods without extra supervision and at real-time speed.
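The idea of sampling densely near an object's surface and sparsely in its interior can be illustrated with a small sketch. This is not the authors' implementation: the power-law warp, the `gamma` parameter, the axis-aligned box, and all function names below are assumptions chosen only to show how a uniform grid can be bent toward the outer region.

```python
import math


def warp_toward_surface(u, gamma=2.0):
    """Map u in [-1, 1] to [-1, 1]; gamma > 1 pushes points toward +/-1.

    Hypothetical warp used for illustration; gamma = 1 leaves u unchanged.
    """
    return math.copysign(1.0 - (1.0 - abs(u)) ** gamma, u)


def nonuniform_axis(n=8, gamma=2.0):
    """Return n normalized coordinates along one box axis (n >= 2)."""
    uniform = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [warp_toward_surface(u, gamma) for u in uniform]


def sample_box(center, dims, n=8, gamma=2.0):
    """Sample an n x n x n grid of 3D points inside an axis-aligned box.

    center: (x, y, z) of the box center; dims: full extent along each axis.
    A real detector would also apply the box orientation, omitted here.
    """
    axis = nonuniform_axis(n, gamma)
    cx, cy, cz = center
    dx, dy, dz = dims
    return [
        (cx + 0.5 * dx * a, cy + 0.5 * dy * b, cz + 0.5 * dz * c)
        for a in axis
        for b in axis
        for c in axis
    ]


if __name__ == "__main__":
    uniform = nonuniform_axis(8, gamma=1.0)
    warped = nonuniform_axis(8, gamma=2.0)
    print("uniform:", [round(u, 3) for u in uniform])
    print("warped: ", [round(w, 3) for w in warped])
    print("points in box:", len(sample_box((0.0, 0.0, 10.0), (4.0, 1.5, 1.8))))
```

With `gamma=2.0`, neighboring samples sit closer together near the box faces than near the center, so a fixed point budget spends more samples on the outer region. The warp applies per axis, so the densest sampling occurs near edges and corners.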

updated: Mon Jun 21 2021 01:55:56 GMT+0000 (UTC)

published: Fri Jun 18 2021 09:14:55 GMT+0000 (UTC)

arXiv
