Monocular 3D Object Detection with Sequential Feature Association and Depth Hint Augmentation

Tianze Gao; Huihui Pan; Huijun Gao

Monocular 3D object detection is an active research topic because of its challenging nature and broad application prospects. In this work, a single-stage keypoint-based network named FADNet is presented to address monocular 3D object detection, with autonomous driving as the target application. In contrast to previous keypoint-based methods, which adopt identical layouts for all output branches, we propose to divide the output modalities into groups according to their estimation difficulty and to treat each group differently. To this end, a convolutional gated recurrent unit (convGRU) is embedded into our network to enable sequential feature association across the convolutional features of the different groups. The purpose of sequential feature association is to improve the accuracy of harder estimations under the guidance of easier ones. We also observe that this design contributes to the geometric consistency between 2D and 3D estimations. Another contribution of this work is the strategy of depth hint augmentation. To provide characteristic depth patterns as hints for depth estimation, a dedicated depth hint module is designed to generate row-wise features, called depth hints, which are explicitly supervised in a bin-wise manner. In the training stage, the regression outputs are uniformly encoded to enable loss disentanglement. The 2D loss term is further adapted to be depth-aware in order to improve the detection accuracy for small objects. The contributions of this work are validated through experiments and ablation studies on the KITTI3D benchmark. Without using depth priors, post-optimization, or other refinement modules, our network achieves performance on par with state-of-the-art methods while maintaining a decent running speed. The code is available at https://github.com/gtzly/FADNet.
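
To make the phrase "supervised in a bin-wise manner" more concrete, the following is a minimal sketch of how a metric depth could be turned into a classification target over depth bins. It is not taken from the FADNet code: the uniform bin spacing, the depth range of 0 to 80 m, the bin count of 8, and the function names `depth_to_bin` and `one_hot` are all assumptions chosen for illustration, and the paper may discretize depth differently.

```python
from typing import List


def depth_to_bin(depth: float, d_min: float = 0.0, d_max: float = 80.0,
                 num_bins: int = 8) -> int:
    """Map a metric depth to a bin index in [0, num_bins - 1].

    Assumption: bins are uniform over [d_min, d_max]; the range and
    bin count are illustrative values, not the paper's settings.
    """
    if num_bins <= 0 or d_max <= d_min:
        raise ValueError("need num_bins > 0 and d_max > d_min")
    # Clip so that out-of-range depths fall into the first or last bin.
    clipped = min(max(depth, d_min), d_max)
    width = (d_max - d_min) / num_bins
    index = int((clipped - d_min) / width)
    # A depth exactly at d_max would otherwise land one past the last bin.
    return min(index, num_bins - 1)


def one_hot(index: int, num_bins: int) -> List[float]:
    """Build a one-hot classification target for the given bin index."""
    if not 0 <= index < num_bins:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_bins)]


if __name__ == "__main__":
    bin_index = depth_to_bin(25.0)
    print(bin_index, one_hot(bin_index, 8))
```

With a target of this form, a row-wise feature could be trained with an ordinary classification loss over the bins instead of a direct regression loss. Whether FADNet uses uniform bins or another spacing should be checked against the released code.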

updated: Mon Nov 30 2020 07:19:14 GMT+0000 (UTC)

published: Mon Nov 30 2020 07:19:14 GMT+0000 (UTC)

arXiv
