Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

Feixiang Lu; Zongdai Liu; Hui Miao; Peng Wang; Liangjun Zhang; Ruigang Yang; Dinesh Manocha; Bin Zhou

Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as doors, the trunk, and the bonnet provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing, such as object bounding box detection or pose estimation, and rarely tackle these situations. In this paper, we address this important autonomous driving problem by solving three critical issues. First, to deal with data scarcity, we propose an effective training data generation process that fits a 3D car model with dynamic parts to vehicles in real images and then reconstructs human-vehicle interaction (VHI) scenarios. Our approach is fully automatic, requires no human interaction, and can generate a large number of vehicles in uncommon states (VUS) for training deep neural networks (DNNs). Second, to perform fine-grained vehicle perception, we present a multi-task network for VUS parsing and a multi-stream network for VHI parsing. Third, to quantitatively evaluate the effectiveness of our data augmentation approach, we build the first VUS dataset in real traffic scenarios (e.g., getting in or out of a vehicle, or placing or removing luggage). Experimental results show that our approach outperforms baseline methods in 2D detection and instance segmentation by a large margin (over 8%). In addition, our network yields large improvements in discovering and understanding these uncommon cases. We have also released the source code, the dataset, and the trained model on GitHub (https://github.com/zongdai/EditingForDNN).

updated: Wed Jan 06 2021 09:04:58 GMT+0000 (UTC)

published: Tue Dec 15 2020 03:03:38 GMT+0000 (UTC)

arXiv

