Fine-Grained Vehicle Perception via 3D Part-Guided Visual Data Augmentation

Feixiang Lu; Zongdai Liu; Hui Miao; Peng Wang; Liangjun Zhang; Ruigang Yang; Dinesh Manocha; Bin Zhou

Holistically understanding an object and its 3D movable parts through visual perception models is essential for enabling an autonomous agent to interact with the world. For autonomous driving, the dynamics and states of vehicle parts such as the doors, the trunk, and the bonnet provide meaningful semantic information and interaction states, which are essential to ensuring the safety of the self-driving vehicle. Existing visual perception models mainly focus on coarse parsing, such as object bounding box detection or pose estimation, and rarely tackle these situations. In this paper, we address this problem for autonomous driving by solving two critical issues using visual data augmentation. First, to deal with data scarcity, we propose an effective training data generation process: we fit a 3D car model with dynamic parts to vehicles in real images and then reconstruct human-vehicle interaction scenarios. This allows us to directly edit the real images using the aligned 3D parts, yielding effective training data for learning robust deep neural networks (DNNs). Second, to benchmark the quality of 3D part understanding, we collect a large dataset in real-world driving scenarios with vehicles in uncommon states (VUS), e.g., with a door or the trunk open. Experiments demonstrate that our network trained with visual data augmentation largely outperforms other baselines in terms of 2D detection and instance segmentation accuracy. Our network yields large improvements in discovering and understanding these uncommon cases. Moreover, we plan to release all of the source code, the dataset, and the trained model on GitHub.
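
The abstract does not spell out how an aligned 3D part is used to edit an image. As a rough illustration only, and not the authors' implementation, the geometric core of such an edit can be sketched as rotating the vertices of a part (say, a door) about a hinge axis and projecting them back into the image. Everything named here is an assumption made for the sketch: the functions `rotate_about_axis`, `project_pinhole`, and `open_part`, the idea that a part is a list of vertices already expressed in camera coordinates, and the simple pinhole camera with focal length `focal` and principal point `(cx, cy)`.

```python
import math


def rotate_about_axis(point, pivot, axis, angle_rad):
    """Rotate a 3D point about the line through `pivot` with direction `axis`.

    Uses Rodrigues' rotation formula. All arguments are hypothetical inputs
    for this sketch; `axis` does not need to be normalised.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    # Work relative to the hinge pivot so the rotation axis passes through it.
    v = [p - q for p, q in zip(point, pivot)]
    cos_t, sin_t = math.cos(angle_rad), math.sin(angle_rad)
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    return [r + q for r, q in zip(rotated, pivot)]


def project_pinhole(point, focal, cx, cy):
    """Project a camera-space 3D point to pixel coordinates (assumes z > 0)."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def open_part(vertices, pivot, axis, angle_deg, focal, cx, cy):
    """Swing a part's vertices about its hinge and return their pixel positions."""
    angle = math.radians(angle_deg)
    moved = [rotate_about_axis(v, pivot, axis, angle) for v in vertices]
    return [project_pinhole(v, focal, cx, cy) for v in moved]
```

For example, `open_part(door_vertices, hinge_point, [0.0, 1.0, 0.0], 45.0, focal, cx, cy)` would give the image footprint of a door swung 45 degrees about a vertical hinge, where `door_vertices` and `hinge_point` are placeholders for data that a fitted 3D model would supply. A real augmentation pipeline would additionally need rendering, occlusion handling, and image blending, none of which this sketch attempts.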

updated: Tue Dec 15 2020 03:03:38 GMT+0000 (UTC)

published: Tue Dec 15 2020 03:03:38 GMT+0000 (UTC)

arXiv