End-to-end Learning Improves Static Object Geo-localization in Monocular Video

Mohamed Chaabane; Lionel Gueguen; Ameni Trabelsi; Ross Beveridge; Stephen O'Hara

Accurately estimating the position of static objects, such as traffic lights, from the moving camera of a self-driving car is a challenging problem. In this work, we present a system that improves the localization of static objects by jointly optimizing the components of the system via learning. Our system comprises networks that perform: 1) 5DoF object pose estimation from a single image, 2) association of objects between pairs of frames, and 3) multi-object tracking to produce the final geo-localization of the static objects within the scene. We evaluate our approach using a publicly available data set, focusing on traffic lights due to data availability. For each component, we compare against contemporary alternatives and show significantly improved performance. We also show that the end-to-end system performance is further improved via joint training of the constituent models.
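To make the three stages in the abstract concrete, the following is a minimal sketch of the data flow only, not the authors' method. It assumes a flat 2D ground plane, treats the per-frame object offsets as already estimated (standing in for the pose network), and swaps the learned association and tracking networks for a greedy nearest-neighbour rule. All names (`camera_to_world`, `localize`, `max_dist`) and the sample numbers are hypothetical.

```python
import math


def camera_to_world(offset, cam_xy, cam_yaw):
    """Map a camera-frame offset (forward, left) into world coordinates.

    Assumption: 2D ground plane, cam_yaw in radians, counter-clockwise.
    """
    forward, left = offset
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    return (cam_xy[0] + c * forward - s * left,
            cam_xy[1] + s * forward + c * left)


def mean_point(points):
    """Average a non-empty list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def localize(frames, max_dist=2.0):
    """Fuse per-frame observations into one world position per object.

    frames: iterable of (cam_xy, cam_yaw, offsets), where offsets are the
    camera-relative positions of the objects seen in that frame.
    Association is greedy nearest-neighbour against each track's running
    mean, a hand-written stand-in for a learned association step.
    """
    tracks = []  # each track is a list of world-frame observations
    for cam_xy, cam_yaw, offsets in frames:
        used = set()  # a track may absorb at most one detection per frame
        for offset in offsets:
            world = camera_to_world(offset, cam_xy, cam_yaw)
            best, best_d = None, max_dist
            for i, track in enumerate(tracks):
                if i in used:
                    continue
                mx, my = mean_point(track)
                d = math.hypot(world[0] - mx, world[1] - my)
                if d <= best_d:
                    best, best_d = i, d
            if best is None:
                tracks.append([world])  # start a new track
                used.add(len(tracks) - 1)
            else:
                tracks[best].append(world)
                used.add(best)
    return [mean_point(t) for t in tracks]


# Hypothetical data: a car driving along the x axis past two static objects.
frames = [
    ((0.0, 0.0), 0.0, [(20.0, 3.0), (30.0, -4.0)]),
    ((5.0, 0.0), 0.0, [(15.2, 2.9), (25.0, -4.0)]),
    ((10.0, 0.0), 0.0, [(9.8, 3.1), (20.0, -4.0)]),
]
positions = localize(frames)
```

In this sketch each stage is fixed and hand-tuned, so errors in the per-frame estimates pass straight through to the final positions. The abstract's claim is that learning the stages and training them jointly improves on treating them independently.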

updated: Sun Jan 03 2021 17:36:27 GMT+0000 (UTC)

published: Fri Apr 10 2020 21:10:34 GMT+0000 (UTC)

arXiv
