Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data

Matthew Howe; Ian Reid; Jamie Mackenzie

Accurate 7DoF pose prediction of vehicles at an intersection is an important task for assessing potential conflicts between road users. In principle, this could be achieved by a single-camera system capable of detecting the pose of each vehicle, but this would require a large, accurately labelled dataset on which to train the detector. Although large vehicle pose datasets exist (ostensibly developed for autonomous vehicles), we find training on these datasets inadequate: they contain images from a ground-level viewpoint, whereas an ideal view for intersection observation would be elevated above the road surface. We develop an alternative approach using a weakly supervised method of fine-tuning 3D object detectors for traffic observation cameras, showing in the process that large existing autonomous vehicle datasets can be leveraged for pre-training. To fine-tune the monocular 3D object detector, our method uses multiple 2D detections from overlapping, wide-baseline views and a loss that encodes the underlying geometric consistency. Our method achieves vehicle 7DoF pose prediction accuracy on our dataset comparable to that of the top-performing monocular 3D object detectors on autonomous vehicle datasets. We present our training methodology, multi-view reprojection loss, and dataset.
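To make the idea of a multi-view reprojection loss concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not specify. It assumes a 7DoF pose is parameterised as (x, y, z, length, width, height, yaw) with yaw about a vertical world z-axis, that each camera is described by a known 3x4 projection matrix, and that the loss is a mean L1 distance between the projected box and the detected 2D box in each view. All function names are hypothetical.

```python
import numpy as np

def box_corners(pose):
    """Eight 3D corners of a box from an assumed 7DoF pose (x, y, z, l, w, h, yaw)."""
    x, y, z, l, w, h, yaw = pose
    # Corner offsets in the box frame, centred on the origin.
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    local = signs * np.array([l, w, h]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotation by yaw about the world z-axis (assumed to point up).
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ R.T + np.array([x, y, z])

def project_to_bbox(corners, P):
    """Project 3D corners with a 3x4 camera matrix and return the enclosing 2D box."""
    homog = np.hstack([corners, np.ones((len(corners), 1))])
    uvw = homog @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]  # perspective divide
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])  # (u_min, v_min, u_max, v_max)

def multiview_reprojection_loss(pose, cameras, detections):
    """Mean L1 distance between projected and detected 2D boxes, averaged over views."""
    corners = box_corners(pose)
    errors = [np.abs(project_to_bbox(corners, P) - np.asarray(d)).mean()
              for P, d in zip(cameras, detections)]
    return float(np.mean(errors))
```

In this sketch a single predicted 3D box is scored against 2D detections from every overlapping view, so a pose that explains one view but contradicts another is penalised. A trainable version would express the same computation in an automatic differentiation framework so that gradients can flow back to the detector.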

updated: Thu Oct 21 2021 08:26:48 GMT+0000 (UTC)

published: Thu Oct 21 2021 08:26:48 GMT+0000 (UTC)

arXiv
