DMODE: Differential Monocular Object Distance Estimation Module without Class Specific Information

Pedram Agand; Michael Chang; Mo Chen


Using a single camera to estimate the distances of objects reduces costs compared to stereo vision and LiDAR. Although monocular distance estimation has been studied in the literature, previous methods mostly rely on knowing an object's class in some way. This can degrade performance on datasets with multi-class objects and on objects with an undefined class. In this paper, we aim to overcome the potential downsides of class-specific approaches and provide an alternative technique called DMODE that does not require any information about the object's class. Using a differential approach, we combine the change in an object's size over time with the camera's motion to estimate the object's distance. Since DMODE is a class-agnostic method, it is easily adaptable to new environments: it maintains performance across different object detectors and can be easily adapted to new object classes. We tested our model across different training and testing scenarios on the KITTI MOTS dataset's ground-truth bounding box annotations and on the bounding box outputs of TrackRCNN and EagerMOT. The instantaneous changes in bounding box size and camera position are used to obtain an object's 3D position without relying on its detection source or class properties. Our results show that we outperform traditional alternative methods, e.g., IPM [TuohyIPM], SVR [svr], and [zhu2019learning], in test environments with multi-class object distance detections.
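Why size change plus camera motion can be enough without class information can be seen from pinhole geometry. The sketch below is not DMODE itself; the abstract does not specify the model. It is a closed-form illustration under loudly stated assumptions: a pinhole camera, a static object, and camera translation purely along the optical axis. If an object's image height is h = fH/Z, then h_prev · Z_prev = h_curr · Z_curr, and with Z_prev = Z_curr + d it follows that Z_curr = h_prev · d / (h_curr − h_prev). The real-world size H and the focal length f cancel out. The function names `project_height` and `distance_from_scale_change` and the numeric values are hypothetical.

```python
import math


def project_height(focal_px: float, object_height_m: float, depth_m: float) -> float:
    """Pinhole projection: image height (pixels) of an object at depth Z (hypothetical helper)."""
    return focal_px * object_height_m / depth_m


def distance_from_scale_change(h_prev: float, h_curr: float, forward_motion_m: float) -> float:
    """Depth at the current frame from two box heights and the camera's forward motion.

    Assumptions (illustrative, not from the paper): static object, camera moving
    only along its optical axis, forward_motion_m > 0 meaning toward the object.
    From h_prev * Z_prev = h_curr * Z_curr and Z_prev = Z_curr + d:
        Z_curr = h_prev * d / (h_curr - h_prev)
    Neither the focal length nor the object's real size (i.e. its class) is needed.
    """
    dh = h_curr - h_prev
    if dh == 0:
        raise ValueError("box size did not change; depth is unobservable")
    depth = h_prev * forward_motion_m / dh
    if depth <= 0 or not math.isfinite(depth):
        raise ValueError("camera motion and scale change are inconsistent")
    return depth


if __name__ == "__main__":
    focal = 700.0  # hypothetical focal length in pixels
    # Two unrelated object sizes stand in for different (unknown) classes.
    for true_height in (1.5, 4.0):
        h1 = project_height(focal, true_height, 20.0)  # object 20 m away
        h2 = project_height(focal, true_height, 18.0)  # after the camera moves 2 m forward
        print(true_height, round(distance_from_scale_change(h1, h2, 2.0), 6))
```

Both object sizes recover the same current depth of about 18 m, which is the class-agnostic property in miniature. Real bounding boxes are noisy and cameras also rotate and translate sideways, which is one reason a closed-form ratio like this is fragile and a learned differential model is attractive.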

updated: Sun Oct 23 2022 02:06:56 GMT+0000 (UTC)

published: Sun Oct 23 2022 02:06:56 GMT+0000 (UTC)

arXiv
