BEV-MODNet: Monocular Camera based Bird's Eye View Moving Object Detection for Autonomous Driving

Hazem Rashed; Mariam Essam; Maha Mohamed; Ahmad El Sallab; Senthil Yogamani

Detecting moving objects is a critical task in autonomous driving systems. After the perception phase, motion planning is typically performed in Bird's Eye View (BEV) space, which requires projecting objects detected on the image plane onto the top-view BEV plane. Such a projection is error-prone because of the lack of depth information and noisy mapping in far-away regions. CNNs can leverage the global context of the scene to project better. In this work, we explore end-to-end Moving Object Detection (MOD) directly on the BEV map using monocular images as input. To the best of our knowledge, no such dataset exists, so we create an extended KITTI-raw dataset consisting of 12.9k images annotated with moving object masks in BEV space for five classes. The dataset is intended for class-agnostic, motion-cue-based object detection, and the classes are provided as metadata for better tuning. We design and implement a two-stream RGB and optical flow fusion architecture that outputs motion segmentation directly in BEV space. We compare it with inverse perspective mapping of state-of-the-art motion segmentation predictions on the image plane. We observe a significant improvement of 13% in mIoU using the simple baseline implementation, which demonstrates the ability to directly learn motion segmentation output in BEV space. Qualitative results of our baseline and the dataset annotations can be found at https://sites.google.com/view/bev-modnet.
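
To make the comparison baseline more concrete, the following is a minimal sketch of inverse perspective mapping of an image-plane motion mask onto a BEV grid, followed by an IoU score. It is not the authors' implementation. It assumes a pinhole camera with no pitch or roll, a perfectly flat ground plane, and camera axes with x right, y down, z forward. The function names (`ipm_warp_mask`, `iou`), the intrinsics, the camera height, and the grid extents are all hypothetical. The abstract reports mIoU; the sketch computes a single binary IoU, which would then have to be averaged in whatever way the evaluation protocol defines.

```python
import numpy as np


def ipm_warp_mask(mask, K, cam_height, x_range, z_range, cell):
    """Warp a binary image-plane mask onto a BEV grid (flat-ground assumption).

    mask       : (H, W) array, nonzero where an object is predicted as moving
    K          : (3, 3) pinhole intrinsics (assumed, no lens distortion)
    cam_height : camera height above the ground in metres (assumed)
    x_range    : (min, max) lateral extent of the BEV grid in metres
    z_range    : (min, max) forward extent in metres, min must be > 0
    cell       : BEV cell size in metres
    """
    h_img, w_img = mask.shape
    # Centres of the BEV cells on the ground plane.
    xs = np.arange(x_range[0], x_range[1], cell) + cell / 2.0
    zs = np.arange(z_range[0], z_range[1], cell) + cell / 2.0
    X, Z = np.meshgrid(xs, zs)  # both have shape (len(zs), len(xs))

    # Project each ground point (X, cam_height, Z) into the image.
    u = K[0, 0] * X / Z + K[0, 2]
    v = K[1, 1] * cam_height / Z + K[1, 2]
    ui = np.round(u).astype(int)
    vi = np.round(v).astype(int)

    # Nearest-neighbour sampling; cells that fall outside the image stay empty.
    valid = (ui >= 0) & (ui < w_img) & (vi >= 0) & (vi < h_img)
    bev = np.zeros(X.shape, dtype=bool)
    bev[valid] = mask[vi[valid], ui[valid]].astype(bool)
    return bev


def iou(pred, target):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match by convention
    return np.logical_and(pred, target).sum() / union


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 50.0],
                  [0.0, 100.0, 50.0],
                  [0.0, 0.0, 1.0]])
    image_mask = np.zeros((100, 100), dtype=np.uint8)
    image_mask[60:100, 20:80] = 1  # toy "moving object" region
    bev_mask = ipm_warp_mask(image_mask, K, cam_height=1.0,
                             x_range=(-2.0, 2.0), z_range=(2.0, 10.0), cell=1.0)
    print(bev_mask.astype(int))
```

Because distant ground points are compressed into a few image rows near the horizon, a small pixel error in the image-plane mask translates into a large metric error in the far BEV cells. This is the noisy far-range mapping that the abstract gives as motivation for predicting in BEV space directly.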

updated: Sun Jul 11 2021 01:11:58 GMT+0000 (UTC)

published: Sun Jul 11 2021 01:11:58 GMT+0000 (UTC)

arXiv
