3D Video Object Detection with Learnable Object-Centric Global Optimization

Jiawei He; Yuntao Chen; Naiyan Wang; Zhaoxiang Zhang

In this work, we explore optimization based on long-term temporal visual correspondence for 3D video object detection. Visual correspondence refers to one-to-one mappings of pixels across multiple images. Correspondence-based optimization is the cornerstone of 3D scene reconstruction but is less studied in 3D video object detection, because moving objects violate multi-view geometry constraints and are treated as outliers during scene reconstruction. We address this issue by treating objects as first-class citizens during correspondence-based optimization. We propose BA-Det, an end-to-end optimizable object detector with object-centric temporal correspondence learning and featuremetric object bundle adjustment. Empirically, we verify the effectiveness and efficiency of BA-Det for multiple baseline 3D detectors under various setups. BA-Det achieves state-of-the-art (SOTA) performance on the large-scale Waymo Open Dataset (WOD) with only marginal computation cost. Our code is available at https://github.com/jiaweihe1996/BA-Det.
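The abstract describes object bundle adjustment only at a high level. To illustrate the underlying idea, here is a minimal Python/NumPy sketch. It jointly refines an object's per-frame poses and its object-frame 3D points from correspondences tracked across frames, treating the moving object itself, not the static scene, as the rigid body. This is not BA-Det's implementation. It replaces the featuremetric residual, which compares learned features rather than pixel positions, with a plain geometric reprojection error on matched keypoints. It also makes several assumptions: a known pinhole intrinsic matrix `K`, object rotation restricted to yaw about the vertical axis, and hypothetical helper names (`object_bundle_adjust`, `residuals`, `unpack`). The data is synthetic.

```python
import numpy as np

# Hypothetical sketch: geometric reprojection error stands in for BA-Det's featuremetric residual.

def rot_y(yaw):
    """Rotation about the camera y-axis (the vertical axis in a y-down camera frame)."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(K, pts_cam):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    uvw = pts_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def unpack(params, pose0, n_frames, n_pts):
    pts = params[: n_pts * 3].reshape(n_pts, 3)  # object-frame 3D points
    poses = np.vstack([pose0, params[n_pts * 3:].reshape(n_frames - 1, 4)])  # (yaw, tx, ty, tz)
    return pts, poses

def residuals(params, K, obs, pose0, n_frames, n_pts):
    """Stacked reprojection residuals of every object point in every frame."""
    pts, poses = unpack(params, pose0, n_frames, n_pts)
    res = [project(K, pts @ rot_y(p[0]).T + p[1:]) - obs[i] for i, p in enumerate(poses)]
    return np.concatenate([r.ravel() for r in res])

def object_bundle_adjust(params, K, obs, pose0, n_frames, n_pts, iters=50, lam=1e-3):
    """Levenberg-Marquardt with a forward-difference Jacobian (adequate for tiny problems).

    Frame 0's pose stays fixed to remove the rigid gauge freedom; the remaining
    monocular scale ambiguity is absorbed by the damping term.
    """
    args = (K, obs, pose0, n_frames, n_pts)
    r = residuals(params, *args)
    for _ in range(iters):
        J = np.empty((r.size, params.size))
        for k in range(params.size):
            p = params.copy()
            p[k] += 1e-6
            J[:, k] = (residuals(p, *args) - r) / 1e-6
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.diag(np.diag(H) + 1e-9), -J.T @ r)
        r_new = residuals(params + step, *args)
        if r_new @ r_new < r @ r:  # accept step, trust the Gauss-Newton model more
            params, r, lam = params + step, r_new, max(lam * 0.5, 1e-6)
        else:  # reject step, increase damping
            lam *= 10.0
    return params, r

# Synthetic example: 8 points on a car-sized object seen in 3 frames.
rng = np.random.default_rng(0)
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
n_frames, n_pts = 3, 8
pts_true = rng.uniform([-2.0, -0.8, -1.0], [2.0, 0.8, 1.0], size=(n_pts, 3))
poses_true = np.array([[0.00, 0.0, 0.5, 15.0], [0.05, 0.3, 0.5, 14.0], [0.10, 0.6, 0.5, 13.0]])
obs = np.stack([project(K, pts_true @ rot_y(p[0]).T + p[1:]) for p in poses_true])
obs += rng.normal(scale=0.5, size=obs.shape)  # 0.5 px keypoint noise

init = np.concatenate([
    (pts_true + rng.normal(scale=0.1, size=pts_true.shape)).ravel(),
    (poses_true[1:] + rng.normal(scale=[0.02, 0.2, 0.2, 0.2], size=(n_frames - 1, 4))).ravel(),
])
args = (K, obs, poses_true[0], n_frames, n_pts)
rms_before = np.sqrt(np.mean(residuals(init, *args) ** 2))
refined, r = object_bundle_adjust(init, *args)
rms_after = np.sqrt(np.mean(r ** 2))
print(f"reprojection RMS: {rms_before:.2f} px -> {rms_after:.2f} px")
```

Holding the first frame's pose fixed plays the role of an anchor detection. In a real detector, that pose and the initial point estimates would presumably come from a single-frame 3D detector, which is an assumption of this sketch. Swapping the pixel residual for a feature-space residual would turn this into a featuremetric variant without changing the optimizer.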

updated: Mon Mar 27 2023 17:39:39 GMT+0000 (UTC)

published: Mon Mar 27 2023 17:39:39 GMT+0000 (UTC)

arXiv