3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking

Shuxiao Ding; Eike Rehder; Lukas Schneider; Marius Cordts; Juergen Gall

Tracking 3D objects accurately and consistently is crucial for autonomous vehicles, enabling more reliable downstream tasks such as trajectory prediction and motion planning. Building on the substantial progress in object detection in recent years, the tracking-by-detection paradigm has become a popular choice due to its simplicity and efficiency. State-of-the-art 3D multi-object tracking (MOT) approaches typically rely on non-learned, model-based algorithms such as the Kalman filter, but these require many manually tuned parameters. Learning-based approaches, on the other hand, face the problem of adapting training to the online setting, which leads to an inevitable distribution mismatch between training and inference as well as suboptimal performance. In this work, we propose 3DMOTFormer, a learned geometry-based 3D MOT framework built upon the transformer architecture. We use an Edge-Augmented Graph Transformer to reason on the track-detection bipartite graph frame by frame and conduct data association via edge classification. To reduce the distribution mismatch between training and inference, we propose a novel online training strategy with an autoregressive and recurrent forward pass as well as sequential batch optimization. Using CenterPoint detections, our approach achieves 71.2% and 68.2% AMOTA on the nuScenes validation and test splits, respectively. In addition, a trained 3DMOTFormer model generalizes well across different object detectors. Code is available at https://github.com/dsx0511/3DMOTFormer.
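To make the idea of data association via edge classification more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that some model has already produced a score in [0, 1] for every edge of the track-detection bipartite graph, and it turns those scores into one-to-one matches with a simple greedy rule. The function name `associate_by_edge_scores`, the threshold of 0.5, and the greedy matching itself are all illustrative assumptions; the abstract does not specify how classified edges are converted into assignments.

```python
from typing import Dict, List, Tuple


def associate_by_edge_scores(
    edge_scores: List[List[float]], threshold: float = 0.5
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Greedy one-to-one matching on a track-detection bipartite graph.

    edge_scores[t][d] is a hypothetical classifier score for the edge
    between existing track t and new detection d (assumed in [0, 1]).
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    num_tracks = len(edge_scores)
    num_dets = len(edge_scores[0]) if num_tracks else 0

    # Keep only edges classified as positive (score at or above threshold).
    candidates = [
        (score, t, d)
        for t, row in enumerate(edge_scores)
        for d, score in enumerate(row)
        if score >= threshold
    ]
    # Visit the most confident edges first.
    candidates.sort(key=lambda c: c[0], reverse=True)

    matches: Dict[int, int] = {}
    used_dets = set()
    for _, t, d in candidates:
        if t in matches or d in used_dets:
            continue  # each track and each detection is matched at most once
        matches[t] = d
        used_dets.add(d)

    unmatched_tracks = [t for t in range(num_tracks) if t not in matches]
    unmatched_dets = [d for d in range(num_dets) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Two existing tracks, three detections in the current frame.
scores = [
    [0.9, 0.2, 0.1],
    [0.8, 0.7, 0.05],
]
matches, lost_tracks, new_dets = associate_by_edge_scores(scores)
print(matches, lost_tracks, new_dets)
```

In an online tracker, this step would run once per frame: matched detections extend their tracks, unmatched detections are candidates for new tracks, and unmatched tracks are candidates for termination. Those lifecycle rules are likewise assumptions here rather than details taken from the abstract.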

updated: Sat Aug 12 2023 19:19:58 GMT+0000 (UTC)

published: Sat Aug 12 2023 19:19:58 GMT+0000 (UTC)

arXiv
