ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Junru Gu; Chenxu Hu; Tianyuan Zhang; Xuanyao Chen; Yilun Wang; Yue Wang; Hang Zhao



In existing autonomous driving systems, perception and prediction are two separate modules. The two modules communicate through hand-picked features, such as agent boxes and trajectories, that serve as the interface. Because of this separation, the prediction module receives only limited information from the perception module. Even worse, errors from the perception module can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to predict the future trajectories of agents in a scene. ViP3D employs sparse agent queries throughout the pipeline, making it fully differentiable and interpretable. Extensive experimental results on the nuScenes dataset show that ViP3D outperforms traditional pipelines and previous end-to-end models.
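To make the idea of "agent queries" more concrete, here is a toy sketch of a generic query-based design, not ViP3D's actual architecture. Assumptions (none of these come from the abstract): a small set of agent query vectors, each attending over flattened image features with single-head scaled dot-product cross-attention and no learned projections, followed by a hypothetical linear head that decodes several (x, y) trajectory modes per query. All names and dimensions (`cross_attend`, `decode_trajectories`, `D`, `K`, `T`) are illustrative.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, features):
    """Each agent query attends over all image features (assumed: single head, no projections)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(d) for f in features])
        # Weighted sum of features: the query's aggregated visual context.
        out.append([sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)])
    return out


def decode_trajectories(agent_feats, weight, num_modes, horizon):
    """Hypothetical linear head: each agent feature -> num_modes trajectories of (x, y) waypoints."""
    trajs = []
    for a in agent_feats:
        flat = [dot(row, a) for row in weight]  # length num_modes * horizon * 2
        modes = []
        for k in range(num_modes):
            pts = []
            for t in range(horizon):
                base = (k * horizon + t) * 2
                pts.append((flat[base], flat[base + 1]))
            modes.append(pts)
        trajs.append(modes)
    return trajs


# Usage with random stand-ins for learned queries, image features, and head weights.
random.seed(0)
D, NUM_AGENTS, NUM_FEATS, K, T = 8, 3, 20, 2, 6
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_AGENTS)]
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_FEATS)]
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(K * T * 2)]

agent_feats = cross_attend(queries, features)
trajs = decode_trajectories(agent_feats, W, K, T)
```

Every step here is built from differentiable operations (dot products, softmax, weighted sums), so in an autodiff framework a trajectory loss could backpropagate to the queries and image features. That is the general property the abstract attributes to keeping agent queries throughout the pipeline; this pure-Python toy does not compute gradients itself.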

updated: Thu Oct 13 2022 17:05:36 GMT+0000 (UTC)

published: Tue Aug 02 2022 16:38:28 GMT+0000 (UTC)

arXiv
