ParkPredict+: Multimodal Intent and Motion Prediction for Vehicles in Parking Lots with CNN and Transformer

Xu Shen; Matthew Lacayo; Nidhir Guggilla; Francesco Borrelli

This paper addresses multimodal intent and trajectory prediction for human-driven vehicles in parking lots. Using models built from CNN and Transformer networks, we extract spatio-temporal and contextual information from trajectory history and local bird's-eye-view (BEV) semantic images, and predict the intent distribution and future trajectory sequences. Our method outperforms existing models in accuracy while allowing an arbitrary number of modes, encoding complex multi-agent scenarios, and adapting to different parking maps. To train and evaluate our method, we present the first public 4K video dataset of human driving in parking lots, with accurate annotations, a high frame rate, and rich traffic scenarios.
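
As a rough illustration of what a multimodal prediction with an arbitrary number of modes can look like, the sketch below turns raw intent scores into a probability distribution and keeps the k most likely (probability, trajectory) pairs. It is not the paper's architecture: the function names `softmax` and `top_k_modes`, the scores, and the candidate trajectories are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_modes(intent_scores, candidate_trajectories, k):
    """Return the k most likely (probability, trajectory) pairs."""
    if len(intent_scores) != len(candidate_trajectories):
        raise ValueError("one trajectory is expected per intent")
    probs = softmax(intent_scores)
    ranked = sorted(
        zip(probs, candidate_trajectories),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:k]

# Hypothetical scores for three candidate intents
# (for example, two parking spots and the exit lane).
scores = [2.0, 0.5, 1.0]
# Hypothetical future (x, y) waypoints, one short trajectory per intent.
trajectories = [
    [(0.0, 0.0), (1.0, 0.5)],
    [(0.0, 0.0), (1.0, -0.5)],
    [(0.0, 0.0), (2.0, 0.0)],
]
modes = top_k_modes(scores, trajectories, k=2)
```

Here `k` is a free parameter, so the number of returned modes is not fixed in advance; in a learned model the scores and trajectories would come from the network rather than being hard-coded.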

updated: Tue Jan 10 2023 23:39:42 GMT+0000 (UTC)

published: Sun Apr 17 2022 01:54:25 GMT+0000 (UTC)

arXiv
