GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting

Alexander Cui; Sergio Casas; Kelvin Wong; Simon Suo; Raquel Urtasun

Motion forecasting is critical for self-driving vehicles (SDVs) to plan safe maneuvers. Towards this goal, modern approaches reason about the map, the agents' past trajectories and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this approach is computationally expensive for multi-agent prediction, as inference needs to be run once per agent. To tackle the scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample-inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper, we propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization. To achieve this, we leverage pairwise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization allows us to be invariant to scene viewpoint and to save online computation by reusing map embeddings computed offline. Our decoder is also viewpoint-agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as a novel highway dataset.
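
To make the viewpoint-invariance claim concrete, here is a minimal sketch of a pairwise relative pose computation. It is an illustration only, not the authors' implementation: it assumes each agent or map element is summarized by a single 2D pose (x, y, heading), and the function names `relative_pose`, `pairwise_relative_poses` and `change_viewpoint` are hypothetical. The abstract does not specify how the relative encodings are parameterized or fed to the network, so the sketch covers only the geometric idea.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def relative_pose(src, dst):
    """Express pose `dst` in the local frame of pose `src`.

    Poses are (x, y, heading) tuples in any common global frame
    (an assumed simplification of agents and map elements).
    """
    xs, ys, hs = src
    xd, yd, hd = dst
    dx, dy = xd - xs, yd - ys
    c, s = math.cos(hs), math.sin(hs)
    # Rotate the global offset by -hs so it is measured along src's axes.
    return (c * dx + s * dy, -s * dx + c * dy, wrap_angle(hd - hs))


def pairwise_relative_poses(poses):
    """Relative pose for every ordered pair (i, j) with i != j."""
    return {
        (i, j): relative_pose(pi, pj)
        for i, pi in enumerate(poses)
        for j, pj in enumerate(poses)
        if i != j
    }


def change_viewpoint(pose, rotation, tx, ty):
    """Re-express a pose after rotating and translating the global frame."""
    x, y, h = pose
    c, s = math.cos(rotation), math.sin(rotation)
    return (c * x - s * y + tx, s * x + c * y + ty, wrap_angle(h + rotation))


if __name__ == "__main__":
    # Three hypothetical scene elements in some arbitrary global frame.
    scene = [(0.0, 0.0, 0.0), (10.0, 2.0, 0.3), (-5.0, 7.0, 1.2)]
    moved = [change_viewpoint(p, 0.7, 100.0, -50.0) for p in scene]
    # The pairwise relative poses agree up to floating-point error.
    print(pairwise_relative_poses(scene)[(0, 1)])
    print(pairwise_relative_poses(moved)[(0, 1)])
```

Because every quantity is measured relative to another element rather than to a global origin, rotating or translating the whole scene leaves the pairwise values unchanged. The same property is what would let per-element map features be computed once offline and reused, since they do not depend on where the SDV currently is.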

updated: Fri Nov 04 2022 16:10:50 GMT+0000 (UTC)

published: Fri Nov 04 2022 16:10:50 GMT+0000 (UTC)

arXiv
