Scene Transformer: A unified architecture for predicting multiple agent trajectories

Jiquan Ngiam; Benjamin Caine; Vijay Vasudevan; Zhengdong Zhang; Hao-Tien Lewis Chiang; Jeffrey Ling; Rebecca Roelofs; Alex Bewley; Chenxi Liu; Ashish Venugopal; David Weiss; Ben Sapp; Zhifeng Chen; Jonathon Shlens

Predicting the motion of multiple agents is necessary for planning in dynamic environments. This task is challenging for autonomous driving since agents (e.g., vehicles and pedestrians) and their associated behaviors may be diverse and influence one another. Most prior work has focused on predicting independent futures for each agent based on all past motion, and planning against these independent predictions. However, planning against independent predictions can make it challenging to represent the future interaction possibilities between different agents, leading to sub-optimal planning. In this work, we formulate a model for predicting the behavior of all agents jointly, producing consistent futures that account for interactions between agents. Inspired by recent language modeling approaches, we use a masking strategy as the query to our model, enabling one to invoke a single model to predict agent behavior in many ways, for example conditioned on the goal or full future trajectory of the autonomous vehicle, or on the behavior of other agents in the environment. Our model architecture employs attention to combine features across road elements, agent interactions, and time steps. We evaluate our approach on autonomous driving datasets for both marginal and joint motion prediction, and achieve state-of-the-art performance across two popular datasets. By combining a scene-centric approach, an agent-permutation-equivariant model, and a sequence masking strategy, we show that our model can unify a variety of motion prediction tasks, from joint motion prediction to conditioned prediction.
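The masking idea in the abstract can be illustrated with a small sketch. It treats a scene as an agents-by-time-steps grid and builds a boolean visibility mask per task: entries marked visible are given to the model, and hidden entries are what it would be asked to predict. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`build_mask`, `hidden_entries`), the task labels, and the convention that `True` means "visible" are all hypothetical, and the model itself is omitted.

```python
from typing import List, Tuple

# mask[agent][time_step]; True means the entry is shown to the model,
# False means it is hidden and must be predicted. (Assumed convention.)
Mask = List[List[bool]]


def build_mask(num_agents: int, num_steps: int, current_step: int,
               task: str = "motion_prediction", av_index: int = 0) -> Mask:
    """Build a visibility mask for one hypothetical prediction task."""
    if not 0 <= current_step < num_steps:
        raise ValueError("current_step must lie inside the time axis")
    if not 0 <= av_index < num_agents:
        raise ValueError("av_index must refer to an existing agent")

    # Baseline: every agent's past and present are visible, futures hidden.
    mask = [[t <= current_step for t in range(num_steps)]
            for _ in range(num_agents)]

    if task == "motion_prediction":
        pass  # predict all futures jointly from history alone
    elif task == "conditional_motion_prediction":
        # Also reveal the autonomous vehicle's full future trajectory.
        mask[av_index] = [True] * num_steps
    elif task == "goal_conditioned_prediction":
        # Also reveal only the autonomous vehicle's final state (its goal).
        mask[av_index][num_steps - 1] = True
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def hidden_entries(mask: Mask) -> List[Tuple[int, int]]:
    """List (agent, time_step) pairs the model would have to predict."""
    return [(a, t) for a, row in enumerate(mask)
            for t, visible in enumerate(row) if not visible]


if __name__ == "__main__":
    for task in ("motion_prediction",
                 "conditional_motion_prediction",
                 "goal_conditioned_prediction"):
        print(task, hidden_entries(build_mask(2, 5, 1, task)))
```

Switching tasks changes only the mask, which is the sense in which a single model can serve as the predictor for several differently conditioned queries.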

updated: Wed Oct 06 2021 22:24:25 GMT+0000 (UTC)

published: Tue Jun 15 2021 20:20:44 GMT+0000 (UTC)

arXiv
