PGformer: Proxy-Bridged Game Transformer for Multi-Person Extremely Interactive Motion Prediction

Yanwen Fang; Chao Li; Jintai Chen; Pengtao Jiang; Yifeng Geng; Xuansong Xie; Eddy K. F. LAM; Guodong Li

Multi-person motion prediction is a challenging task, especially in real-world scenarios with densely interacting people. Most previous works study weak interactions (e.g., hand-shaking) and typically forecast each human pose in isolation. In this paper, we focus on motion prediction for multiple persons engaged in extreme collaboration and explore the relationships between the motion trajectories of highly interactive persons. Specifically, we propose a novel cross-query attention (XQA) module, tailored to this setting, that bilaterally learns the cross-dependencies between the two pose sequences. Additionally, we introduce a proxy entity to bridge the involved persons; it cooperates with the XQA module and subtly controls the bidirectional information flows, acting as a motion intermediary. We then adapt these designs to a Transformer-based architecture and devise a simple yet effective end-to-end framework, the proxy-bridged game Transformer (PGformer), for multi-person interactive motion prediction. We evaluate our method on the challenging ExPI dataset, which involves highly interactive actions, and show that PGformer consistently outperforms state-of-the-art methods by a large margin in both short- and long-term prediction. Our approach is also compatible with the weakly interactive CMU-Mocap and MuPoTS-3D datasets, on which it achieves encouraging results. Our code will be made publicly available upon acceptance.
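
The abstract does not spell out how the XQA module or the proxy entity are computed, so the following is only a rough illustration of the general idea: two pose sequences attend to each other in both directions, and a shared extra token sits in the context of both. It uses ordinary single-head scaled dot-product cross-attention, not the paper's XQA. The function names, the single proxy vector, the residual connection, and the sizes `T` and `D` are all assumptions made for this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    # Standard single-head scaled dot-product attention:
    # rows of `queries` attend to rows of `context`.
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def bidirectional_step(seq_a, seq_b, proxy, params):
    # Hypothetical bridging scheme: each person's sequence attends to the
    # other person's sequence plus a shared proxy token.
    ctx_for_a = np.concatenate([seq_b, proxy], axis=0)
    ctx_for_b = np.concatenate([seq_a, proxy], axis=0)
    out_a = seq_a + cross_attention(seq_a, ctx_for_a, *params["a"])
    out_b = seq_b + cross_attention(seq_b, ctx_for_b, *params["b"])
    return out_a, out_b

# Illustrative sizes only: T frames, D features per frame.
T, D = 10, 16
rng = np.random.default_rng(0)
seq_a = rng.standard_normal((T, D))   # pose features of person A
seq_b = rng.standard_normal((T, D))   # pose features of person B
proxy = rng.standard_normal((1, D))   # stand-in for a learned proxy token
params = {
    name: [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3)]
    for name in ("a", "b")
}

out_a, out_b = bidirectional_step(seq_a, seq_b, proxy, params)
```

Each output keeps the shape of its input sequence, so a step like this could in principle be stacked inside a Transformer-style encoder; how PGformer actually does this is described in the paper itself, not here.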

updated: Tue Jun 06 2023 03:25:09 GMT+0000 (UTC)

published: Tue Jun 06 2023 03:25:09 GMT+0000 (UTC)

arXiv
