Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction

Xinshun Wang; Qiongjie Cui; Chen Chen; Shen Zhao; Mengyuan Liu

In recent years, Graph Convolutional Networks (GCNs) have been widely used in human motion prediction, but their performance remains unsatisfactory. Recently, MLP-Mixer, initially developed for vision tasks, has been applied to human motion prediction as a promising alternative to GCNs, achieving both better performance and better efficiency. Unlike GCNs, which can explicitly capture the bone-joint structure of the human skeleton by representing it as a graph with edges and nodes, MLP-Mixer relies on fully connected layers and thus cannot explicitly model this graph-like structure. To overcome this limitation, we propose Graph-Guided Mixer, a novel approach that equips the original MLP-Mixer architecture with the capability to model graph structure. By incorporating graph guidance, Graph-Guided Mixer can effectively capture and utilize the specific connectivity patterns within the graph representation of the human skeleton. In this paper, we first uncover a theoretical connection between MLP-Mixer and GCN that is unexplored in existing research. Building on this connection, we then present Graph-Guided Mixer, explaining how the original MLP-Mixer architecture is reinvented to incorporate guidance from graph structure. Finally, we conduct an extensive evaluation on the Human3.6M, AMASS, and 3DPW datasets, which shows that our method achieves state-of-the-art performance.
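The contrast the abstract draws between fully connected joint mixing and graph-structured mixing can be illustrated with a toy example. The sketch below is not the paper's Graph-Guided Mixer, whose mechanism the abstract does not specify. It assumes, purely for illustration, that graph guidance is imposed by masking a dense joint-mixing weight matrix with the skeleton adjacency matrix. The 4-joint chain skeleton, the all-ones weights, and every function name are hypothetical.

```python
# Illustrative sketch only: NOT the paper's architecture.
# Assumption: "graph guidance" is modelled here as masking dense
# joint-mixing weights with the skeleton adjacency matrix.

# Hypothetical 4-joint chain skeleton: 0 - 1 - 2 - 3.
EDGES = [(0, 1), (1, 2), (2, 3)]
NUM_JOINTS = 4


def adjacency_with_self_loops(num_joints, edges):
    """Binary adjacency matrix (A + I) of an undirected skeleton graph."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    return adj


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mixer_spatial_mix(weight, x):
    """MLP-Mixer-style mixing across joints: every joint sees every joint."""
    return matmul(weight, x)


def graph_masked_spatial_mix(weight, adj, x):
    """Same mixing, but weights between non-adjacent joints are zeroed."""
    masked = [[w * a for w, a in zip(w_row, a_row)]
              for w_row, a_row in zip(weight, adj)]
    return matmul(masked, x)


# Dense joint-mixing weights (all ones, for easy inspection).
W = [[1.0] * NUM_JOINTS for _ in range(NUM_JOINTS)]
# One feature channel per joint; only joint 3 carries a signal.
X = [[0.0], [0.0], [0.0], [1.0]]

A = adjacency_with_self_loops(NUM_JOINTS, EDGES)
dense_out = mixer_spatial_mix(W, X)
masked_out = graph_masked_spatial_mix(W, A, X)

print(dense_out)   # signal from joint 3 reaches all four joints
print(masked_out)  # signal from joint 3 reaches only joints 2 and 3
```

With dense weights, the signal at joint 3 spreads to every joint in a single layer. With the adjacency mask, it reaches only joint 3 itself and its neighbour, joint 2, which is the kind of connectivity pattern a GCN layer encodes explicitly.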

updated: Mon Aug 07 2023 07:25:34 GMT+0000 (UTC)

published: Fri Apr 07 2023 08:11:16 GMT+0000 (UTC)

arXiv
