Recurrent Transformer Variational Autoencoders for Multi-Action Motion Synthesis

Rania Briq; Chuhang Zou; Leonid Pishchulin; Chris Broaddus; Juergen Gall

We consider the problem of synthesizing multi-action human motion sequences of arbitrary length. Existing approaches have mastered motion sequence generation in single-action scenarios, but fail to generalize to multi-action and arbitrary-length sequences. We fill this gap by proposing a novel, efficient approach that leverages the expressiveness of Recurrent Transformers and the generative richness of conditional Variational Autoencoders. The proposed iterative approach generates smooth and realistic human motion sequences with an arbitrary number of actions and frames, and does so in linear space and time. We train and evaluate the proposed approach on the PROX and Charades datasets, where we augment PROX with ground-truth action labels and Charades with human mesh annotations. Experimental evaluation shows significant improvements in FID score and semantic consistency metrics compared to the state of the art.

updated: Mon Jun 27 2022 11:45:51 GMT+0000 (UTC)

published: Tue Jun 14 2022 10:40:16 GMT+0000 (UTC)

arXiv
