Single-Shot Motion Completion with Transformer

Yinglin Duan; Tianyang Shi; Zhengxia Zou; Yenan Lin; Zhehui Qian; Bohan Zhang; Yi Yuan

Motion completion is a challenging and long-discussed problem of great significance in film and game applications. For different motion completion scenarios (in-betweening, in-filling, and blending), most previous methods handle the completion problem with case-by-case designs. In this work, we propose a simple but effective method that solves multiple motion completion problems under a unified framework and achieves new state-of-the-art accuracy under multiple evaluation settings. Inspired by the recent success of attention-based models, we treat completion as a sequence-to-sequence prediction problem. Our method consists of two modules: a standard transformer encoder with self-attention that learns long-range dependencies in the input motion, and a trainable mixture embedding module that models temporal information and discriminates key frames. Our method runs in a non-autoregressive manner and predicts multiple missing frames within a single forward propagation in real time. Finally, we show the effectiveness of our method in music-dance applications.
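
The abstract describes the model only at a high level. The following toy, untrained sketch in plain Python illustrates the two ideas it names: a mixture embedding and a self-attention layer that updates every frame in one pass instead of generating frames one at a time. Several details are assumptions for illustration, not the authors' implementation: the mixture embedding is taken to be the sum of a per-frame positional row and a two-row missing/key-frame table, missing frames are initialised by linear interpolation, a single attention head stands in for the full encoder, and every function name (`lerp_fill`, `mixture_embed`, `self_attention`, `complete`, `init_params`) is hypothetical.

```python
import math
import random

def rand_matrix(rng, rows, cols, scale=0.1):
    # Small random matrix as a list of rows; stands in for trained weights.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    # Multiply a (rows x cols) matrix, stored as a list of rows, by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def lerp_fill(frames, known):
    # ASSUMPTION: missing frames start as a linear interpolation between the
    # nearest known frames on each side; frames outside that range are left as-is.
    filled = [list(f) for f in frames]
    idx = [t for t, k in enumerate(known) if k]
    for a, b in zip(idx, idx[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)
            filled[t] = [(1 - w) * p + w * q for p, q in zip(frames[a], frames[b])]
    return filled

def mixture_embed(x, known, pos_table, key_table):
    # ASSUMPTION: embedding = input + positional row for frame t
    # + key-frame row (index 1) or missing-frame row (index 0).
    return [
        [xi + p + k for xi, p, k in zip(x[t], pos_table[t], key_table[int(known[t])])]
        for t in range(len(x))
    ]

def self_attention(h, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention with no causal mask:
    # every frame attends to every frame, so all outputs come from one pass.
    q = [matvec(Wq, ht) for ht in h]
    k = [matvec(Wk, ht) for ht in h]
    v = [matvec(Wv, ht) for ht in h]
    scale = math.sqrt(len(q[0]))
    out = []
    for qt in q:
        weights = softmax([sum(a * b for a, b in zip(qt, ks)) / scale for ks in k])
        out.append([sum(w * vs[j] for w, vs in zip(weights, v)) for j in range(len(v[0]))])
    return out

def init_params(d, max_len, seed=0):
    rng = random.Random(seed)
    return {
        "pos": rand_matrix(rng, max_len, d),  # positional embedding (learned in practice)
        "key": rand_matrix(rng, 2, d),        # missing / key-frame embedding (learned in practice)
        "Wq": rand_matrix(rng, d, d),
        "Wk": rand_matrix(rng, d, d),
        "Wv": rand_matrix(rng, d, d),
    }

def complete(frames, known, params):
    # One forward pass produces every missing frame at once (non-autoregressive).
    x = lerp_fill(frames, known)
    h = mixture_embed(x, known, params["pos"], params["key"])
    y = self_attention(h, params["Wq"], params["Wk"], params["Wv"])
    # ASSUMPTION: the given key frames are copied through unchanged.
    return [list(frames[t]) if known[t] else y[t] for t in range(len(frames))]

# Usage: 6 frames of a 3-D toy "pose", with frames 0 and 5 given as key frames.
T, d = 6, 3
known = [t in (0, T - 1) for t in range(T)]
frames = [[float(t)] * d if known[t] else [0.0] * d for t in range(T)]
out = complete(frames, known, init_params(d, max_len=T))
```

Because the attention scores for all frames are computed together, filling any number of gaps costs one call to `complete`. An autoregressive decoder would instead run the model once per missing frame, feeding each prediction back in. With random weights the predicted poses are meaningless; only the data flow is illustrated.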

updated: Mon Mar 01 2021 06:00:17 GMT+0000 (UTC)

published: Mon Mar 01 2021 06:00:17 GMT+0000 (UTC)

arXiv
