DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

Buyu Li; Yongchi Zhao; Zhelun Shi; Lu Sheng

Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent, rhythm-aligned movements. We name the proposed method DanceFormer, which includes two cascading kinematics-enhanced transformer-guided networks (called DanTrans) that tackle each stage, respectively. Furthermore, we propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators rather than obtained by reconstruction or motion capture. In addition to pose sequences, this dataset encodes dances as key poses and parametric motion curves, thus benefiting the training of our DanceFormer. Extensive experiments demonstrate that the proposed method, even when trained on existing datasets, can generate fluent, performative, and music-matched 3D dances that surpass previous works quantitatively and qualitatively. Moreover, the proposed DanceFormer, together with the PhantomDance dataset, is seamlessly compatible with industrial animation software, thus facilitating adaptation to various downstream applications.
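
The two-stage idea in the abstract (key poses placed on beats, then parametric curves filling the frames in between) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say which curve family DanceFormer regresses, so cubic Hermite segments are assumed here, and the names `hermite` and `render_motion`, the per-joint scalar pose format, and the hand-written key poses and tangents are all hypothetical stand-ins for what the two networks would predict.

```python
from typing import List, Sequence


def hermite(p0: float, p1: float, m0: float, m1: float, s: float) -> float:
    """Evaluate a cubic Hermite segment at s in [0, 1].

    p0, p1 are the endpoint values; m0, m1 are the endpoint tangents,
    already scaled to the unit interval.
    """
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def render_motion(
    beat_times: Sequence[float],
    key_poses: Sequence[Sequence[float]],
    tangents: Sequence[Sequence[float]],
    fps: int,
) -> List[List[float]]:
    """Turn beat-aligned key poses into a per-frame pose sequence.

    Assumed (hypothetical) data layout: key_poses[i] holds one scalar per
    joint channel at time beat_times[i]; tangents[i] holds the matching
    slopes in units per second.
    """
    frames: List[List[float]] = []
    for i in range(len(beat_times) - 1):
        dt = beat_times[i + 1] - beat_times[i]
        n = round(dt * fps)  # number of frames in this segment
        for k in range(n):
            s = k / n  # normalized position inside the segment
            frames.append([
                # Scale per-second tangents by dt to match the unit interval.
                hermite(p0, p1, m0 * dt, m1 * dt, s)
                for p0, p1, m0, m1 in zip(
                    key_poses[i], key_poses[i + 1], tangents[i], tangents[i + 1]
                )
            ])
    frames.append(list(key_poses[-1]))  # close the sequence on the last key pose
    return frames


# Toy input: one joint channel, three key poses on beats 0.5 s apart.
beats = [0.0, 0.5, 1.0]
poses = [[0.0], [1.0], [0.0]]
slopes = [[0.0], [0.0], [0.0]]
motion = render_motion(beats, poses, slopes, fps=4)
```

Because every key pose sits exactly on a beat time and each segment starts at its key pose, the rendered sequence passes through the key poses on the beat by construction, which is the property the abstract attributes to the key pose stage; the curves only decide how the motion travels between them.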

updated: Wed Dec 08 2021 12:15:36 GMT+0000 (UTC)

published: Thu Mar 18 2021 12:17:38 GMT+0000 (UTC)

arXiv
