DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer

Buyu Li; Yongchi Zhao; Zhelun Shi; Lu Sheng

Generating 3D dances from music is an emerging research task that benefits many applications in vision and graphics. Previous works treat this task as sequence generation; however, it is challenging to render a music-aligned long-term sequence with high kinematic complexity and coherent movements. In this paper, we reformulate it as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction, where the key poses are easier to synchronize with the music beats and the parametric curves can be efficiently regressed to render fluent, rhythm-aligned movements. We name the proposed method DanceFormer; it includes two cascading kinematics-enhanced transformer-guided networks (called DanTrans), one for each stage. Furthermore, we propose a large-scale music-conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators rather than obtained by reconstruction or motion capture. In addition to pose sequences, this dataset encodes dances as key poses and parametric motion curves, thus benefiting the training of DanceFormer. Extensive experiments demonstrate that the proposed method, even when trained on existing datasets, can generate fluent, performative, and music-matched 3D dances that surpass previous works quantitatively and qualitatively. Moreover, the proposed DanceFormer, together with the PhantomDance dataset (https://github.com/libuyu/PhantomDanceDataset), is seamlessly compatible with industrial animation software, thus facilitating adaptation to various downstream applications.
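To make the key-pose-plus-curve representation concrete, the following is a minimal sketch of how a dense pose sequence could be rendered from key poses and in-between curves. It is not the authors' implementation. It assumes cubic Hermite curves with one tangent per key and channel, a curve family common in animation software; the abstract does not state which parametric family DanceFormer uses. The names `hermite` and `render_motion`, and the per-channel scalar pose layout, are hypothetical.

```python
from typing import List, Sequence


def hermite(p0: float, p1: float, m0: float, m1: float, s: float) -> float:
    """Cubic Hermite interpolation from p0 to p1 for s in [0, 1].

    m0 and m1 are tangents already scaled to the unit interval.
    """
    s2, s3 = s * s, s * s * s
    h00 = 2 * s3 - 3 * s2 + 1
    h10 = s3 - 2 * s2 + s
    h01 = -2 * s3 + 3 * s2
    h11 = s3 - s2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def render_motion(
    key_times: Sequence[float],
    key_poses: Sequence[Sequence[float]],
    tangents: Sequence[Sequence[float]],
    fps: float,
) -> List[List[float]]:
    """Sample a dense pose sequence from key poses and curve tangents.

    key_times: strictly increasing key timestamps in seconds
               (assumed here to sit on music beats).
    key_poses: one pose per key, each a list of channel values.
    tangents:  one tangent per key and channel, in value units per second
               (an assumed stand-in for the predicted curve parameters).
    """
    frames: List[List[float]] = []
    for k in range(len(key_times) - 1):
        dt = key_times[k + 1] - key_times[k]
        n = max(1, round(dt * fps))  # frames in this key-to-key segment
        for i in range(n):
            s = i / n
            frames.append([
                # Scale tangents by dt to map seconds onto the unit interval.
                hermite(p0, p1, m0 * dt, m1 * dt, s)
                for p0, p1, m0, m1 in zip(
                    key_poses[k], key_poses[k + 1],
                    tangents[k], tangents[k + 1],
                )
            ])
    frames.append(list(key_poses[-1]))  # close the sequence on the last key
    return frames
```

In this sketch the first stage would supply `key_times` and `key_poses`, and the second stage would supply `tangents`. Because each segment is a closed-form curve, the motion can be sampled at any frame rate without regenerating the sequence.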

updated: Thu Jul 27 2023 08:49:55 GMT+0000 (UTC)

published: Thu Mar 18 2021 12:17:38 GMT+0000 (UTC)

arXiv
