Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Ruozi Huang; Huang Hu; Wei Wu; Kei Sawada; Mi Zhang; Daxin Jiang

Dancing to music has been an innate human ability since ancient times. In machine learning research, however, synthesizing dance movements from music remains a challenging problem. Recent work synthesizes human motion sequences with autoregressive models such as recurrent neural networks (RNNs). Such approaches often produce only short sequences, because prediction errors that are fed back into the network accumulate over time. This problem becomes even more severe in long motion sequence generation. Moreover, the consistency between dance and music in terms of style, rhythm, and beat has yet to be taken into account during modeling. In this paper, we formalize music-driven dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture that efficiently processes long sequences of music features and captures the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation. The strategy gently shifts training from a fully guided teacher-forcing scheme, which conditions on the previous ground-truth movements, toward a less guided autoregressive scheme that mostly uses the generated movements instead. Extensive experiments show that our approach significantly outperforms existing state-of-the-art methods on automatic metrics and in human evaluation. We also provide a demo video in the supplementary material to demonstrate the superior performance of the proposed approach.
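The shift from teacher forcing to autoregressive training described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual schedule or model: the linear decay in `teacher_forcing_ratio`, the `floor` parameter, the per-step random choice in `rollout`, and the toy decoder `toy_step` are all assumptions introduced here for illustration only.

```python
import random


def teacher_forcing_ratio(epoch, total_epochs, floor=0.1):
    """Hypothetical linear schedule (an assumption, not the paper's):
    starts at 1.0 (fully guided) and decays to `floor` by the last epoch."""
    if total_epochs <= 1:
        return floor
    progress = min(epoch, total_epochs - 1) / (total_epochs - 1)
    return 1.0 - (1.0 - floor) * progress


def rollout(step_fn, ground_truth, ratio, rng):
    """Predict frames 1..T-1 one step at a time.

    After each prediction, the next input is the ground-truth frame with
    probability `ratio` (teacher forcing) and the model's own prediction
    otherwise (autoregressive feedback).
    """
    prev = ground_truth[0]
    outputs = []
    guided_steps = 0
    for t in range(1, len(ground_truth)):
        pred = step_fn(prev)
        outputs.append(pred)
        if rng.random() < ratio:
            prev = ground_truth[t]  # guided: feed back the true frame
            guided_steps += 1
        else:
            prev = pred  # unguided: feed back the model's own output
    return outputs, guided_steps


def toy_step(prev):
    """Stand-in for a decoder step (hypothetical): the true motion advances
    by 1.0 per frame, but this imperfect model advances by only 0.9."""
    return prev + 0.9


ground_truth = [0.0, 1.0, 2.0, 3.0]
rng = random.Random(0)

# Fully guided: each step starts from the true previous frame.
guided, _ = rollout(toy_step, ground_truth, ratio=1.0, rng=rng)
# Fully autoregressive: each step starts from the previous prediction.
free, _ = rollout(toy_step, ground_truth, ratio=0.0, rng=rng)

guided_error = abs(guided[-1] - ground_truth[-1])
free_error = abs(free[-1] - ground_truth[-1])
```

In this toy setting the per-step error stays constant under full guidance but grows with sequence length under pure autoregressive feedback, which is the error accumulation the abstract refers to. A schedule such as `teacher_forcing_ratio` exposes the model to its own outputs gradually during training rather than only at inference time.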

updated: Sun Feb 07 2021 09:35:07 GMT+0000 (UTC)

published: Thu Jun 11 2020 00:08:25 GMT+0000 (UTC)

arXiv
