LongDanceDiff: Long-term Dance Generation with Conditional Diffusion Model

Siqi Yang; Zejun Yang; Zhisheng Wang

Dancing to music has always been an essential art form through which humans express emotion. Because of its high spatio-temporal complexity, generating long-term, realistic 3D dance synchronized with music is challenging. Existing methods suffer from a freezing problem when generating long-term dance, caused by error accumulation and the discrepancy between training and inference. To address this, we design LongDanceDiff, a conditional diffusion model for sequence-to-sequence long-term dance generation that tackles the challenges of temporal coherency and spatial constraints. LongDanceDiff contains a transformer-based diffusion model whose input is a concatenation of music, past motions, and noised future motions. This partial noising strategy leverages the full-attention mechanism to learn the dependencies among the music and past motions. To enhance the diversity of the generated dance motions and mitigate the freezing problem, we introduce a mutual information minimization objective that regularizes the dependency between past and future motions. We also address common visual-quality issues in dance generation, such as foot sliding and unsmooth motion, by incorporating spatial constraints through a Global-Trajectory Modulation (GTM) layer and motion perceptual losses, thereby improving the smoothness and naturalness of the generated motion. Extensive experiments demonstrate that our approach significantly improves on existing state-of-the-art methods. We plan to release our code and models soon.
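The abstract does not give LongDanceDiff's equations or architecture details. Below is a minimal, hypothetical Python sketch of the partial noising idea it describes. Only the future motion frames are corrupted by the standard DDPM forward process. The music and past motion stay clean, and the denoising loss covers only the future frames. Several choices here are assumptions for illustration and are not the authors' code: the linear noise schedule, frame-wise feature concatenation, and every function name (`linear_alpha_bars`, `noise_frames`, `build_partially_noised_input`, `masked_mse`). The mutual information objective, the GTM layer, and the perceptual losses are not modeled.

```python
import math
import random
from typing import List, Tuple

Frame = List[float]  # one feature vector per frame (hypothetical layout)


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule.

    ASSUMPTION: the abstract does not state LongDanceDiff's noise schedule.
    """
    alpha_bars: List[float] = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_frames(frames: List[Frame], alpha_bar: float,
                 rng: random.Random) -> List[Frame]:
    """Forward diffusion x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps, eps ~ N(0, I)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * x + s * rng.gauss(0.0, 1.0) for x in frame] for frame in frames]


def build_partially_noised_input(music: List[Frame], past: List[Frame],
                                 future: List[Frame], t: int,
                                 alpha_bars: List[float],
                                 rng: random.Random) -> Tuple[List[Frame], List[bool]]:
    """Build one transformer input: music features + [clean past ; noised future] motion.

    ASSUMPTION: music is frame-aligned with the whole window and concatenated
    along the feature axis; the paper may concatenate along the sequence axis.
    Returns per-frame tokens and a mask marking the frames the loss supervises.
    """
    if len(music) != len(past) + len(future):
        raise ValueError("music must cover every past and future frame")
    motion = past + noise_frames(future, alpha_bars[t], rng)  # only the future is noised
    tokens = [m + x for m, x in zip(music, motion)]           # per-frame feature concat
    loss_mask = [False] * len(past) + [True] * len(future)
    return tokens, loss_mask


def masked_mse(pred: List[Frame], target: List[Frame], mask: List[bool]) -> float:
    """Mean squared error over supervised (future) frames only; past frames are context."""
    total, count = 0.0, 0
    for p, y, keep in zip(pred, target, mask):
        if keep:
            total += sum((a - b) ** 2 for a, b in zip(p, y))
            count += len(y)
    return total / max(count, 1)


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(1000)
    music = [[0.1, 0.2] for _ in range(6)]        # 6 frames of toy 2-D music features
    past = [[0.0, 0.0, 0.0] for _ in range(2)]    # 2 clean past poses (toy 3-D)
    future = [[1.0, 1.0, 1.0] for _ in range(4)]  # 4 future poses to be denoised
    tokens, mask = build_partially_noised_input(music, past, future, 500, alpha_bars, rng)
    print(len(tokens), len(tokens[0]), mask)  # 6 5 [False, False, True, True, True, True]
```

The past frames enter the transformer uncorrupted, so a full-attention denoiser can read them as context at every diffusion step. This is one plausible reading of how the partial noising strategy lets attention exploit the past motion while only the future is generated.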

updated: Wed Aug 23 2023 06:37:41 GMT+0000 (UTC)

published: Wed Aug 23 2023 06:37:41 GMT+0000 (UTC)

arXiv