MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Mingyuan Zhang; Zhongang Cai; Liang Pan; Fangzhou Hong; Xinying Guo; Lei Yang; Ziwei Liu

Human motion modeling is important for many modern graphics applications, which typically require professional skills. To remove this skill barrier for laymen, recent motion generation methods can directly generate human motions conditioned on natural language. However, achieving diverse and fine-grained motion generation from varied text inputs remains challenging. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation. Homepage: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html
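The "Probabilistic Mapping" property can be illustrated with a minimal sketch of generic DDPM-style ancestral sampling, in which fresh Gaussian noise is injected at every denoising step. This is not the authors' implementation: the abstract does not specify the noise schedule, network, or motion representation, so the linear schedule, the names `make_schedule`, `toy_denoiser`, and `sample_motion`, and the array shapes are all assumptions made for illustration. The denoiser in particular is a hypothetical placeholder for a learned, text-conditioned network.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the abstract)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, t, text_embedding):
    """Hypothetical stand-in for a learned text-conditioned noise predictor."""
    return 0.1 * x_t + 0.01 * text_embedding.mean()


def sample_motion(text_embedding, num_frames=8, pose_dim=4, num_steps=50, seed=0):
    """Run the reverse diffusion loop and return a (num_frames, pose_dim) array."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)

    # Start from pure Gaussian noise.
    x = rng.standard_normal((num_frames, pose_dim))

    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, t, text_embedding)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Variation is injected here: new noise at every step except the last.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


if __name__ == "__main__":
    text = np.ones(16)  # placeholder for an encoded text prompt
    motion_a = sample_motion(text, seed=0)
    motion_b = sample_motion(text, seed=1)
    print(motion_a.shape, np.allclose(motion_a, motion_b))
```

Because noise is drawn at each step, the same text embedding yields different samples under different seeds, which is the sense in which the mapping from language to motion is probabilistic rather than deterministic.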

updated: Wed Aug 31 2022 17:58:54 GMT+0000 (UTC)

published: Wed Aug 31 2022 17:58:54 GMT+0000 (UTC)

arXiv
