Synthesizing Long-Term Human Motions with Diffusion Models via Coherent Sampling

Zhao Yang; Bing Su; Ji-Rong Wen

Text-to-motion generation has gained increasing attention, but most existing methods are limited to generating short-term motions that correspond to a single sentence describing a single action. However, when a text stream describes a sequence of continuous motions, the generated motions corresponding to each sentence may not be coherently linked. Existing long-term motion generation methods face two main issues. Firstly, they cannot directly generate coherent motions and require additional operations such as interpolation to process the generated actions. Secondly, they generate subsequent actions in an autoregressive manner without considering the influence of future actions on previous ones. To address these issues, we propose a novel approach that utilizes a past-conditioned diffusion model with two optional coherent sampling methods: Past Inpainting Sampling and Compositional Transition Sampling. Past Inpainting Sampling completes subsequent motions by treating previous motions as conditions, while Compositional Transition Sampling models the distribution of the transition as the composition of two adjacent motions guided by different text prompts. Our experimental results demonstrate that our proposed method is capable of generating compositional and coherent long-term 3D human motions controlled by a user-instructed long text stream. The code is available at https://github.com/yangzhao1230/PCMDM.
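
To make the idea of treating previous motions as conditions more concrete, the following is a minimal sketch of generic inpainting-style diffusion sampling. It is not the authors' implementation (see the repository linked above for that). Everything in it is an assumption made for illustration: the function names, the cosine noise schedule, the simplified update rule that re-noises the predicted clean motion instead of using the full DDPM posterior, and `toy_denoiser`, a hypothetical stand-in for a trained text-conditioned model. Each frame is a single float here, whereas real motions are sequences of pose vectors.

```python
import math
import random


def toy_denoiser(x_t, t, text):
    """Hypothetical stand-in for a trained denoiser that predicts the clean motion.

    It ignores x_t and t and returns a constant value derived from the prompt.
    A real model would read the noised past frames in x_t as context.
    """
    target = float(len(text) % 5)
    return [target for _ in x_t]


def add_noise(x0, alpha_bar, rng):
    """Forward diffusion q(x_t | x_0): scale the clean signal and add Gaussian noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


def past_inpainting_sample(past, new_len, text, steps=50, seed=0):
    """Sample `new_len` new frames after `past`, keeping the past frames fixed.

    Assumed inpainting scheme: at every reverse step the past part of the
    sequence is overwritten with the known past frames noised to the current
    level, so only the new frames are actually generated.
    """
    rng = random.Random(seed)
    n_past = len(past)
    # Assumed cosine schedule: alpha_bar is 1 at t=0 and close to 0 at t=steps.
    alpha_bars = [math.cos(0.5 * math.pi * t / steps) ** 2 for t in range(steps + 1)]

    # Start the whole sequence (past + new frames) from pure noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_past + new_len)]
    for t in range(steps, 0, -1):
        # Replace the past region with the known motion at noise level t.
        x[:n_past] = add_noise(past, alpha_bars[t], rng)
        # Predict the clean sequence, then pin the past frames to their known values.
        x0_hat = toy_denoiser(x, t, text)
        x0_hat[:n_past] = past
        # Simplified reverse step: re-noise the clean estimate to level t-1.
        x = add_noise(x0_hat, alpha_bars[t - 1], rng)
    return x
```

Because the known frames are re-imposed at every step, the returned sequence begins with `past` unchanged, and the new frames are denoised alongside it. A long motion could then be built by calling the function once per sentence and passing the tail of the previous result as `past`.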

updated: Thu Aug 03 2023 16:18:32 GMT+0000 (UTC)

published: Thu Aug 03 2023 16:18:32 GMT+0000 (UTC)

arXiv
