SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

Nikos Athanasiou; Mathis Petrovich; Michael J. Black; Gül Varol

Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions, which seek to transition from one action to another, spatial composition requires understanding which body parts are involved in which action so that they can be moved simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). In our experiments, we find that training on additional synthetic GPT-guided compositional motions improves text-to-motion generation.
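The body-part combination step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the part list, the joint indices in `PART_JOINTS`, the one-scalar-per-joint motion format, and the helper names `build_prompt` and `compose` are all assumptions made for illustration, and the language-model call is replaced by a hand-written list of parts.

```python
from typing import Dict, List, Sequence

# Hypothetical coarse part list and joint grouping (assumption, not from the paper).
PART_JOINTS: Dict[str, List[int]] = {
    "torso": [0],
    "left arm": [1],
    "right arm": [2],
    "legs": [3, 4],
}

# A motion is a list of frames; each frame holds one scalar per joint for brevity.
Motion = List[List[float]]


def build_prompt(action: str) -> str:
    """Build a query in the spirit of the abstract's example prompt."""
    parts = ", ".join(PART_JOINTS)
    return (
        f"The body parts are: {parts}. "
        f"What are the body parts involved in the action {action}?"
    )


def compose(motion_a: Motion, motion_b: Motion, parts_b: Sequence[str]) -> Motion:
    """Take the joints of `parts_b` from motion_b and all other joints from motion_a."""
    n_frames = min(len(motion_a), len(motion_b))  # trim to the shorter clip
    joints_b = {j for part in parts_b for j in PART_JOINTS[part]}
    return [
        [
            motion_b[t][j] if j in joints_b else motion_a[t][j]
            for j in range(len(motion_a[t]))
        ]
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    walk = [[1.0] * 5 for _ in range(3)]  # stand-in for a 'walking' clip
    wave = [[2.0] * 5 for _ in range(2)]  # stand-in for a 'waving hand' clip
    print(build_prompt("waving hand"))
    # Assume the language model answered "right arm" for 'waving hand'.
    print(compose(walk, wave, ["right arm"]))
```

In this sketch the waving clip overrides only the right-arm joint while the walking clip supplies everything else; a real system would operate on full joint rotations and would need to handle differing clip lengths and conflicting parts more carefully.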

updated: Thu Apr 20 2023 16:01:55 GMT+0000 (UTC)

published: Thu Apr 20 2023 16:01:55 GMT+0000 (UTC)

arXiv

