SINC: Spatial Composition of 3D Human Motions for Simultaneous Action Generation

Nikos Athanasiou; Mathis Petrovich; Michael J. Black; Gül Varol

Our goal is to synthesize 3D human motions given textual inputs describing simultaneous actions, for example 'waving hand' while 'walking' at the same time. We refer to generating such simultaneous movements as performing 'spatial compositions'. In contrast to temporal compositions, which seek to transition from one action to another, spatial composition requires understanding which body parts are involved in which action, so that they can be moved simultaneously. Motivated by the observation that the correspondence between actions and body parts is encoded in powerful language models, we extract this knowledge by prompting GPT-3 with text such as "what are the body parts involved in the action <action name>?", while also providing the parts list and few-shot examples. Given this action-part mapping, we combine body parts from two motions and establish the first automated method to spatially compose two actions. However, training data with compositional actions is always limited by the combinatorics. Hence, we further create synthetic data with this approach and use it to train a new state-of-the-art text-to-motion generation model, called SINC ("SImultaneous actioN Compositions for 3D human motions"). In our experiments, we find that training with such GPT-guided synthetic data improves spatial composition generation over baselines. Our code is publicly available at https://sinc.is.tue.mpg.de/.
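To make the body-part composition step concrete, here is a minimal sketch. It assumes each motion is stored as an array of shape (frames, joints, features) on a 22-joint SMPL-style skeleton. The `BODY_PARTS` grouping and the `compose_spatially` function are hypothetical illustrations, not the SINC code or its actual part list. The sketch copies motion A and overwrites the joints belonging to the parts that action B uses (for instance, parts named in a GPT-3 answer) with motion B's values. It ignores details such as the global root trajectory and handles differing lengths by simple cropping.

```python
import numpy as np

# Hypothetical grouping of SMPL-style joint indices (0..21) into coarse parts.
# Illustrative assumption only; not the grouping used by SINC.
BODY_PARTS = {
    "legs": [0, 1, 2, 4, 5, 7, 8, 10, 11],  # pelvis, hips, knees, ankles, feet
    "torso": [3, 6, 9],                      # spine joints
    "head": [12, 15],                        # neck, head
    "left arm": [13, 16, 18, 20],            # collar, shoulder, elbow, wrist
    "right arm": [14, 17, 19, 21],
}


def compose_spatially(motion_a, motion_b, parts_b):
    """Combine two motions by body part (hypothetical helper).

    motion_a, motion_b: arrays of shape (frames, joints, features).
    parts_b: names of body parts that action B drives, e.g. ["right arm"].
    Joints of parts_b come from motion_b; all other joints come from motion_a.
    """
    # Crop both sequences to the shorter length (a simplifying assumption).
    n = min(len(motion_a), len(motion_b))
    out = motion_a[:n].copy()  # copy so motion_a is left unmodified
    # Collect the joint indices of every part that action B uses.
    joints = sorted({j for part in parts_b for j in BODY_PARTS[part]})
    out[:, joints] = motion_b[:n, joints]
    return out
```

For example, composing a 60-frame "walking" motion with a 40-frame "waving hand" motion via `compose_spatially(walk, wave, ["right arm"])` yields 40 frames in which the right-arm joints follow the waving motion and every other joint follows walking.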

updated: Sat Aug 19 2023 20:34:13 GMT+0000 (UTC)

published: Thu Apr 20 2023 16:01:55 GMT+0000 (UTC)

arXiv