NCHO: Unsupervised Learning for Neural 3D Composition of Humans and Objects

Taeksoo Kim; Shunsuke Saito; Hanbyul Joo

Deep generative models have recently been extended to synthesizing 3D digital humans. However, previous approaches treat clothed humans as a single chunk of geometry without considering the compositionality of clothing and accessories. As a result, individual items cannot be naturally composed into novel identities, which limits the expressiveness and controllability of generative 3D avatars. While several methods attempt to address this by leveraging synthetic data, the interaction between humans and objects is not authentic due to the domain gap, and manual asset creation is difficult to scale to a wide variety of objects. In this work, we present a novel framework for learning a compositional generative model of humans and objects (backpacks, coats, scarves, and more) from real-world 3D scans. Our compositional model is interaction-aware: it fully incorporates the spatial relationship between humans and objects and the mutual shape changes caused by physical contact. The key challenge is that, since humans and objects are in contact, their 3D scans are merged into a single piece. To decompose them without manual annotations, we propose to leverage two sets of 3D scans of a single person, one with objects and one without. Our approach learns to decompose objects and naturally compose them back into a generative human model in an unsupervised manner. Despite our simple setup, which requires capturing only a single subject with objects, our experiments demonstrate the strong generalization of our model: it enables the natural composition of objects onto diverse identities in various poses, as well as the composition of multiple objects, which is unseen in the training data. https://taeksuu.github.io/ncho/

updated: Mon May 29 2023 13:51:25 GMT+0000 (UTC)

published: Tue May 23 2023 17:59:52 GMT+0000 (UTC)

arXiv