Semantify: Simplifying the Control of 3D Morphable Models using CLIP

Omer Gralnik; Guy Gafni; Ariel Shamir

We present Semantify, a self-supervised method that uses the semantic power of the CLIP language-vision foundation model to simplify the control of 3D morphable models (3DMMs). Given a parametric model, training data is created by randomly sampling the model's parameters to produce various shapes and rendering them. The similarity between the rendered images and a set of word descriptors is then computed in CLIP's latent space. Our key idea is to first choose a small set of semantically meaningful and disentangled descriptors that characterize the 3DMM, and then learn a non-linear mapping from the scores across this set to the parametric coefficients of the given 3DMM. The non-linear mapping is defined by training a neural network without a human in the loop. We present results on numerous 3DMMs: body shape models, face shape and expression models, and animal shapes. We demonstrate how our method defines a simple slider interface for intuitive modeling, and show how the mapping can be used to instantly fit a 3D parametric body shape to in-the-wild images.
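The data-generation step described in the abstract (sample parameters, render, score against word descriptors) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `fake_render_and_embed` is a hypothetical stand-in for rendering a shape and applying CLIP's image encoder, the text embeddings are random placeholders rather than real CLIP text features, and the descriptor words, dimensions, and sample counts are assumptions chosen only so the sketch runs with the standard library.

```python
import math
import random

EMBED_DIM = 9  # assumed toy embedding size; real CLIP embeddings are much larger


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def descriptor_scores(image_embedding, text_embeddings):
    """Score one rendered image against every word descriptor."""
    return [cosine_similarity(image_embedding, t) for t in text_embeddings]


def fake_render_and_embed(params, dim=EMBED_DIM):
    """Hypothetical stand-in for 'render the shape, then encode the image'.

    A deterministic toy projection of the parameters; the constant last
    component guarantees a non-zero vector.
    """
    projected = [
        sum(p * math.sin((i + 1) * (j + 1)) for j, p in enumerate(params))
        for i in range(dim - 1)
    ]
    return projected + [1.0]


def make_training_pairs(num_samples, num_params, text_embeddings, seed=0):
    """Randomly sample model parameters and pair them with descriptor scores."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_samples):
        params = [rng.gauss(0.0, 1.0) for _ in range(num_params)]
        image_embedding = fake_render_and_embed(params)
        scores = descriptor_scores(image_embedding, text_embeddings)
        pairs.append((scores, params))  # network input, network target
    return pairs


# Illustrative descriptor words (assumed, not taken from the paper).
descriptors = ["muscular", "skinny", "tall"]
placeholder_rng = random.Random(1)
text_embeddings = [
    [placeholder_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in descriptors
]

pairs = make_training_pairs(num_samples=100, num_params=10,
                            text_embeddings=text_embeddings)
```

Each pair holds one score per descriptor as the input and the sampled coefficients as the target, so a regression network trained on these pairs would map slider-like descriptor scores back to model parameters without any human labeling.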

updated: Mon Aug 14 2023 19:07:26 GMT+0000 (UTC)

published: Mon Aug 14 2023 19:07:26 GMT+0000 (UTC)

arXiv
