ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

Kelly O. Marshall; Minh Pham; Ameya Joshi; Anushrut Jignasu; Aditya Balu; Adarsh Krishnamurthy; Chinmay Hegde

Current state-of-the-art methods for text-to-shape generation either require supervised training on a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. Achieving open-vocabulary shape generation requires careful architectural adaptation of existing feed-forward approaches, as well as a combination of a data-free CLIP loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.
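To illustrate why pairing a similarity term with a contrastive term can discourage mode collapse, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact loss, so the `1 - cosine` similarity term, the InfoNCE-style contrastive term, the `weight` and `temperature` parameters, and all function names are assumptions made for illustration. The toy 2-D vectors stand in for CLIP embeddings of rendered shapes and of text prompts.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def similarity_loss(shape_embs, text_embs):
    """Mean (1 - cosine) between each shape embedding and its own prompt (assumed form)."""
    pairs = zip(shape_embs, text_embs)
    return sum(1.0 - cosine(s, t) for s, t in pairs) / len(shape_embs)

def contrastive_loss(shape_embs, text_embs, temperature=0.1):
    """InfoNCE-style term (assumed form): each shape should match its own
    prompt better than it matches the other prompts in the batch."""
    total = 0.0
    for i, s in enumerate(shape_embs):
        logits = [cosine(s, t) / temperature for t in text_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(shape_embs)

def combined_loss(shape_embs, text_embs, weight=0.5, temperature=0.1):
    """Similarity term plus a weighted contrastive term (weight is hypothetical)."""
    return (similarity_loss(shape_embs, text_embs)
            + weight * contrastive_loss(shape_embs, text_embs, temperature))

# Toy batch: two prompts with orthogonal stand-in embeddings.
text = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0]]  # generator emits the same shape for both prompts
distinct = [[1.0, 0.0], [0.0, 1.0]]   # generator emits a prompt-specific shape

print("collapsed:", combined_loss(collapsed, text))
print("distinct: ", combined_loss(distinct, text))
```

In this toy batch the collapsed generator is moderately similar to every prompt, so the contrastive term cannot tell the prompts apart and the combined loss stays higher than for the prompt-specific outputs.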

updated: Fri Jun 16 2023 00:48:13 GMT+0000 (UTC)

published: Wed Jun 14 2023 00:38:14 GMT+0000 (UTC)

arXiv
