Points-to-3D: Bridging the Gap between Sparse Points and Shape-Controllable Text-to-3D Generation

Chaohui Yu; Qiang Zhou; Jingliang Li; Zhe Zhang; Zhibin Wang; Fan Wang

Text-to-3D generation has recently garnered significant attention, fueled by 2D diffusion models trained on billions of image-text pairs. Existing methods primarily rely on score distillation to leverage 2D diffusion priors to supervise the generation of 3D models, e.g., NeRF. However, score distillation is prone to view inconsistency, and implicit NeRF modeling can produce arbitrary shapes, leading to less realistic and uncontrollable 3D generation. In this work, we propose Points-to-3D, a flexible framework that bridges the gap between sparse yet freely available 3D points and realistic, shape-controllable 3D generation by distilling knowledge from both 2D and 3D diffusion models. The core idea of Points-to-3D is to introduce controllable sparse 3D points to guide text-to-3D generation. Specifically, we use the sparse point cloud generated by the 3D diffusion model Point-E, conditioned on a single reference image, as the geometric prior. To better utilize the sparse 3D points, we propose an efficient point cloud guidance loss that adaptively drives the NeRF's geometry to align with the shape of the sparse 3D points. In addition to controlling the geometry, we propose to optimize the NeRF for a more view-consistent appearance. Specifically, we perform score distillation with the publicly available 2D image diffusion model ControlNet, conditioned on the text as well as the depth map of the learned compact geometry. Qualitative and quantitative comparisons demonstrate that Points-to-3D improves view consistency and achieves good shape controllability for text-to-3D generation. Points-to-3D provides users with a new way to improve and control text-to-3D generation.
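
The abstract does not give the exact form of the point cloud guidance loss. Purely as an illustration of the general idea (pulling a NeRF's density toward a sparse point cloud), the sketch below assumes one generic construction: convert NeRF density to occupancy with alpha = 1 - exp(-sigma * delta), label each query point as occupied if it lies within a hypothetical `radius` of the sparse cloud, and score the two with binary cross-entropy. This is not the authors' formulation and omits their adaptive weighting. The names `point_guidance_loss`, `radius`, and `delta` are hypothetical, and brute-force nearest-neighbour search stands in for whatever acceleration a real implementation would use.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distance(q: Point, cloud: Sequence[Point]) -> float:
    """Euclidean distance from q to its nearest neighbour in the sparse cloud (brute force)."""
    return min(math.dist(q, p) for p in cloud)


def occupancy_from_density(sigma: float, delta: float) -> float:
    """Map a NeRF volume density to an occupancy in [0, 1) via alpha = 1 - exp(-sigma * delta)."""
    return 1.0 - math.exp(-max(sigma, 0.0) * delta)


def point_guidance_loss(
    density_fn: Callable[[Point], float],
    queries: Sequence[Point],
    cloud: Sequence[Point],
    radius: float = 0.05,  # hypothetical: distance within which a query counts as "on" the cloud
    delta: float = 0.01,   # hypothetical: step size used to turn density into occupancy
    eps: float = 1e-6,     # clamp to keep log() finite
) -> float:
    """Hypothetical guidance loss: mean binary cross-entropy between the NeRF occupancy
    at each query point and a 0/1 target derived from distance to the sparse cloud."""
    total = 0.0
    for q in queries:
        target = 1.0 if nearest_distance(q, cloud) <= radius else 0.0
        occ = occupancy_from_density(density_fn(q), delta)
        occ = min(max(occ, eps), 1.0 - eps)
        total += -(target * math.log(occ) + (1.0 - target) * math.log(1.0 - occ))
    return total / len(queries)


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0)]
    queries = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

    def dense_near_origin(x: Point) -> float:
        # Toy geometry that is solid only close to the single cloud point.
        return 1000.0 if math.dist(x, (0.0, 0.0, 0.0)) < 0.1 else 0.0

    def empty(x: Point) -> float:
        # Toy geometry with no density anywhere.
        return 0.0

    # Geometry that covers the cloud should score a much lower loss than empty space.
    print(point_guidance_loss(dense_near_origin, queries, cloud))
    print(point_guidance_loss(empty, queries, cloud))
```

In a real pipeline the queries would be sampled along training rays and the loss would be differentiated with respect to the NeRF parameters (e.g., in an autodiff framework); plain Python floats are used here only to keep the example self-contained.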

updated: Wed Jul 26 2023 02:16:55 GMT+0000 (UTC)

published: Wed Jul 26 2023 02:16:55 GMT+0000 (UTC)

arXiv

