Structured Graph Variational Autoencoders for Indoor Furniture Layout Generation

Aditya Chattopadhyay; Xi Zhang; David Paul Wipf; Himanshu Arora; Rene Vidal

We present a structured graph variational autoencoder for generating the layout of indoor 3D scenes. Given the room type (e.g., living room or library) and the room layout (e.g., room elements such as floor and walls), our architecture generates a collection of objects (e.g., furniture items such as sofa, table and chairs) that is consistent with the room type and layout. This is a challenging problem because the generated scene must satisfy multiple constraints: for example, each object must lie inside the room, and no two objects can occupy the same volume. To address these challenges, we propose a deep generative model that encodes these relationships as soft constraints on an attributed graph (e.g., the nodes capture attributes of room and furniture elements, such as class, pose and size, and the edges capture geometric relationships such as relative orientation). The architecture consists of a graph encoder that maps the input graph to a structured latent space, and a graph decoder that generates a furniture graph, given a latent code and the room graph. The latent space is modeled with auto-regressive priors, which facilitate the generation of highly structured scenes. We also propose an efficient training procedure that combines matching and constrained learning. Experiments on the 3D-FRONT dataset show that our method produces scenes that are diverse and adapted to the room layout.
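To make the two constraints named in the abstract concrete (objects inside the room, no two objects sharing volume), the following is a minimal sketch of how they could be scored as soft penalties. It is not the authors' formulation: the abstract does not give one. The sketch assumes axis-aligned 2D footprints in a rectangular room with one corner at the origin, and the names `Box`, `overlap_area`, `out_of_room`, and `soft_constraint_penalty`, as well as the weights, are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Box:
    """Axis-aligned 2D footprint of one furniture item (illustrative simplification)."""
    cx: float  # center x
    cy: float  # center y
    w: float   # extent along x
    d: float   # extent along y

    def bounds(self):
        """Return (x_min, y_min, x_max, y_max)."""
        return (self.cx - self.w / 2, self.cy - self.d / 2,
                self.cx + self.w / 2, self.cy + self.d / 2)


def overlap_area(a, b):
    """Intersection area of two footprints; zero when they do not overlap."""
    ax0, ay0, ax1, ay1 = a.bounds()
    bx0, by0, bx1, by1 = b.bounds()
    dx = min(ax1, bx1) - max(ax0, bx0)
    dy = min(ay1, by1) - max(ay0, by0)
    return max(dx, 0.0) * max(dy, 0.0)


def out_of_room(box, room_w, room_d):
    """Total distance by which a footprint sticks out of the room [0, room_w] x [0, room_d]."""
    x0, y0, x1, y1 = box.bounds()
    return (max(-x0, 0.0) + max(-y0, 0.0)
            + max(x1 - room_w, 0.0) + max(y1 - room_d, 0.0))


def soft_constraint_penalty(boxes, room_w, room_d, w_overlap=1.0, w_room=1.0):
    """Weighted sum of pairwise overlap and out-of-room violations (assumed weights)."""
    overlap = sum(overlap_area(a, b) for a, b in combinations(boxes, 2))
    outside = sum(out_of_room(b, room_w, room_d) for b in boxes)
    return w_overlap * overlap + w_room * outside
```

A penalty of zero means both constraints hold for every object. Because violations are penalized rather than forbidden, a term of this kind can be added to a training loss, which is the general sense in which a constraint is "soft".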

updated: Fri Jul 22 2022 05:56:40 GMT+0000 (UTC)

published: Mon Apr 11 2022 04:58:26 GMT+0000 (UTC)

arXiv
