Generative Hierarchical Models for Parts, Objects, and Scenes

Fei Deng; Zhuo Zhi; Sungjin Ahn

Compositional structures between parts and objects are inherent in natural scenes. Modeling such compositional hierarchies via unsupervised learning can bring benefits such as interpretability and transferability, which are important in many downstream tasks. In this paper, we propose the first deep latent variable model, called RICH, for learning Representation of Interpretable Compositional Hierarchies. At the core of RICH is a latent scene graph representation that organizes the entities of a scene into a tree structure according to their compositional relationships. During inference, RICH takes a top-down approach, using higher-level representations to guide lower-level decomposition. This avoids the difficult problem of routing between parts and objects that bottom-up approaches face. In experiments on images containing multiple objects with different part compositions, we demonstrate that RICH is able to learn the latent compositional hierarchy and generate imaginary scenes.
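
To make the tree-structured scene graph concrete, here is a minimal Python sketch of a scene organized as scene, objects, and parts, traversed top-down so that each parent is visited before its children. It is only an illustration, not the authors' implementation: the `Node` class, the `top_down` function, and the labels are hypothetical names, and plain strings stand in for the latent variables that RICH would infer at each node.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One entity in the scene tree (hypothetical stand-in for a latent node)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def top_down(node: Node, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, label) pairs, visiting each parent before its children."""
    yield depth, node.label
    for child in node.children:
        yield from top_down(child, depth + 1)


# A toy scene: two objects, each composed of its own parts.
scene = Node("scene", [
    Node("object_0", [Node("part_0"), Node("part_1")]),
    Node("object_1", [Node("part_2")]),
])

order = list(top_down(scene))
for depth, label in order:
    print("  " * depth + label)
```

Because every part hangs under exactly one object in the tree, the traversal never has to decide which object a part belongs to, which is the routing question the abstract attributes to bottom-up approaches.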

updated: Mon Oct 21 2019 02:28:16 GMT+0000 (UTC)

published: Mon Oct 21 2019 02:28:16 GMT+0000 (UTC)

arXiv
