Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Nan Liu; Yilun Du; Shuang Li; Joshua B. Tenenbaum; Antonio Torralba

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.

updated: Thu Aug 03 2023 17:07:41 GMT+0000 (UTC)

published: Thu Jun 08 2023 17:02:15 GMT+0000 (UTC)

arXiv
