Spatially Multi-conditional Image Generation

Ritika Chakraborty; Nikola Popovic; Danda Pani Paudel; Thomas Probst; Luc Van Gool

In most scenarios, conditional image generation can be thought of as an inversion of the image understanding process. Since generic image understanding involves solving multiple tasks, it is natural to aim at generating images via multi-conditioning. However, multi-conditional image generation is a very challenging problem because the conditioning labels available in practice are heterogeneous and sparse. In this work, we propose a novel neural architecture that addresses the heterogeneity and sparsity of spatially multi-conditional labels. Our choice of spatial conditioning, such as by semantics and depth, is driven by the promise it holds for better control of the image generation process. The proposed method uses a transformer-like architecture operating pixel-wise, which receives the available labels as input tokens and merges them in a learned homogeneous space of labels. The merged labels are then used for image generation via conditional generative adversarial training. Because the architecture operates pixel-wise, label sparsity is handled by simply dropping the input tokens that correspond to the missing labels at the desired locations. Our experiments on three benchmark datasets demonstrate the clear superiority of our method over the state-of-the-art and compared baselines. The source code will be made publicly available.
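
The token-dropping idea can be illustrated with a minimal sketch. The abstract does not specify the architecture in detail, so everything here is an assumption made for illustration: each label type (for example semantics or depth) is taken to be already embedded as a fixed-length vector per pixel, a missing label is represented by `None`, and the merge is a single attention-pooling step with one query vector standing in for learned parameters. The names `merge_pixel_tokens` and `merge_label_maps` are hypothetical and do not come from the authors' code.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def merge_pixel_tokens(tokens: Sequence[Optional[Vector]], query: Vector) -> Vector:
    """Attention-pool the label tokens that are available at one pixel.

    Missing labels (None) are dropped before attention, so they never
    contribute to the merged representation. `query` is a stand-in for a
    learned parameter (an assumption of this sketch).
    """
    available = [t for t in tokens if t is not None]  # drop missing tokens
    if not available:
        # No label at this pixel: return a zero vector (a sketch-level choice).
        return [0.0] * len(query)

    scale = math.sqrt(len(query))
    scores = [dot(query, t) / scale for t in available]

    # Numerically stable softmax over the available tokens only.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum of the available tokens, one output dimension at a time.
    return [
        sum(w * t[i] for w, t in zip(weights, available))
        for i in range(len(query))
    ]


def merge_label_maps(
    label_maps: Sequence[Sequence[Sequence[Optional[Vector]]]],
    query: Vector,
) -> List[List[Vector]]:
    """Merge several H x W label maps pixel by pixel.

    label_maps[k][r][c] is the token of label type k at pixel (r, c),
    or None where that label is missing.
    """
    height = len(label_maps[0])
    width = len(label_maps[0][0])
    return [
        [
            merge_pixel_tokens([m[r][c] for m in label_maps], query)
            for c in range(width)
        ]
        for r in range(height)
    ]
```

Because each pixel is processed independently, a pixel with only a depth token and a pixel with both semantic and depth tokens go through the same function; only the number of tokens entering the softmax differs. In a trained model the merged map would then condition the generator, which this sketch does not cover.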

updated: Thu Jul 14 2022 09:54:52 GMT+0000 (UTC)

published: Fri Mar 25 2022 17:57:13 GMT+0000 (UTC)

arXiv
