Automatic Generation of Semantic Parts for Face Image Synthesis

Tomaso Fontanini; Claudio Ferrari; Massimo Bertozzi; Andrea Prati

Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Beyond improving the quality of the generated images, most approaches in the literature focus on increasing generation diversity in terms of style, i.e. texture. However, they all neglect a different feature: the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually, by means of graphical user interfaces. In this paper, we describe a network architecture that addresses the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with a specific focus on human faces. Our proposed model embeds the mask class-wise into a latent space where each class embedding can be edited independently. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebMask-HQ dataset, which show that our model can both faithfully reconstruct and modify a segmentation mask at the class level. We also show that our model can be placed before a SIS generator, opening the way to fully automatic generation control of both shape and texture. Code is available at https://github.com/TFonta/Semantic-VAE.
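
To make the idea of class-wise mask editing concrete, the following is a minimal sketch in plain Python. It is not the model described in the abstract: there is no latent space, LSTM, or decoder here. It only illustrates the underlying data representation, in which a label mask is split into one binary channel per class, a single channel is edited, and the channels are merged back. The function names (`split_classes`, `shift_right`, `merge_classes`), the three-class toy mask, and the class indices (0 = background, 1 = skin, 2 = nose) are all assumptions made for illustration.

```python
from typing import List

Mask = List[List[int]]  # 2D grid; label mask or binary channel


def split_classes(mask: Mask, num_classes: int) -> List[Mask]:
    """Return one binary channel per class (a one-hot view of the label mask)."""
    return [
        [[1 if label == c else 0 for label in row] for row in mask]
        for c in range(num_classes)
    ]


def shift_right(channel: Mask, dx: int) -> Mask:
    """Toy edit: translate a binary shape dx pixels to the right, zero padded."""
    width = len(channel[0])
    return [
        [row[x - dx] if 0 <= x - dx < width else 0 for x in range(width)]
        for row in channel
    ]


def merge_classes(channels: List[Mask], background: int = 0) -> Mask:
    """Recompose a label mask; higher class indices overwrite lower ones."""
    height, width = len(channels[0]), len(channels[0][0])
    out = [[background] * width for _ in range(height)]
    for c, channel in enumerate(channels):
        if c == background:
            continue
        for y in range(height):
            for x in range(width):
                if channel[y][x]:
                    out[y][x] = c
    return out


# Hypothetical toy mask: 0 = background, 1 = skin, 2 = nose.
mask = [
    [0, 1, 1, 1, 1, 0],
    [0, 1, 2, 1, 1, 0],
    [0, 1, 2, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
]

channels = split_classes(mask, num_classes=3)
channels[2] = shift_right(channels[2], dx=1)  # edit only the "nose" class
edited = merge_classes(channels)

for row in edited:
    print(row)
```

In this toy version, the pixels vacated by the moved shape fall back to background, because no other channel claims them. Producing a coherent layout after a local edit is presumably the role of the learned components mentioned in the abstract, which this sketch does not attempt to reproduce.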

updated: Tue Jul 11 2023 15:01:42 GMT+0000 (UTC)

published: Tue Jul 11 2023 15:01:42 GMT+0000 (UTC)

arXiv
