Assessing a Single Image in Reference-Guided Image Synthesis

Jiayi Guo; Chaoqun Du; Jiangshan Wang; Huijuan Huang; Pengfei Wan; Gao Huang

Assessing the performance of Generative Adversarial Networks (GANs) is an important topic because of its practical significance. Several evaluation metrics have been proposed, but they generally assess the quality of the whole generated image distribution. They are therefore not applicable to Reference-guided Image Synthesis (RIS) tasks, i.e., rendering a source image in the style of a reference image, where assessing the quality of a single generated image is crucial. In this paper, we propose a general learning-based framework, Reference-guided Image Synthesis Assessment (RISA), to quantitatively evaluate the quality of a single generated image. Notably, training RISA does not require human annotations. Specifically, the training data for RISA are generated by intermediate models from the RIS training procedure and weakly annotated with each model's iteration count, based on the positive correlation between image quality and iterations. Because this annotation is too coarse to serve as a supervision signal, we introduce two techniques: 1) a pixel-wise interpolation scheme to refine the coarse labels, and 2) multiple binary classifiers to replace a naïve regressor. In addition, an unsupervised contrastive loss is introduced to capture the style similarity between a generated image and its reference image. Empirical results on various datasets demonstrate that RISA is highly consistent with human preference and transfers well across models.
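The abstract gives no formulas for the weak labels or the two refinement techniques, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the coarse label is the fraction of training iterations completed, that pixel-wise interpolation is a linear blend of two images with the same blend applied to their labels, and that the binary classifiers each answer "is the quality above threshold k/K?". All function names and the threshold scheme are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def iteration_label(iteration: int, max_iteration: int) -> float:
    """Coarse weak label (assumed): fraction of training completed, in [0, 1]."""
    if max_iteration <= 0:
        raise ValueError("max_iteration must be positive")
    return min(max(iteration / max_iteration, 0.0), 1.0)


def interpolate_pixelwise(early: Image, late: Image,
                          early_label: float, late_label: float,
                          alpha: float) -> Tuple[Image, float]:
    """Blend two same-sized images pixel by pixel and blend their labels alike.

    Assumed reading of "pixel-wise interpolation": a linear mix with weight
    alpha, which yields labels between the two coarse iteration labels.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    blended = [
        [(1 - alpha) * a + alpha * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(early, late)
    ]
    label = (1 - alpha) * early_label + alpha * late_label
    return blended, label


def binary_targets(label: float, num_classifiers: int) -> List[int]:
    """Target for classifier k is 1 when the label exceeds threshold k / K."""
    return [1 if label > k / num_classifiers else 0
            for k in range(num_classifiers)]


def score_from_probabilities(probs: List[float]) -> float:
    """Collapse the classifiers' outputs into one quality score in [0, 1]."""
    if not probs:
        raise ValueError("probs must not be empty")
    return sum(probs) / len(probs)
```

Under these assumptions, an image blended halfway between an untrained and a fully trained generator's outputs receives the label 0.5, and with four classifiers its targets are `[1, 1, 0, 0]`. Splitting the score into several threshold questions is a common way to make learning from noisy ordinal labels less sensitive to their exact values than direct regression.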

updated: Wed Dec 08 2021 08:22:14 GMT+0000 (UTC)

published: Wed Dec 08 2021 08:22:14 GMT+0000 (UTC)

arXiv
