F?D: On understanding the role of deep feature spaces on face generation evaluation

Krish Kabra; Guha Balakrishnan

Perceptual metrics, like the Fréchet Inception Distance (FID), are widely used to assess the similarity between synthetically generated and ground truth (real) images. The key idea behind these metrics is to compute errors in a deep feature space that captures perceptually and semantically rich image features. Despite their popularity, the effect that different deep features and their design choices have on a perceptual metric has not been well studied. In this work, we perform a causal analysis linking differences in semantic attributes and distortions between face image distributions to Fréchet distances (FD) using several popular deep feature spaces. A key component of our analysis is the creation of synthetic counterfactual faces using deep face generators. Our experiments show that the FD is heavily influenced by its feature space's training dataset and objective function. For example, FD using features extracted from ImageNet-trained models heavily emphasizes hats over regions like the eyes and mouth. Moreover, FD using features from a face gender classifier emphasizes hair length more than distances in an identity (recognition) feature space do. Finally, we evaluate several popular face generation models across feature spaces and find that StyleGAN2 consistently ranks higher than other face generators, except with respect to identity (recognition) features. This suggests the need to consider multiple feature spaces when evaluating generative models and to use feature spaces that are tuned to the nuances of the domain of interest.
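
For readers unfamiliar with the metric, the Fréchet distance between two feature sets is commonly computed by fitting a Gaussian to each set and evaluating ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2)). The sketch below illustrates that standard formula only; it is not the authors' code. The function name `frechet_distance` is hypothetical, and the input arrays are assumed to be features already extracted by some deep network (ImageNet-trained, face recognition, or otherwise), which is outside the scope of this example.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, dim), assumed to be
    deep features from any extractor (the extractor itself is not shown).
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    # Sample covariance of each feature set (columns are feature dimensions).
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clip tiny negative values caused by
    # floating-point error before taking the square root.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    cov_term = np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt
    return float(mean_term + cov_term)
```

Because the distance depends only on the means and covariances of the features, swapping the feature extractor changes which image differences move those statistics, which is the sensitivity the abstract describes.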

updated: Fri Aug 11 2023 17:26:42 GMT+0000 (UTC)

published: Wed May 31 2023 17:21:58 GMT+0000 (UTC)

arXiv
