On Aliased Resizing and Surprising Subtleties in GAN Evaluation

Gaurav Parmar; Richard Zhang; Jun-Yan Zhu

Metrics for evaluating generative models aim to measure the discrepancy between real and generated images. The often-used Fréchet Inception Distance (FID) metric, for example, extracts "high-level" features from the two sets using a deep network. However, we find that differences in "low-level" preprocessing, specifically image resizing and compression, can induce large variations and have unforeseen consequences. For instance, when resizing an image, e.g., with a bilinear or bicubic kernel, signal processing principles mandate adjusting the prefilter width depending on the downsampling factor, to antialias to the appropriate bandwidth. However, commonly used implementations use a fixed-width prefilter, resulting in aliasing artifacts. Such aliasing leads to corruptions in the feature extraction downstream. Next, lossy compression, such as JPEG, is commonly used to reduce the file size of an image. Although designed to minimally degrade the perceptual quality of an image, the operation also produces variations downstream. Furthermore, we show that if compression is used on real training images, FID can actually improve if the generated images are also subsequently compressed. This paper shows that choices in low-level image processing have been an underappreciated aspect of generative modeling. We identify and characterize variations in generative modeling development pipelines, provide recommendations based on signal processing principles, and release a reference implementation to facilitate future comparisons.
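The resizing issue described in the abstract can be illustrated with a small one-dimensional sketch. This is not the authors' reference implementation; the function names (`triangle`, `resize_1d`), the test signal, and the border clamping are assumptions chosen for illustration. It downsamples a striped signal by a factor of 4 with a bilinear (tent) kernel, once with a fixed-width kernel and once with the kernel widened by the downsampling factor.

```python
import math


def triangle(x):
    """Bilinear (tent) kernel with support [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def resize_1d(signal, out_len, antialias):
    """Resize a 1D signal with a tent kernel (illustrative sketch only).

    antialias=False keeps the kernel one input pixel wide regardless of
    the scale (fixed-width prefilter). antialias=True stretches the kernel
    by the downsampling factor, as signal processing principles require.
    """
    n = len(signal)
    scale = n / out_len  # downsampling factor when > 1
    width = max(scale, 1.0) if antialias else 1.0
    out = []
    for i in range(out_len):
        # Center of output pixel i, expressed in input coordinates.
        center = (i + 0.5) * scale - 0.5
        lo = int(math.floor(center - width))
        hi = int(math.ceil(center + width))
        acc = 0.0
        wsum = 0.0
        for j in range(lo, hi + 1):
            w = triangle((j - center) / width)
            if w > 0.0:
                jj = min(max(j, 0), n - 1)  # clamp at the borders (assumption)
                acc += w * signal[jj]
                wsum += w
        out.append(acc / wsum)
    return out


# Hypothetical test signal: stripes with period 4 (values 0, 1, 1, 0, ...),
# whose true average intensity is 0.5.
signal = [1.0 if j % 4 in (1, 2) else 0.0 for j in range(64)]

aliased = resize_1d(signal, 16, antialias=False)
antialiased = resize_1d(signal, 16, antialias=True)

print("fixed-width:", aliased[:4])
print("antialiased:", antialiased[1:5])
```

With the fixed-width kernel, every output sample happens to land on the bright part of a stripe, so the downsampled signal is uniformly 1.0 even though the input averages 0.5. With the widened kernel, interior outputs equal 0.5. Distortions of this kind are what the abstract describes as propagating into downstream feature extraction.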

updated: Thu Jan 20 2022 18:05:22 GMT+0000 (UTC)

published: Thu Apr 22 2021 17:58:38 GMT+0000 (UTC)

arXiv
