On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation

Gaurav Parmar; Richard Zhang; Jun-Yan Zhu

We investigate the sensitivity of the Fréchet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. The FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing pipeline. Image resizing functions in commonly used deep learning libraries often introduce aliasing artifacts. We observe that FID calculation requires numerous subtle choices, and that a lack of consistency in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) which image resizing library to use, (2) which interpolation kernel to use, and (3) which encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use, optimized implementation of our proposed recommendations in the accompanying code.
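To make the aliasing claim concrete, here is a minimal NumPy sketch of how downsampling without a low-pass filter can misrepresent an image. It is an illustration only, not the authors' implementation and not the behavior of any particular library: the function names `make_stripes`, `downsample_naive`, and `downsample_box` are hypothetical, strided subsampling stands in for a resize that skips antialiasing, and a box filter stands in for a properly filtered resize.

```python
import numpy as np


def make_stripes(size: int = 256) -> np.ndarray:
    """Return a size x size image whose columns alternate 0, 1, 0, 1, ..."""
    row = (np.arange(size) % 2).astype(np.float64)
    return np.tile(row, (size, 1))


def downsample_naive(img: np.ndarray, factor: int) -> np.ndarray:
    """Keep every `factor`-th pixel with no prefiltering (assumed stand-in
    for a resize that does not antialias)."""
    return img[::factor, ::factor]


def downsample_box(img: np.ndarray, factor: int) -> np.ndarray:
    """Average each factor x factor block (a simple antialiasing box filter).

    Assumes both image dimensions are divisible by `factor`.
    """
    h, w = img.shape
    blocks = img.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


image = make_stripes(256)
naive = downsample_naive(image, 4)
box = downsample_box(image, 4)

# The source image is half black and half white, so its mean is 0.5.
print("source mean:", image.mean())
# Strided sampling only ever lands on even columns, so the stripes vanish.
print("naive mean: ", naive.mean())
# Block averaging preserves the overall intensity of the pattern.
print("box mean:   ", box.mean())
```

In this toy case the unfiltered result is a uniformly black image, while the filtered result keeps the average gray level of the stripes. Differences of this kind in the resized pixels are what then propagate into the features used to compute FID.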

updated: Thu Apr 22 2021 17:58:38 GMT+0000 (UTC)

published: Thu Apr 22 2021 17:58:38 GMT+0000 (UTC)

arXiv
