Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

Xiaoshi Wu; Yiming Hao; Keqiang Sun; Yixiong Chen; Feng Zhu; Rui Zhao; Hongsheng Li

Recent text-to-image generative models can generate high-fidelity images from text inputs, but existing evaluation metrics cannot accurately assess the quality of these generated images. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 430,060 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in previous datasets. By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that more accurately predicts human preferences on text-generated images. Our experiments demonstrate that HPS v2 generalizes better than previous metrics across various image distributions and is responsive to algorithmic improvements of text-to-image generative models, making it a preferable evaluation metric for these models. We also investigate the design of the evaluation prompts for text-to-image generative models, to make the evaluation stable, fair, and easy to use. Finally, we establish a benchmark for text-to-image generative models using HPS v2, which includes a set of recent text-to-image models from academia, the community, and industry. The code and dataset are available at https://github.com/tgxs002/HPSv2.
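The abstract does not spell out how a fine-tuned CLIP model is turned into a preference predictor, so the following is only a minimal sketch under stated assumptions: the score is taken to be the cosine similarity between a prompt embedding and an image embedding, and the preference over a pair of images is modeled as a softmax over temperature-scaled scores. The function names, the `temperature` value, and the toy two-dimensional embeddings are all hypothetical and are not taken from the HPSv2 code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def preference_probabilities(text_emb, image_embs, temperature=0.1):
    """Softmax over temperature-scaled prompt-image similarities.

    Assumption: a higher similarity means the image is more likely to be
    the one a human annotator prefers for this prompt.
    """
    scores = [cosine_similarity(text_emb, e) / temperature for e in image_embs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def preference_loss(text_emb, image_embs, chosen_index, temperature=0.1):
    """Negative log-likelihood of the image the annotator chose."""
    probs = preference_probabilities(text_emb, image_embs, temperature)
    return -math.log(probs[chosen_index])


# Toy embeddings standing in for CLIP outputs (hypothetical values).
prompt = [1.0, 0.0]
image_a = [0.9, 0.1]  # closely aligned with the prompt
image_b = [0.1, 0.9]  # poorly aligned with the prompt

probs = preference_probabilities(prompt, [image_a, image_b])
loss_if_a_chosen = preference_loss(prompt, [image_a, image_b], chosen_index=0)
loss_if_b_chosen = preference_loss(prompt, [image_a, image_b], chosen_index=1)
```

In this sketch, minimizing `preference_loss` over many annotated pairs would push the embedding of the preferred image closer to the prompt embedding than that of the rejected image. A real fine-tuning setup would compute the embeddings with a CLIP text and image encoder and backpropagate through them.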

updated: Thu Jun 15 2023 17:59:31 GMT+0000 (UTC)

published: Thu Jun 15 2023 17:59:31 GMT+0000 (UTC)

arXiv
