ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models

Maitreya Patel; Tejas Gokhale; Chitta Baral; Yezhou Yang

The ability to understand visual concepts, and to replicate and compose these concepts from images, is a central goal of computer vision. Recent advances in text-to-image (T2I) models have led to high-definition, realistic image generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and on limited qualitative measures of visual understanding. To quantify the ability of T2I models to learn and synthesize novel visual concepts, we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, 5K unique concept compositions, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in ground-truth images. We evaluate visual concepts that are objects, attributes, or styles, and we also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning concepts and preserving compositionality, a trade-off that existing approaches struggle to overcome.
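As a rough illustration of the idea behind CCD, the sketch below compares an oracle classifier's confidence on ground-truth images with its confidence on generated images of the same concept. This is only one plausible reading of the abstract, not the paper's exact definition: the function name `concept_confidence_deviation`, the use of a simple difference of means, and the confidence values are all assumptions made for illustration.

```python
from statistics import mean


def concept_confidence_deviation(real_conf, gen_conf):
    """Gap between mean oracle confidence on real and on generated images.

    Assumption (not taken from the paper): each list holds the oracle
    classifier's probability for the target concept, one value per image.
    A value near 0 means the generated images look as "on concept" to the
    oracle as the ground-truth images do; larger values mean more drift.
    """
    if not real_conf or not gen_conf:
        raise ValueError("both confidence lists must be non-empty")
    return mean(real_conf) - mean(gen_conf)


# Hypothetical oracle confidences for one concept.
real = [0.9, 0.8, 1.0]   # ground-truth images
gen = [0.6, 0.5, 0.7]    # images produced by a T2I model
ccd = concept_confidence_deviation(real, gen)
print(f"deviation: {ccd:.2f}")
```

Under this reading, a model that reproduces a concept faithfully would yield a deviation close to zero, while a model that loses the concept would yield a larger positive gap.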

updated: Wed Jun 07 2023 18:00:38 GMT+0000 (UTC)

published: Wed Jun 07 2023 18:00:38 GMT+0000 (UTC)

arXiv
