On the Cultural Gap in Text-to-Image Generation

Bingshuai Liu; Longyue Wang; Chenyang Lyu; Yong Zhang; Jinsong Su; Shuming Shi; Zhaopeng Tu

One challenge in text-to-image (T2I) generation is the inadvertent reflection of cultural gaps present in the training data, that is, the disparity in generated image quality when the cultural elements of the input text are rarely represented in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model's ability to generate cross-cultural images. To bridge this gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria that assess how well suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture; the filtered data are then used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, and that its object-text alignment component is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation (https://github.com/longyuewangdcu/C3-Bench).
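The abstract does not say how the multi-modal metric is computed or combined, so the following is only a rough illustration of score-based data filtering, not the authors' method. It assumes each candidate image-caption pair already carries two precomputed scores in [0, 1], an overall image-text alignment score and an object-text alignment score (however they are obtained), merges them with a hypothetical weighted sum, and keeps the top-k pairs. The names `Candidate`, `combined_score`, `select_top_k` and the `weight` parameter are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """One fine-tuning pair with precomputed scores (hypothetical fields)."""
    caption: str
    image_text_score: float   # assumed: overall image-caption alignment in [0, 1]
    object_text_score: float  # assumed: how well the caption's objects appear in the image


def combined_score(c: Candidate, weight: float = 0.5) -> float:
    """Hypothetical weighted sum; weight=0 ignores object-text alignment entirely."""
    return (1.0 - weight) * c.image_text_score + weight * c.object_text_score


def select_top_k(candidates: list[Candidate], k: int, weight: float = 0.5) -> list[Candidate]:
    """Rank candidates by combined score (highest first) and keep the first k."""
    ranked = sorted(candidates, key=lambda c: combined_score(c, weight), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy pool with made-up scores, purely for illustration.
    pool = [
        Candidate("a lantern festival at night", 0.80, 0.20),
        Candidate("a bowl of dumplings on a wooden table", 0.70, 0.90),
        Candidate("a paper-cut window decoration", 0.60, 0.50),
    ]
    # With object-text alignment weighted in, the first pair drops out.
    for c in select_top_k(pool, k=2, weight=0.5):
        print(c.caption)
    # Image-text score alone would keep the first two pairs instead.
    for c in select_top_k(pool, k=2, weight=0.0):
        print(c.caption)
```

The toy pool shows why an object-level term can matter for this kind of filtering: a pair whose caption matches the image overall but whose key cultural object is missing (low object-text score) is ranked out once that term is included.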

updated: Thu Jul 06 2023 13:17:55 GMT+0000 (UTC)

published: Thu Jul 06 2023 13:17:55 GMT+0000 (UTC)

arXiv