Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset

Zhixuan Liu; Youeun Shin; Beverley-Claire Okogwu; Youngsik Yun; Lia Coleman; Peter Schaldenbrand; Jihie Kim; Jean Oh

Accurate representation in media has been shown to improve the well-being of the people who consume it. By contrast, inaccurate representations can negatively affect viewers and lead to harmful perceptions of other cultures. To achieve inclusive representation in generated images, we propose a culturally-aware priming approach for text-to-image synthesis that uses a small but culturally curated dataset we collected, referred to here as the Cross-Cultural Understanding Benchmark (CCUB) Dataset, to counter the bias prevalent in large-scale datasets. Our approach consists of two fine-tuning techniques: (1) adding visual context by fine-tuning a pre-trained text-to-image synthesis model, Stable Diffusion, on the CCUB text-image pairs, and (2) adding semantic context via automated prompt engineering with a large language model, GPT-3, fine-tuned on our culturally-aware CCUB text data. The CCUB dataset is curated, and our approach is evaluated, by people who have a personal relationship with the culture in question. Our experiments indicate that priming with both text and image is effective in improving the cultural relevance and decreasing the offensiveness of generated images while maintaining quality.

updated: Wed Apr 26 2023 15:41:05 GMT+0000 (UTC)

published: Sat Jan 28 2023 03:10:33 GMT+0000 (UTC)

arXiv
