LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems

Björn Deiseroth; Patrick Schramowski; Hikaru Shindo; Devendra Singh Dhami; Kristian Kersting

Text-to-image models have recently achieved remarkable success, producing seemingly accurate samples of photo-realistic quality. However, just as state-of-the-art language models still struggle to evaluate precise statements consistently, so do image generation processes based on language models. In this work we showcase the problems that state-of-the-art text-to-image models such as DALL-E have in generating accurate samples from statements related to the DrawBench benchmark. Furthermore, we show that CLIP is not able to rerank those generated samples consistently. To this end, we propose LogicRank, a neuro-symbolic reasoning framework that can yield a more accurate ranking system for such precision-demanding settings. LogicRank integrates smoothly into the generation process of text-to-image models and can moreover be used for further fine-tuning towards a more logically precise model.
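To make the reranking idea concrete, here is a minimal sketch that is not the LogicRank implementation. It assumes each generated image has already been parsed into a symbolic scene (a list of objects with attributes, hand-written here in place of a perception model) and carries an image-text similarity score such as a CLIP score. The helpers `count_is`, `logic_score`, and `rerank` are hypothetical. Candidates are ordered first by the fraction of logical constraints from the prompt they satisfy, with similarity used only to break ties.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical symbolic scene: a list of detected objects, each a dict of attributes.
# In practice these would come from a perception model; here they are hand-written.
Scene = list[dict[str, str]]
Constraint = Callable[[Scene], bool]


@dataclass
class Candidate:
    name: str
    scene: Scene
    similarity: float  # stand-in for an image-text similarity such as a CLIP score


def count_is(n: int, **attrs: str) -> Constraint:
    """Hypothetical constraint: exactly n objects match all given attribute values."""
    def check(scene: Scene) -> bool:
        matches = [o for o in scene if all(o.get(k) == v for k, v in attrs.items())]
        return len(matches) == n
    return check


def logic_score(scene: Scene, constraints: list[Constraint]) -> float:
    """Fraction of constraints the scene satisfies (True counts as 1)."""
    return sum(c(scene) for c in constraints) / len(constraints)


def rerank(cands: list[Candidate], constraints: list[Constraint]) -> list[Candidate]:
    # Sort by logical satisfaction first; similarity only breaks ties.
    return sorted(
        cands,
        key=lambda c: (logic_score(c.scene, constraints), c.similarity),
        reverse=True,
    )


# Usage: prompt "two red cubes and one blue sphere" (illustrative data, not results).
constraints = [
    count_is(2, shape="cube", color="red"),
    count_is(1, shape="sphere", color="blue"),
]
red_cube = {"shape": "cube", "color": "red"}
blue_sphere = {"shape": "sphere", "color": "blue"}
cands = [
    Candidate("a", [red_cube, red_cube, red_cube, blue_sphere], 0.31),
    Candidate("b", [red_cube, red_cube, blue_sphere], 0.27),
    Candidate("c", [red_cube, {"shape": "cube", "color": "blue"}], 0.33),
]

print([c.name for c in rerank(cands, constraints)])
```

Ranking by similarity alone would put `c` first even though it satisfies neither constraint; the constraint-first ordering instead promotes `b`, the only candidate that matches the prompt exactly.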

updated: Mon Aug 29 2022 11:40:36 GMT+0000 (UTC)

published: Mon Aug 29 2022 11:40:36 GMT+0000 (UTC)

arXiv