Detecting and Preventing Hallucinations in Large Vision Language Models

Anisha Gunjal; Jihan Yin; Erhan Bas

Instruction-tuned Large Vision Language Models (LVLMs) have made significant progress in generalizing across a diverse set of multi-modal tasks, especially Visual Question Answering (VQA). However, generating detailed responses that are visually grounded remains challenging for these models. We find that even the current state-of-the-art LVLM (InstructBLIP) still produces a staggering 30% hallucinatory text, in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a (M)ultimodal (Hal)lucination (Detect)ion Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect consists of 16k fine-grained annotations on VQA examples, making it the first comprehensive multi-modal hallucination detection dataset for detailed image descriptions. Unlike previous work, which considers only object hallucination, we additionally annotate entity descriptions and relationships that are unfaithful. To demonstrate the potential of this dataset for hallucination prevention, we optimize InstructBLIP through our novel Fine-grained Direct Preference Optimization (FDPO). We also train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling. Human evaluation of both FDPO and rejection sampling shows that they reduce hallucination rates in InstructBLIP by 41% and 55%, respectively. We also find that our reward model generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57%, respectively, and correlates strongly with human-evaluated accuracy scores.
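The abstract evaluates its fine-grained reward model with best-of-n rejection sampling: sample n responses, score each with the reward model, and keep the best one. The sketch below illustrates only that selection step, under stated assumptions. It assumes a reward model that returns one accuracy score per sentence, and it aggregates those scores by averaging. The names `split_sentences`, `response_score`, `best_of_n`, and `toy_scorer`, the mean aggregation, and the toy "visible objects" scorer are all hypothetical illustrations, not the authors' implementation or model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a fine-grained reward model that returns one score
# per sentence of a candidate response (e.g. probability the sentence is
# accurate). A trained model would sit behind this callable.
SegmentScorer = Callable[[str], List[float]]


def split_sentences(text: str) -> List[str]:
    # Naive period-based splitter, assumed for illustration only.
    return [s.strip() for s in text.split(".") if s.strip()]


def response_score(response: str, scorer: SegmentScorer) -> float:
    # Aggregate per-sentence scores by their mean (an assumption; min or
    # sum would be equally plausible choices).
    scores = scorer(response)
    return sum(scores) / len(scores) if scores else 0.0


def best_of_n(candidates: Sequence[str], scorer: SegmentScorer) -> str:
    # Best-of-n rejection sampling: keep the highest-scoring of the n
    # sampled responses and reject the rest.
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: response_score(c, scorer))


# Toy stand-in for a reward model: a sentence scores 1.0 if it mentions an
# object assumed to be visible in the image, else 0.0 (hypothetical).
VISIBLE = {"dog", "frisbee", "grass"}


def toy_scorer(response: str) -> List[float]:
    return [
        1.0 if any(w in VISIBLE for w in s.lower().split()) else 0.0
        for s in split_sentences(response)
    ]


# Hypothetical samples, as if drawn n=3 times from an LVLM for one image.
SAMPLES = [
    "A dog catches a frisbee. A cat sleeps on a bench.",
    "A dog catches a frisbee. The dog runs on the grass.",
    "A car is parked nearby.",
]

chosen = best_of_n(SAMPLES, toy_scorer)
print(chosen)
```

Because the reward is computed per sentence, a single hallucinated sentence (the cat on a bench in the first sample) lowers a response's score without discarding its accurate sentences. This is the intuition behind using fine-grained rather than whole-response rewards.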

updated: Sun Feb 11 2024 08:38:07 GMT+0000 (UTC)

published: Fri Aug 11 2023 21:35:20 GMT+0000 (UTC)

arXiv