Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

Wenliang Dai; Zihan Liu; Ziwei Ji; Dan Su; Pascale Fung

Large-scale vision-language pre-trained (VLP) models are prone to hallucinating non-existent visual objects when generating text based on visual information. In this paper, we exhaustively probe the object hallucination problem from three aspects. First, we examine various state-of-the-art VLP models and show that models achieving better scores on standard metrics (e.g., BLEU-4, CIDEr) may hallucinate objects more frequently. Second, we investigate how different types of visual features in VLP (region-based, grid-based, and patch-based) influence hallucination. Surprisingly, we find that patch-based features perform best and that a smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate their effectiveness in alleviating object hallucination. Building on this, we propose a new pre-training loss, object masked language modeling, to further reduce object hallucination. We evaluate models on both the COCO (in-domain) and NoCaps (out-of-domain) datasets with our improved CHAIR metric. Furthermore, we investigate the effects of various text decoding strategies and image augmentation methods on object hallucination.
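
For readers unfamiliar with CHAIR, the sketch below illustrates the two scores as they are commonly defined in the original CHAIR formulation: an instance-level rate (hallucinated object mentions over all object mentions) and a sentence-level rate (captions containing at least one hallucinated object over all captions). It does not reproduce the improved variant this abstract refers to, whose details are not given here. The function name `chair_scores` and the input layout are assumptions made for illustration, and object mentions are assumed to be already extracted from captions and mapped to category names.

```python
from typing import List, Set, Tuple


def chair_scores(
    mentioned_objects: List[List[str]],
    present_objects: List[Set[str]],
) -> Tuple[float, float]:
    """Return (CHAIR_i, CHAIR_s) for a batch of captions.

    mentioned_objects[k]: object categories mentioned in caption k
                          (hypothetical pre-extracted input).
    present_objects[k]:   object categories annotated in image k.
    """
    total_mentions = 0
    hallucinated_mentions = 0
    hallucinated_captions = 0

    for mentioned, present in zip(mentioned_objects, present_objects):
        # A mention is hallucinated if the object is not annotated in the image.
        bad = [obj for obj in mentioned if obj not in present]
        total_mentions += len(mentioned)
        hallucinated_mentions += len(bad)
        if bad:
            hallucinated_captions += 1

    # Guard against empty inputs to avoid division by zero.
    chair_i = hallucinated_mentions / total_mentions if total_mentions else 0.0
    chair_s = (
        hallucinated_captions / len(mentioned_objects) if mentioned_objects else 0.0
    )
    return chair_i, chair_s


if __name__ == "__main__":
    # Toy data: only the first caption mentions an object ("frisbee")
    # that is absent from its image annotations.
    mentions = [["dog", "frisbee"], ["cat", "sofa"], ["car"]]
    present = [{"dog", "grass"}, {"cat", "sofa"}, {"car", "road"}]
    chair_i, chair_s = chair_scores(mentions, present)
    print(f"CHAIR_i = {chair_i:.3f}, CHAIR_s = {chair_s:.3f}")
```

Lower values indicate fewer hallucinated objects. A caption model can score well on n-gram metrics such as BLEU-4 or CIDEr while still scoring poorly here, which is the tension the first finding in the abstract describes.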

updated: Fri Oct 14 2022 10:27:22 GMT+0000 (UTC)

published: Fri Oct 14 2022 10:27:22 GMT+0000 (UTC)

arXiv
