Understanding Attention for Vision-and-Language Tasks

Feiqi Cao; Soyeon Caren Han; Siqu Long; Changwei Xu; Josiah Poon

Attention mechanisms have been used as an important component across Vision-and-Language (VL) tasks to bridge the semantic gap between visual and textual features. Although attention has been widely used in VL tasks, the capability of different attention alignment calculations to bridge the semantic gap between visual and textual clues has not been examined. In this research, we conduct a comprehensive analysis of the role of attention alignment by looking into the attention score calculation methods and checking how they actually represent the significance of visual regions and textual tokens for the global assessment. We also analyse the conditions under which an attention score calculation mechanism would be more (or less) interpretable, and which may impact model performance on three different VL tasks: visual question answering, text-to-image generation, and text-and-image matching (both sentence and image retrieval). Our analysis is the first of its kind and provides useful insights into the importance of each attention alignment score calculation when applied at the training phase of VL tasks, which is commonly ignored in attention-based cross-modal models and/or pretrained models. Our code is available at: https://github.com/adlnlp/Attention_VL
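To make "attention alignment score calculation" concrete, the following is a minimal sketch of how a choice of score function changes the attention weights that one text token assigns to a set of visual regions. The abstract does not name the score functions the paper compares, so the three variants here (dot product, scaled dot product, cosine similarity) are assumptions chosen because they are widely used; the function names and the toy vectors are hypothetical and are not taken from the authors' repository.

```python
import math

def dot_score(q, k):
    """Dot-product alignment score between a token vector q and a region vector k."""
    return sum(qi * ki for qi, ki in zip(q, k))

def scaled_dot_score(q, k):
    """Dot product divided by sqrt(d), where d is the feature dimension."""
    return dot_score(q, k) / math.sqrt(len(q))

def cosine_score(q, k):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(dot_score(q, q)) * math.sqrt(dot_score(k, k))
    return dot_score(q, k) / norm if norm else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, regions, score_fn):
    """Attention weights of one text token over all visual regions."""
    return softmax([score_fn(query, r) for r in regions])

# Toy data (illustrative only): one token vector and three region vectors.
token = [1.0, 0.0, 1.0, 0.0]
regions = [
    [1.0, 0.0, 1.0, 0.0],  # aligned with the token
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the token
    [0.5, 0.5, 0.5, 0.5],  # partially aligned
]

for fn in (dot_score, scaled_dot_score, cosine_score):
    weights = attend(token, regions, fn)
    print(fn.__name__, [round(w, 3) for w in weights])
```

All three score functions rank the regions in the same order here, but the resulting weight distributions differ in sharpness; in this toy setting the scaled variant spreads weight more evenly than the unscaled dot product. This is the kind of difference that could affect how readily attention weights can be read as importance.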

updated: Thu Sep 22 2022 06:24:44 GMT+0000 (UTC)

published: Wed Aug 17 2022 06:45:07 GMT+0000 (UTC)

arXiv
