A Better Loss for Visual-Textual Grounding

Davide Rigoni; Luciano Serafini; Alessandro Sperduti

Given a textual phrase and an image, the visual grounding problem is defined as the task of locating the content of the image referenced by the phrase. It is a challenging task with several real-world applications in human-computer interaction, image-text reference resolution, and video-text reference resolution. In recent years, several works have addressed this problem with heavy and complex models that try to capture visual-textual dependencies better than before. These models typically consist of two main components, which focus, respectively, on learning multi-modal features useful for grounding and on improving the predicted bounding box of the visual mention. Finding the right learning balance between these two sub-tasks is not easy, and current models are not necessarily optimal in this respect. In this work, we propose a model that, although it uses a simple multi-modal feature fusion component, achieves higher accuracy than state-of-the-art models thanks to a more effective loss function based on class probabilities, which reaches a better learning balance between the two sub-tasks on the considered datasets.
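The abstract does not spell out the proposed loss, so the sketch below is not the authors' method. It only illustrates, under stated assumptions, the kind of two-term objective whose "learning balance" the abstract refers to: a cross-entropy term for selecting the proposal that matches the phrase, plus a smooth L1 term for refining that proposal's bounding box, combined through a hand-tuned weight. Everything here is hypothetical: the function names, the `lam` weight, the `(x1, y1, x2, y2)` box format, and the choice of the target proposal as the one with maximum IoU against the ground truth.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over per-proposal scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def smooth_l1(pred: Box, target: Box, beta: float = 1.0) -> float:
    """Sum of per-coordinate smooth L1 (Huber-style) errors."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def two_term_grounding_loss(
    scores: Sequence[float],   # hypothetical phrase-proposal matching scores
    proposals: Sequence[Box],  # detector proposals
    refined: Sequence[Box],    # hypothetical regressed box for each proposal
    gt_box: Box,
    lam: float = 1.0,          # hypothetical weight balancing the two sub-tasks
) -> float:
    # Assumed target: the proposal that overlaps the ground truth the most.
    target = max(range(len(proposals)), key=lambda i: iou(proposals[i], gt_box))
    # Sub-task 1: pick the referred proposal (cross-entropy over proposals).
    cls_loss = -log_softmax(scores)[target]
    # Sub-task 2: move the selected proposal's box toward the ground truth.
    reg_loss = smooth_l1(refined[target], gt_box)
    return cls_loss + lam * reg_loss


if __name__ == "__main__":
    props = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    loss = two_term_grounding_loss([2.0, 0.5], props, props, (1.0, 0.0, 10.0, 10.0))
    print(f"loss = {loss:.4f}")
```

In this generic formulation the balance between the two sub-tasks is set by `lam`: if it is too small, box refinement is under-trained; if it is too large, the regression term dominates the proposal-selection term. The paper argues that a better-designed loss reaches this balance more effectively than such a fixed weighting.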

updated: Wed Aug 11 2021 16:26:54 GMT+0000 (UTC)

published: Wed Aug 11 2021 16:26:54 GMT+0000 (UTC)

arXiv