Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images

Zhenghang Yuan; Lichao Mou; Xiao Xiang Zhu

Visual question answering for remote sensing data (RSVQA), which aims to answer questions based on the content of remotely sensed images, has recently attracted much attention. However, previous work on RSVQA has paid little attention to the robustness of RSVQA models. To make RSVQA models more reliable, the key challenge is learning representations that are robust to new words and to different question templates that carry the same meaning. With the proposed augmented dataset, we obtain additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low-resolution (LR) dataset.
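The abstract does not specify the contrastive objective. As a minimal sketch, assuming a standard InfoNCE-style loss (as popularized by SimCLR and CLIP), an original question and its augmented paraphrase form a positive pair, while the paraphrases of the other questions in the same batch act as negatives. The function names `cosine` and `info_nce_loss`, the temperature of 0.1, and the idea that the inputs are question-encoder embeddings are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Average InfoNCE loss over a batch (hypothetical sketch, not the paper's code).

    anchors[i]   : embedding of an original question (assumed encoder output)
    positives[i] : embedding of an augmented question with the same meaning
    Every positives[j] with j != i serves as a negative for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and equally sized")
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Similarity of this anchor to every candidate in the batch, scaled by temperature.
        logits = [cosine(anchor, p) / temperature for p in positives]
        # log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of picking the matching paraphrase.
        total += log_denominator - logits[i]
    return total / len(anchors)


# Matched paraphrase embeddings give a much lower loss than mismatched ones.
matched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(matched, mismatched)
```

Minimizing this loss pulls the embeddings of a question and its paraphrases together while pushing apart questions with different meanings, which is one plausible way to encourage invariance to wording and question templates.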

updated: Fri Apr 07 2023 21:06:58 GMT+0000 (UTC)

published: Fri Apr 07 2023 21:06:58 GMT+0000 (UTC)

arXiv
