Domain-robust VQA with diverse datasets and methods but no target labels

Mingda Zhang; Tristan Maidment; Ahmad Diab; Adriana Kovashka; Rebecca Hwa

The observation that computer vision methods overfit to dataset specifics has inspired diverse attempts to make object recognition models robust to domain shifts. However, similar work on domain-robust visual question answering methods is very limited. Domain adaptation for VQA differs from adaptation for object recognition due to additional complexity: VQA models handle multimodal inputs, methods contain multiple steps with diverse modules resulting in complex optimization, and answer spaces in different datasets are vastly different. To tackle these challenges, we first quantify domain shifts between popular VQA datasets, in both visual and textual space. To disentangle shifts between datasets arising from different modalities, we also construct synthetic shifts in the image and question domains separately. Second, we test the robustness of different families of VQA methods (classic two-stream, transformer, and neuro-symbolic methods) to these shifts. Third, we test the applicability of existing domain adaptation methods and devise a new one to bridge VQA domain gaps, adjusted to specific VQA models. To emulate the setting of real-world generalization, we focus on unsupervised domain adaptation and the open-ended classification task formulation.
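As a rough illustration of what quantifying a domain shift between two datasets can look like, the sketch below computes a squared maximum mean discrepancy (MMD) with an RBF kernel between two sets of feature vectors. The abstract does not say which shift measure the authors use, so MMD is an assumption chosen only because it is a common choice. The feature vectors are synthetic Gaussian samples standing in for image or question embeddings, and all names (`mmd_squared`, `sample_features`, `dataset_a`, `dataset_b`) are hypothetical.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """RBF kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two lists of feature vectors.

    Assumption: MMD is used here as a generic shift measure; it is not
    claimed to be the measure used in the paper.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def sample_features(rng, n, dim, mean):
    """Synthetic stand-in for embeddings extracted from one dataset."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dataset_a = sample_features(rng, 100, 4, 0.0)        # "source" features
dataset_a_again = sample_features(rng, 100, 4, 0.0)  # same distribution
dataset_b = sample_features(rng, 100, 4, 2.0)        # shifted "target" features

same_domain_gap = mmd_squared(dataset_a, dataset_a_again)
cross_domain_gap = mmd_squared(dataset_a, dataset_b)
print(f"same-domain MMD^2:  {same_domain_gap:.4f}")
print(f"cross-domain MMD^2: {cross_domain_gap:.4f}")
```

In this toy setup the cross-domain value comes out larger than the same-domain value. With real data, the same function could be applied separately to visual features and to question features to get one shift estimate per modality.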

updated: Mon Mar 29 2021 22:24:50 GMT+0000 (UTC)

published: Mon Mar 29 2021 22:24:50 GMT+0000 (UTC)

arXiv
