Supervising the Transfer of Reasoning Patterns in VQA

Corentin Kervadec; Christian Wolf; Grigory Antipov; Moez Baccouche; Madiha Nadri

Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning, which hinders generalization. It has recently been shown that better reasoning patterns emerge in the attention layers of a state-of-the-art VQA model when it is trained on perfect (oracle) visual inputs. This provides evidence that deep neural networks can learn to reason when training conditions are favorable enough. However, transferring this learned knowledge to deployable models is a challenge, as much of it is lost during the transfer. We propose a method for knowledge transfer based on a regularization term in our loss function, supervising the sequence of required reasoning operations. We provide a theoretical analysis based on PAC-learning, showing that such program prediction can lead to decreased sample complexity under mild hypotheses. We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show its complementarity to BERT-like self-supervised pre-training.
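To make the idea of a regularization term that supervises the sequence of reasoning operations more concrete, here is a minimal sketch of such a combined objective. It is not the paper's formulation: the function names, the weight `lam`, the choice of cross-entropy for both terms, and the averaging over program steps are all assumptions made for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def total_loss(answer_logits, answer, op_logits_seq, ops, lam=1.0):
    """Answer loss plus a weighted program-supervision regularizer.

    Hypothetical setup: `op_logits_seq` holds one logit vector per
    predicted reasoning step, and `ops` holds the ground-truth
    operation index for each step.
    """
    answer_loss = cross_entropy(answer_logits, answer)
    # Regularizer: mean cross-entropy over the operation sequence.
    program_loss = sum(
        cross_entropy(step_logits, op)
        for step_logits, op in zip(op_logits_seq, ops)
    ) / len(ops)
    return answer_loss + lam * program_loss
```

With `lam=0` the objective reduces to the plain answer loss, so the weight controls how strongly the program supervision shapes training.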

updated: Thu Jun 10 2021 08:58:43 GMT+0000 (UTC)

published: Thu Jun 10 2021 08:58:43 GMT+0000 (UTC)

arXiv
