Retrieval Augmentation to Improve Robustness and Interpretability of Deep Neural Networks

Rita Parada Ramos; Patrícia Pereira; Helena Moniz; Joao Paulo Carvalho; Bruno Martins

Deep neural network models have achieved state-of-the-art results in various tasks related to vision and/or language. Despite the use of large training sets, most models are trained by iterating over single input-output pairs, so the remaining examples are discarded for the current prediction. In this work, we actively exploit the training data to improve the robustness and interpretability of deep neural networks, using information from the nearest training examples to aid prediction during both training and testing. Specifically, the proposed approach uses the target of the nearest input example to initialize the memory state of an LSTM model or to guide attention mechanisms. We apply this approach to image captioning and sentiment analysis, conducting experiments with both image and text retrieval. Results show the effectiveness of the proposed models for the two tasks on the widely used Flickr8k and IMDB datasets, respectively. Our code is publicly available at http://github.com/RitaRamo/retrieval-augmentation-nn.
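The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository linked above). It assumes inputs are already encoded as fixed-size vectors, uses cosine similarity as the retrieval metric, and assumes each training target is already embedded as a vector of the LSTM's hidden size. All function and variable names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_nearest(query, train_inputs):
    """Return the index of the training input most similar to the query."""
    if not train_inputs:
        raise ValueError("train_inputs must not be empty")
    return max(
        range(len(train_inputs)),
        key=lambda i: cosine_similarity(query, train_inputs[i]),
    )


def initial_memory_state(query, train_inputs, train_target_vectors):
    """Pick the target vector of the nearest training input as the LSTM's
    initial memory (cell) state. Assumption: targets are pre-embedded vectors."""
    idx = retrieve_nearest(query, train_inputs)
    return list(train_target_vectors[idx]), idx


# Toy data (hypothetical): three encoded training inputs and their embedded targets.
train_inputs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
train_target_vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.4, 0.4, 0.2]]

c0, idx = initial_memory_state([0.1, 0.9], train_inputs, train_target_vectors)
print(idx, c0)
```

In a real model, the returned vector would be passed as the initial cell state of the LSTM instead of the usual zero vector, and the retrieved index identifies which training example influenced the prediction, which is where the interpretability benefit comes from.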

updated: Thu Feb 25 2021 17:38:31 GMT+0000 (UTC)

published: Thu Feb 25 2021 17:38:31 GMT+0000 (UTC)

arXiv
