Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

Sia Huat Tan; Runpei Dong; Kaisheng Ma

Most feedforward convolutional neural networks spend roughly the same effort on each pixel. Human visual recognition, by contrast, is an interaction between eye movements and spatial attention, in which we take several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet), which uses a recurrent downsampled attention mechanism to tackle the challenges of high computation and lack of robustness. Specifically, MGNet sequentially selects task-relevant regions of an image to focus on and then adaptively combines all collected information for the final prediction. MGNet exhibits strong resistance to adversarial attacks and common corruptions while using less computation. MGNet is also inherently more interpretable, as it explicitly shows where it focuses during each iteration. Our experiments on ImageNet100 demonstrate the potential of recurrent downsampled attention mechanisms to improve on a single feedforward pass. For example, MGNet improves accuracy under common corruptions by 4.76% on average with only 36.9% of the computational cost. Moreover, with a ResNet-50 backbone, the baseline's accuracy drops to 7.6% under a PGD attack, whereas MGNet maintains 44.2% accuracy at the same attack strength. Our code is available at https://github.com/siahuat0727/MGNet.
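
The glimpse loop described in the abstract (select a region, look at a downsampled crop of it, repeat, then combine what was collected) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above. Every name here (crop_and_downsample, multi_glimpse_predict, toy_backbone, toy_locate, toy_classify) is hypothetical, the brightest-pixel locator is a stand-in for learned attention, and a plain mean is a stand-in for the adaptive combination of glimpse features.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
# (center_y, center_x, scale), all expressed as fractions of the image side.
Region = Tuple[float, float, float]


def crop_and_downsample(image: Image, region: Region, out_size: int) -> Image:
    """Sample an out_size x out_size glimpse from a square region (nearest neighbour)."""
    height, width = len(image), len(image[0])
    cy, cx, scale = region
    glimpse = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Sample position inside the region, in fractional image coordinates.
            y = cy + ((i + 0.5) / out_size - 0.5) * scale
            x = cx + ((j + 0.5) / out_size - 0.5) * scale
            yi = min(max(int(y * height), 0), height - 1)
            xi = min(max(int(x * width), 0), width - 1)
            row.append(image[yi][xi])
        glimpse.append(row)
    return glimpse


def multi_glimpse_predict(
    image: Image,
    backbone: Callable[[Image], List[float]],
    locate: Callable[[Image, Region], Region],
    classify: Callable[[List[float]], int],
    num_glimpses: int = 3,
    out_size: int = 4,
) -> Tuple[int, List[Region]]:
    """Run the glimpse loop and return (predicted label, visited regions)."""
    region: Region = (0.5, 0.5, 1.0)  # first glimpse: the whole image, downsampled
    features, regions = [], []
    for _ in range(num_glimpses):
        glimpse = crop_and_downsample(image, region, out_size)
        features.append(backbone(glimpse))
        regions.append(region)
        region = locate(glimpse, region)  # decide where to look next
    # Assumption: a plain mean stands in for the adaptive combination.
    dim = len(features[0])
    combined = [sum(f[k] for f in features) / len(features) for k in range(dim)]
    return classify(combined), regions


def toy_backbone(glimpse: Image) -> List[float]:
    """Stand-in feature extractor: mean and max intensity of the glimpse."""
    flat = [v for row in glimpse for v in row]
    return [sum(flat) / len(flat), max(flat)]


def toy_locate(glimpse: Image, region: Region) -> Region:
    """Stand-in attention: recentre on the brightest sample and halve the scale."""
    cy, cx, scale = region
    n = len(glimpse)
    cells = [(i, j) for i in range(n) for j in range(n)]
    bi, bj = max(cells, key=lambda ij: glimpse[ij[0]][ij[1]])
    return (cy + ((bi + 0.5) / n - 0.5) * scale,
            cx + ((bj + 0.5) / n - 0.5) * scale,
            scale / 2)


def toy_classify(features: List[float]) -> int:
    """Stand-in classifier: label 1 if any glimpse saw a bright sample."""
    return 1 if features[1] > 0.5 else 0


# Demo: a 16x16 dark image with a bright 4x4 block at rows 10-13, columns 2-5.
image = [[1.0 if 10 <= r <= 13 and 2 <= c <= 5 else 0.0 for c in range(16)]
         for r in range(16)]
label, regions = multi_glimpse_predict(image, toy_backbone, toy_locate, toy_classify)
```

In this toy run each glimpse has 4 x 4 = 16 samples, so three glimpses read 48 samples from an image of 256 pixels, while the returned regions list records where the model looked at each iteration. The figures are properties of the toy setup only and say nothing about the results reported in the abstract.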

updated: Wed Nov 03 2021 04:46:26 GMT+0000 (UTC)

published: Wed Nov 03 2021 04:46:26 GMT+0000 (UTC)

arXiv
