TDAM: Top-Down Attention Module for Contextually Guided Feature Selection in CNNs

Shantanu Jaiswal; Basura Fernando; Cheston Tan

Attention modules for Convolutional Neural Networks (CNNs) are an effective way to improve performance on multiple computer-vision tasks. While existing methods appropriately model channel, spatial, and self-attention, they primarily operate in a feedforward, bottom-up manner. Consequently, the attention mechanism depends strongly on the local information of a single input feature map and does not incorporate the semantically richer contextual information available at higher layers, which can specify "what and where to look" in lower-level feature maps through top-down information flow. Accordingly, in this work we propose a lightweight top-down attention module (TDAM) that iteratively generates a "visual searchlight" to perform channel and spatial modulation of its inputs, and that outputs more contextually relevant feature maps at each computation step. Our experiments indicate that TDAM enhances the performance of CNNs across multiple object-recognition benchmarks and outperforms prominent attention modules while being more parameter- and memory-efficient. Further, TDAM-based models learn to "shift attention" by localizing individual objects or features at each computation step without any explicit supervision, resulting in a 5% improvement for ResNet50 on weakly-supervised object localization. Source code and models are publicly available at: https://github.com/shantanuj/TDAM_Top_down_attention_module .
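
To make the idea of iterative channel and spatial modulation concrete, the following is a minimal sketch in plain Python and not the authors' implementation (for that, see the repository linked above). Everything in it is an assumption made for illustration: the function names, the `toy_higher_layer` stand-in for the later layers of a CNN, the use of sigmoid gates, the choice to re-modulate the original input at every step, and the simplification that lower-level and higher-level feature maps share the same channels × height × width shape.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(high):
    """One gate per channel: sigmoid of that channel's global average (assumed design)."""
    return [
        sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
        for ch in high
    ]


def spatial_gate(high):
    """One gate per location: sigmoid of the mean across channels (assumed design)."""
    c, h, w = len(high), len(high[0]), len(high[0][0])
    return [
        [sigmoid(sum(high[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def modulate(low, high):
    """Scale the lower-level map by channel and spatial gates derived from `high`."""
    cg, sg = channel_gate(high), spatial_gate(high)
    return [
        [[v * cg[k] * sg[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
        for k, ch in enumerate(low)
    ]


def toy_higher_layer(x):
    """Hypothetical stand-in for later CNN layers: ReLU(x - global mean)."""
    values = [v for ch in x for row in ch for v in row]
    mean = sum(values) / len(values)
    return [[[max(0.0, v - mean) for v in row] for row in ch] for ch in x]


def top_down_steps(low, higher_fn, steps):
    """Repeat: compute higher-level context, then use it to re-modulate the input."""
    x, outputs = low, []
    for _ in range(steps):
        high = higher_fn(x)      # bottom-up pass on the current features
        x = modulate(low, high)  # top-down gating of the original input
        outputs.append(x)
    return outputs
```

Calling `top_down_steps(feature_map, toy_higher_layer, steps=3)` on a nested list of floats returns one modulated feature map per computation step. Because every gate lies strictly between 0 and 1, each step only rescales the input features; which features are emphasized changes as the higher-level context changes.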

updated: Fri Oct 21 2022 08:07:53 GMT+0000 (UTC)

published: Fri Nov 26 2021 12:35:17 GMT+0000 (UTC)

arXiv
