Horizontal and Vertical Attention in Transformers

Litao Yu; Jian Zhang


Transformers are built upon multi-head scaled dot-product attention and positional encoding, which together aim to learn feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by learning to augment the feature maps with the self-attention mechanism in Transformers. Specifically, we propose horizontal attention, which re-weights the multi-head output of the scaled dot-product attention before dimensionality reduction, and vertical attention, which adaptively re-calibrates channel-wise feature responses by explicitly modelling inter-dependencies among different channels. We demonstrate that Transformer models equipped with the two attentions generalize well across different supervised learning tasks, with only a minor additional computational overhead. The proposed horizontal and vertical attentions are highly modular and can be inserted into various Transformer models to further improve performance. Our code is available in the supplementary material.
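To make the two re-weighting steps concrete, the following is a minimal pure-Python sketch, not the authors' implementation. The abstract does not give the exact formulation, so several choices here are assumptions made for illustration: each head is scored by the mean of its activations and normalised with a softmax over heads (horizontal step), and channels are gated in a squeeze-and-excitation style with two small weight matrices `w1` and `w2` (vertical step). The function names and the toy inputs are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def horizontal_attention(heads):
    """Re-weight per-head outputs, then concatenate along the feature axis.

    heads: list of H matrices, each T x d_head (a list of token rows).
    Assumption: a head's score is the mean of its activations, and the
    scores are normalised with a softmax over the H heads.
    """
    scores = [sum(sum(row) for row in h) / (len(h) * len(h[0])) for h in heads]
    weights = softmax(scores)
    out = []
    for t in range(len(heads[0])):
        row = []
        for w, h in zip(weights, heads):
            row.extend(w * v for v in h[t])  # scale before concatenation
        out.append(row)
    return out, weights


def vertical_attention(x, w1, w2):
    """Channel-wise gating in a squeeze-and-excitation style (assumed form).

    x: T x C token features; w1: R x C; w2: C x R (R is a bottleneck width).
    """
    num_tokens, num_channels = len(x), len(x[0])
    # Squeeze: average each channel over all tokens.
    squeezed = [sum(x[t][c] for t in range(num_tokens)) / num_tokens
                for c in range(num_channels)]
    # Excite: bottleneck with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(a * z for a, z in zip(row, squeezed))) for row in w1]
    gates = [sigmoid(sum(a * h for a, h in zip(row, hidden))) for row in w2]
    scaled = [[x[t][c] * gates[c] for c in range(num_channels)]
              for t in range(num_tokens)]
    return scaled, gates


# Toy input: 2 heads, 2 tokens, 2 features per head (hypothetical values).
heads = [
    [[1.0, 2.0], [3.0, 4.0]],  # head 0
    [[0.0, 0.0], [0.0, 0.0]],  # head 1
]
mixed, head_weights = horizontal_attention(heads)

w1 = [[0.5, 0.5, 0.5, 0.5]]          # 1 x 4 bottleneck
w2 = [[1.0], [0.0], [-1.0], [0.5]]   # 4 x 1 expansion
out, gates = vertical_attention(mixed, w1, w2)
```

In this sketch both steps only rescale existing activations, which is consistent with the abstract's claim of a minor computational overhead. In a trained model the scoring and gating parameters would be learned rather than fixed by hand.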

updated: Sun Jul 10 2022 07:08:18 GMT+0000 (UTC)

published: Sun Jul 10 2022 07:08:18 GMT+0000 (UTC)

arXiv
