Convolution-enhanced Evolving Attention Networks

Yujing Wang; Yaming Yang; Zhuo Li; Jiangang Bai; Mingliang Zhang; Xiangtai Li; Jing Yu; Ce Zhang; Gao Huang; Yunhai Tong

畳み込み強化された進化する注意ネットワーク

Transformers などのアテンションベースのニューラルネットワークは、コンピュータービジョン、自然言語処理、時系列分析など、さまざまなアプリケーションで普及しています。あらゆる種類のアテンションネットワークにおいて、アテンションマップは入力トークン間のセマンティックな依存関係をエンコードするため、非常に重要です。ただし、ほとんどの既存の注意ネットワークは、表現に基づいてモデリングまたは推論を実行し、異なるレイヤーの注意マップは明示的な相互作用なしで個別に学習されます。この論文では、残余畳み込みモジュールのチェーンを介してトークン間の関係の進化を直接モデル化する、新規で一般的な進化する注意メカニズムを提案します。主な動機は 2 つあります。一方で、異なるレイヤーのアテンションマップは転送可能な知識を共有するため、残りの接続を追加することで、レイヤー間のトークン間の関係の情報の流れを促進できます。一方、異なる抽象化レベルのアテンションマップには当然ながら進化の傾向があるため、専用の畳み込みベースのモジュールを利用してこのプロセスをキャプチャすることは有益です。提案されたメカニズムを搭載した畳み込み強化された進化的注意ネットワークは、時系列表現、自然言語理解、機械翻訳、画像分類など、さまざまなアプリケーションで優れたパフォーマンスを実現します。特に時系列表現タスクでは、Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer は最先端のモデルを大幅に上回り、最高の SOTA と比較して平均 17% の改善を達成しています。私たちの知る限り、これはアテンションマップの層ごとの進化を明示的にモデル化した最初の作業です。私たちの実装は、https://github.com/pkuyym/EvolvingAttention で入手できます。

Attention-based neural networks, such as Transformers, have become ubiquitous in numerous applications, including computer vision, natural language processing, and time-series analysis. In all kinds of attention networks, the attention maps are crucial as they encode semantic dependencies between input tokens. However, most existing attention networks perform modeling or reasoning based on representations , wherein the attention maps of different layers are learned separately without explicit interactions. In this paper, we propose a novel and generic evolving attention mechanism, which directly models the evolution of inter-token relationships through a chain of residual convolutional modules. The major motivations are twofold. On the one hand, the attention maps in different layers share transferable knowledge, thus adding a residual connection can facilitate the information flow of inter-token relationships across layers. On the other hand, there is naturally an evolutionary trend among attention maps at different abstraction levels, so it is beneficial to exploit a dedicated convolution-based module to capture this process. Equipped with the proposed mechanism, the convolution-enhanced evolving attention networks achieve superior performance in various applications, including time-series representation, natural language understanding, machine translation, and image classification. Especially on time-series representation tasks, Evolving Attention-enhanced Dilated Convolutional (EA-DC-) Transformer outperforms state-of-the-art models significantly, achieving an average of 17% improvement compared to the best SOTA. To the best of our knowledge, this is the first work that explicitly models the layer-wise evolution of attention maps. Our implementation is available at https://github.com/pkuyym/EvolvingAttention.

updated: Fri Apr 28 2023 06:44:48 GMT+0000 (UTC)

published: Fri Dec 16 2022 08:14:04 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト