Efficient Representation Learning via Adaptive Context Pooling

Chen Huang; Walter Talbott; Navdeep Jaitly; Josh Susskind

Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningful context at varying scales. We show that ContextPool makes attention models more expressive, often achieving strong performance with fewer layers and thus at significantly reduced cost. Experiments validate that our ContextPool module, when plugged into transformer models, matches or surpasses state-of-the-art performance using less compute on several language and image benchmarks, outperforms recent works with learned context sizes or sparse attention patterns, and is also applicable to ConvNets for efficient feature learning.
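
The idea in the abstract (pool neighboring features per token with adaptive weights and support size, then compute attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The Gaussian-shaped window, the sigmoid and softplus parameterizations, and all names (`context_pool`, `w_weight`, `w_size`, `attention_with_context_pool`) are assumptions chosen only to make the mechanism concrete for a 1D token sequence.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def context_pool(x, w_weight, w_size, eps=1e-8):
    """Pool neighboring features for each token (illustrative sketch).

    x:        (n, d) token features.
    w_weight: (d,) hypothetical projection giving a per-token pooling weight.
    w_size:   (d,) hypothetical projection giving a per-token support size.
    Returns the pooled features (n, d) and the pooling coefficients (n, n).
    """
    n = x.shape[0]
    # Assumption: pooling weights in (0, 1) via a sigmoid.
    weights = 1.0 / (1.0 + np.exp(-(x @ w_weight)))
    # Assumption: positive support sizes via a stable softplus.
    sizes = np.logaddexp(0.0, x @ w_size) + 1e-3
    pos = np.arange(n)
    dist2 = (pos[None, :] - pos[:, None]) ** 2
    # Row i is a soft window centered on token i with width sizes[i].
    mask = np.exp(-dist2 / (2.0 * sizes[:, None] ** 2))
    coeff = mask * weights[None, :]
    coeff = coeff / (coeff.sum(axis=1, keepdims=True) + eps)
    return coeff @ x, coeff


def self_attention(x, wq, wk, wv):
    # Standard single-head scaled dot-product attention.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def attention_with_context_pool(x, params):
    # Pool first, then attend over the pooled (context-aware) features.
    pooled, _ = context_pool(x, params["w_weight"], params["w_size"])
    return self_attention(pooled, params["wq"], params["wk"], params["wv"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 6, 4
    x = rng.normal(size=(n, d))
    params = {
        "w_weight": rng.normal(size=d),
        "w_size": rng.normal(size=d),
        "wq": rng.normal(size=(d, d)),
        "wk": rng.normal(size=(d, d)),
        "wv": rng.normal(size=(d, d)),
    }
    out = attention_with_context_pool(x, params)
    print(out.shape)
```

In this sketch a token with a larger predicted size averages over a wider neighborhood, so the attention layer that follows sees features at a coarser granularity for that token. In a trained model the two projections would be learned jointly with the attention weights; here they are random placeholders.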

updated: Tue Jul 05 2022 07:10:31 GMT+0000 (UTC)

published: Tue Jul 05 2022 07:10:31 GMT+0000 (UTC)

arXiv
