Sparse and Continuous Attention Mechanisms

André F. T. Martins; Marcos Treviso; António Farinhas; Vlad Niculae; Mário A. T. Figueiredo; Pedro M. Q. Aguiar

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g., sparsemax and α-entmax), which have varying support and can assign zero probability to irrelevant categories. This paper expands that work in two directions. First, we extend α-entmax to continuous domains, revealing a link with Tsallis statistics and deformed exponential families. Second, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for α ∈ {1, 2}. Experiments on attention-based text classification, machine translation, and visual question answering illustrate the use of continuous attention in 1D and 2D, showing that it allows attending to time intervals and compact regions.
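To make the contrast between fixed and varying support concrete, the following minimal sketch compares softmax with sparsemax on a finite domain. It is not the authors' implementation: the function names and the example scores are illustrative assumptions, and sparsemax is computed with the standard sort-and-threshold projection onto the probability simplex.

```python
import math


def softmax(scores):
    """Dense mapping: every category receives strictly positive probability."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    Low-scoring categories can receive exactly zero probability.
    """
    ordered = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        # The k largest scores are in the support while this holds;
        # the last k that satisfies it determines the threshold tau.
        if 1.0 + k * value > cumulative:
            tau = (cumulative - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


# Illustrative scores (an assumption, not data from the paper).
scores = [1.5, 1.0, -1.0]
dense = softmax(scores)
sparse = sparsemax(scores)
print(dense)
print(sparse)
```

On these scores softmax keeps all three categories in its support, while sparsemax drops the third one entirely. The paper's continuous extension replaces the finite sum with an integral over the domain, which this finite sketch does not attempt to cover.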

updated: Tue Oct 27 2020 22:22:38 GMT+0000 (UTC)

published: Fri Jun 12 2020 14:16:48 GMT+0000 (UTC)

arXiv
