Linformer: Self-Attention with Linear Complexity

Sinong Wang; Belinda Z. Li; Madian Khabsa; Han Fang; Hao Ma

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses O(n^2) time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient.
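
The abstract does not spell out the construction, so the following is only a minimal single-head sketch of the idea. It assumes the approach described in the full paper: keys and values are projected along the sequence dimension from length n down to a fixed length k, so the attention map is n x k rather than n x n. The function names, the random projection matrices E and F (learned in practice), and the sizes n, d, k are illustrative assumptions, not the authors' code.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def standard_attention(Q, K, V):
    """Scaled dot-product attention; the score matrix is (n, n)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n): quadratic in n
    return softmax(scores) @ V             # (n, d)


def linformer_attention(Q, K, V, E, F):
    """Low-rank attention sketch; the score matrix is (n, k).

    E and F have shape (k, n) and compress the sequence axis of K and V.
    They are hypothetical stand-ins for learned projections.
    """
    d = Q.shape[-1]
    K_proj = E @ K                         # (k, d)
    V_proj = F @ V                         # (k, d)
    scores = Q @ K_proj.T / np.sqrt(d)     # (n, k): linear in n for fixed k
    return softmax(scores) @ V_proj        # (n, d)


# Illustrative sizes only (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)

out = linformer_attention(Q, K, V, E, F)   # shape (n, d)
```

For fixed k, both the n x k score matrix and the two projections cost O(nk) memory and O(nkd) time, which is where the linear dependence on sequence length comes from. Setting k = n with identity projections recovers standard attention.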

updated: Tue Jun 09 2020 03:03:56 GMT+0000 (UTC)

published: Mon Jun 08 2020 17:37:52 GMT+0000 (UTC)

arXiv
