Linear Complexity Randomized Self-attention Mechanism

Lin Zheng; Chong Wang; Lingpeng Kong

Random feature attentions (RFAs) were recently proposed to approximate softmax attention in linear time and space by linearizing the exponential kernel. In this paper, we first propose a novel perspective for understanding the bias in this approximation by recasting RFAs as self-normalized importance samplers. This perspective further sheds light on an unbiased estimator for the whole softmax attention, called randomized attention (RA). RA constructs positive random features via query-specific distributions and enjoys greatly improved approximation fidelity, albeit at quadratic complexity. By combining the expressiveness of RA and the efficiency of RFA, we develop a novel linear-complexity self-attention mechanism called linear randomized attention (LARA). Extensive experiments across various domains demonstrate that RA and LARA improve the performance of RFAs by a substantial margin.
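To make the idea of linearizing the exponential kernel concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the common positive random feature map phi(x)_s = exp(w_s . x - |x|^2 / 2) / sqrt(S) with w_s drawn from a standard Gaussian, for which E[phi(q) . phi(k)] = exp(q . k). The function names, dimensions, and sample counts are illustrative choices, and scaling factors such as 1/sqrt(d) are omitted. Dividing by an estimated normalizer makes the result a ratio of two estimates, which is the self-normalized form whose bias the abstract discusses.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def softmax_attention(q, keys, values):
    """Exact softmax attention for one query (cost grows with the number of keys per query)."""
    scores = [math.exp(dot(q, k)) for k in keys]
    z = sum(scores)
    dim = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / z for j in range(dim)]


def positive_features(x, ws):
    """Positive random features: phi(x)_s = exp(w_s . x - |x|^2 / 2) / sqrt(S)."""
    half_sq = 0.5 * dot(x, x)
    scale = 1.0 / math.sqrt(len(ws))
    return [scale * math.exp(dot(w, x) - half_sq) for w in ws]


def summarize_keys(keys, values, ws):
    """Key/value summaries, computed once in time linear in the number of keys."""
    num_feats, dim = len(ws), len(values[0])
    kv = [[0.0] * dim for _ in range(num_feats)]
    ksum = [0.0] * num_feats
    for k, v in zip(keys, values):
        fk = positive_features(k, ws)
        for s in range(num_feats):
            ksum[s] += fk[s]
            for j in range(dim):
                kv[s][j] += fk[s] * v[j]
    return kv, ksum


def rfa_attention(q, kv, ksum, ws):
    """Random feature attention for one query, reusing the shared summaries."""
    fq = positive_features(q, ws)
    denom = dot(fq, ksum)  # estimate of the softmax normalizer
    dim = len(kv[0])
    return [sum(fq[s] * kv[s][j] for s in range(len(ws))) / denom for j in range(dim)]


# Illustrative sizes only (assumptions, not taken from the paper).
rng = random.Random(0)
d, n_keys, n_samples = 4, 8, 2000
ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_samples)]
keys = [[rng.uniform(-0.3, 0.3) for _ in range(d)] for _ in range(n_keys)]
values = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_keys)]
query = [rng.uniform(-0.3, 0.3) for _ in range(d)]

kv, ksum = summarize_keys(keys, values, ws)
print("exact:", softmax_attention(query, keys, values))
print("RFA:  ", rfa_attention(query, kv, ksum, ws))
```

Because `summarize_keys` runs once and every query reuses `kv` and `ksum`, the total cost is linear in sequence length for a fixed number of random features. By contrast, RA as described in the abstract draws its features from query-specific distributions, so such summaries cannot be shared across queries, which is consistent with its quadratic complexity.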

updated: Sun Apr 10 2022 12:10:28 GMT+0000 (UTC)

published: Sun Apr 10 2022 12:10:28 GMT+0000 (UTC)

arXiv
