Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Yang Li; Si Si; Gang Li; Cho-Jui Hsieh; Samy Bengio

Attentional mechanisms are order-invariant. Positional encoding is a crucial component to allow attention-based deep model architectures such as Transformer to address sequences or images where the position of information matters. In this paper, we propose a novel positional encoding method based on learnable Fourier features. Instead of hard-coding each position as a token or a vector, we represent each position, which can be multi-dimensional, as a trainable encoding based on learnable Fourier feature mapping, modulated with a multi-layer perceptron. The representation is particularly advantageous for a spatial multi-dimensional position, e.g., pixel positions on an image, where L_2 distances or more complex positional relationships need to be captured. Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence.
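The abstract describes the encoding only at a high level: a learnable Fourier feature mapping followed by a multi-layer perceptron. The sketch below is one plausible reading of that description, not the authors' implementation. All names (`init_params`, `W_r`, `gamma`, the layer sizes), the Gaussian initialisation, the 1/sqrt(D) scaling, and the GELU activation are assumptions made for illustration. It shows only the forward pass; in a real model every entry of `params` would be trained by gradient descent in an autodiff framework.

```python
import math
import random


def init_params(pos_dim, fourier_dim, hidden_dim, out_dim, gamma=1.0, seed=0):
    """Randomly initialise the parameters (names and scales are illustrative)."""
    rng = random.Random(seed)

    def gauss_matrix(rows, cols, std):
        return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

    return {
        # Frequency matrix of the Fourier mapping (assumed trainable).
        "W_r": gauss_matrix(fourier_dim // 2, pos_dim, 1.0 / gamma),
        # Two-layer perceptron that modulates the Fourier features.
        "W_1": gauss_matrix(hidden_dim, fourier_dim, 1.0 / math.sqrt(fourier_dim)),
        "b_1": [0.0] * hidden_dim,
        "W_2": gauss_matrix(out_dim, hidden_dim, 1.0 / math.sqrt(hidden_dim)),
        "b_2": [0.0] * out_dim,
    }


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def fourier_features(position, W_r):
    """Map a multi-dimensional position to [cos(W_r x) || sin(W_r x)] / sqrt(D)."""
    projected = matvec(W_r, position)
    scale = 1.0 / math.sqrt(2 * len(W_r))  # D = 2 * number of frequencies
    return [scale * math.cos(p) for p in projected] + [
        scale * math.sin(p) for p in projected
    ]


def gelu(x):
    """Exact GELU activation (an assumed choice of nonlinearity)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def positional_encoding(position, params):
    """Fourier features followed by the MLP; returns a vector of size out_dim."""
    features = fourier_features(position, params["W_r"])
    pre_activation = matvec(params["W_1"], features)
    hidden = [gelu(h + b) for h, b in zip(pre_activation, params["b_1"])]
    output = matvec(params["W_2"], hidden)
    return [o + b for o, b in zip(output, params["b_2"])]


# Example: encode the pixel position (3, 5) of an image as a 4-dimensional vector.
params = init_params(pos_dim=2, fourier_dim=8, hidden_dim=16, out_dim=4)
encoding = positional_encoding([3.0, 5.0], params)
```

Under these assumptions, the dot product of two Fourier feature vectors depends only on the offset between the two positions, because cos(a)cos(b) + sin(a)sin(b) = cos(a - b). This is one way a representation of this kind can reflect distances between multi-dimensional positions such as pixel coordinates, while the MLP leaves room for more complex positional relationships.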

updated: Sat Jun 05 2021 04:40:18 GMT+0000 (UTC)

published: Sat Jun 05 2021 04:40:18 GMT+0000 (UTC)

arXiv
