CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

Tatiana Likhomanenko; Qiantong Xu; Ronan Collobert; Gabriel Synnaeve; Alex Rogozhnikov

Without positional information, attention-based transformer neural networks are permutation-invariant. Absolute and relative positional embeddings are the most popular ways to feed positional information to transformer models. Absolute positional embeddings are simple to implement but generalize poorly when evaluated on sequences whose lengths differ from those seen at training time. Relative positional embeddings are more robust to length changes, but they are more complex to implement and yield lower model throughput. In this paper, we propose an augmentation-based approach (CAPE) for absolute positional embeddings that keeps the advantages of both absolute (simplicity and speed) and relative positional embeddings (better generalization). In addition, our empirical evaluation on state-of-the-art models in machine translation, image recognition, and speech recognition demonstrates that CAPE leads to better generalization performance as well as increased stability with respect to training hyper-parameters.
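The permutation property mentioned in the abstract can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: it shows that a single self-attention layer without positional information only reorders its outputs when the inputs are reordered, and that adding standard sinusoidal absolute embeddings breaks this. The helper `shifted_positions` is a hypothetical illustration of what augmenting absolute positions could look like (a random global shift that leaves relative distances intact); the abstract does not specify which augmentations CAPE uses, so that part is an assumption.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention (no positional input)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def sinusoidal_embedding(positions, dim):
    """Standard sinusoidal absolute embedding; accepts real-valued positions."""
    positions = np.asarray(positions, dtype=np.float64)[:, None]
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = positions * freqs[None, :]
    emb = np.zeros((positions.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def shifted_positions(length, max_shift, rng):
    """Hypothetical augmentation (an assumption, not taken from the abstract):
    shift all positions by one random offset, keeping relative distances."""
    shift = rng.uniform(-max_shift, max_shift)
    return np.arange(length) + shift

rng = np.random.default_rng(0)
seq_len, dim = 6, 8
x = rng.normal(size=(seq_len, dim))
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
perm = np.arange(seq_len)[::-1]  # reverse the token order

# Without positions: permuting the inputs permutes the outputs the same way.
out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
equivariant = np.allclose(out[perm], out_perm)

# With absolute embeddings tied to slot index, token order now matters.
pe = sinusoidal_embedding(np.arange(seq_len), dim)
out_pe = self_attention(x + pe, wq, wk, wv)
out_pe_perm = self_attention(x[perm] + pe, wq, wk, wv)
still_equivariant_with_pe = np.allclose(out_pe[perm], out_pe_perm)

print("order-insensitive without positions:", equivariant)
print("order-insensitive with positions:", still_equivariant_with_pe)

# Augmented positions feed the same embedding function during training.
pos_aug = shifted_positions(seq_len, max_shift=5.0, rng=rng)
pe_aug = sinusoidal_embedding(pos_aug, dim)
```

Because the embedding function takes real-valued positions, an augmentation of this kind changes only the inputs to the embedding and leaves the attention layer itself untouched, which is consistent with the simplicity and speed the abstract attributes to absolute embeddings.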

updated: Sun Jun 06 2021 14:54:55 GMT+0000 (UTC)

published: Sun Jun 06 2021 14:54:55 GMT+0000 (UTC)

arXiv
