CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings

Tatiana Likhomanenko; Qiantong Xu; Gabriel Synnaeve; Ronan Collobert; Alex Rogozhnikov



Without positional information, attention-based Transformer neural networks are permutation-invariant. Absolute or relative positional embeddings are the most popular ways to feed Transformer models with positional information. Absolute positional embeddings are simple to implement, but suffer from generalization issues when evaluating on sequences longer than seen at training time. Relative positions are more robust to input length change, but are more complex to implement and yield inferior model throughput due to extra computational and memory costs. In this paper, we propose an augmentation-based approach (CAPE) for absolute positional embeddings, which keeps the advantages of both absolute (simplicity and speed) and relative positional embeddings (better generalization). In addition, our empirical evaluation on state-of-the-art models in machine translation, image and speech recognition demonstrates that CAPE leads to better generalization performance as well as increased stability with respect to training hyper-parameters.
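The abstract does not spell out which augmentations are applied, so the following is only a minimal sketch of the general idea, assuming real-valued positions fed to a standard sinusoidal embedding and perturbed at training time by a random global shift, per-position local shifts, and a global scale. The function names, parameter names, and default magnitudes are hypothetical and chosen for illustration only.

```python
import numpy as np


def sinusoidal_embedding(positions, dim):
    """Sinusoidal absolute embedding evaluated at real-valued positions.

    `dim` is assumed to be even.
    """
    positions = np.asarray(positions, dtype=np.float64)
    # One frequency per (sin, cos) pair of channels.
    freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = positions[:, None] * freqs[None, :]
    emb = np.empty((positions.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb


def augment_positions(positions, rng,
                      max_global_shift=5.0,   # hypothetical default
                      max_local_shift=0.5,    # hypothetical default
                      max_global_scale=1.03): # hypothetical default
    """Randomly perturb positions (training time only; assumed scheme)."""
    positions = np.asarray(positions, dtype=np.float64)
    # Global shift: one offset for the whole sequence, so the absolute
    # origin carries no reliable signal and only differences remain stable.
    global_shift = rng.uniform(-max_global_shift, max_global_shift)
    # Local shift: independent jitter for every position.
    local_shift = rng.uniform(-max_local_shift, max_local_shift,
                              size=positions.shape)
    # Global scale: sampled log-uniformly around 1.
    log_scale = rng.uniform(-np.log(max_global_scale),
                            np.log(max_global_scale))
    return np.exp(log_scale) * (positions + global_shift + local_shift)


rng = np.random.default_rng(0)
positions = np.arange(10)

# Training: embed the augmented positions.
train_emb = sinusoidal_embedding(augment_positions(positions, rng), dim=8)
# Evaluation: embed the unmodified positions.
eval_emb = sinusoidal_embedding(positions, dim=8)
```

Because a global shift leaves the differences between positions unchanged, a model trained this way cannot rely on absolute offsets and is pushed toward using relative ones, while the embedding itself is still added to the input like any other absolute positional embedding.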

updated: Tue Nov 09 2021 03:03:27 GMT+0000 (UTC)

published: Sun Jun 06 2021 14:54:55 GMT+0000 (UTC)

arXiv

