Longformer: The Long-Document Transformer

Iz Beltagy; Matthew E. Peters; Arman Cohan

Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task-motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long-document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.
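The combination of local windowed attention and global attention described in the abstract can be illustrated with a small attention-pattern sketch. This is not the authors' implementation: the function names, the `window` and `global_positions` parameters, and the choice to make global attention symmetric are assumptions made for illustration. The dense mask is built only to make the pattern visible; an efficient implementation would avoid materializing it.

```python
def longformer_style_mask(seq_len, window, global_positions=()):
    """Build a seq_len x seq_len boolean mask (hypothetical helper).

    mask[i][j] is True when token i may attend to token j.
    `window` is the total local window size; each token sees
    window // 2 neighbors on each side, plus itself.
    """
    half = window // 2
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Local windowed attention: a fixed-size band around the diagonal.
        for j in range(max(0, i - half), min(seq_len, i + half + 1)):
            mask[i][j] = True
        # Global attention (assumed symmetric here): a global token attends
        # to every position, and every position attends to it.
        for g in global_positions:
            mask[i][g] = True
            mask[g][i] = True
    return mask


def count_attended_pairs(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # With a fixed window and no global tokens, the number of attended
    # pairs grows linearly with sequence length instead of quadratically.
    for n in (100, 200, 400):
        pairs = count_attended_pairs(longformer_style_mask(n, window=4))
        print(n, pairs, n * n)
```

With a fixed window and a small, fixed number of global tokens, the count of attended pairs grows in proportion to the sequence length, whereas full self-attention would allow all `n * n` pairs.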

updated: Fri Apr 10 2020 17:54:09 GMT+0000 (UTC)

published: Fri Apr 10 2020 17:54:09 GMT+0000 (UTC)

arXiv
