Sequence to Sequence Learning with Neural Networks

Ilya Sutskever; Oriol Vinyals; Quoc V. Le

ニューラルネットワークを使用したシーケンス間学習

ディープニューラルネットワーク（DNN）は、困難な学習タスクで優れたパフォーマンスを達成した強力なモデルです。 DNNは、ラベル付きの大きなトレーニングセットが利用可能な場合は常に機能しますが、シーケンスへのシーケンスのマッピングには使用できません。このホワイトペーパーでは、シーケンス構造に関する最小限の仮定を行う、シーケンス学習への一般的なエンドツーエンドのアプローチを示します。この方法では、多層化されたLong Short-Term Memory（LSTM）を使用して入力シーケンスを固定次元のベクターにマッピングし、次に別のディープLSTMを使用してベクターからターゲットシーケンスをデコードします。私たちの主な結果は、WMT'14データセットからの英語からフランス語への翻訳タスクで、LSTMによって生成された翻訳は、テストセット全体で34.8のBLEUスコアを達成します。言葉。さらに、LSTMは長い文章でも問題はありませんでした。比較のために、フレーズベースのSMTシステムは、同じデータセットで33.3のBLEUスコアを達成します。 LSTMを使用して、前述のSMTシステムによって生成された1000の仮説を再ランク付けすると、そのBLEUスコアは36.5に増加し、このタスクの以前の最良の結果に近くなります。また、LSTMは、語順に敏感で、能動的および受動的な音声に対して比較的不変である、賢明なフレーズおよび文の表現も学習しました。最後に、ソース文とターゲット文の間に短期間の依存関係が多く発生し、最適化の問題が容易になるため、すべてのソース文の単語の順序を逆にすることで、ターゲット文ではなくLSTMのパフォーマンスが著しく向上することがわかりました。

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

updated: Sun Dec 14 2014 20:59:51 GMT+0000 (UTC)

published: Wed Sep 10 2014 19:55:35 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト