Deep Independently Recurrent Neural Network (IndRNN)

Shuai Li; Wanqing Li; Chris Cook; Yanbo Gao

Recurrent neural networks (RNNs) are known to be difficult to train because of the gradient vanishing and exploding problems, which make it hard to learn long-term patterns and to construct deep networks. To address these problems, this paper proposes a new type of RNN in which the recurrent connection is formulated as a Hadamard product. It is referred to as the independently recurrent neural network (IndRNN): neurons in the same layer are independent of each other and are connected across layers. Because gradient backpropagation is better behaved, an IndRNN with regulated recurrent weights effectively addresses the gradient vanishing and exploding problems, so long-term dependencies can be learned. Moreover, an IndRNN can work with non-saturated activation functions such as ReLU (rectified linear unit) and still be trained robustly. Several deeper IndRNN architectures, including the basic stacked IndRNN, the residual IndRNN and the densely connected IndRNN, have been investigated, all of which can be much deeper than existing RNNs. Furthermore, IndRNN reduces the computation at each time step and can be over 10 times faster than the commonly used long short-term memory (LSTM). Experimental results show that the proposed IndRNN is able to process very long sequences and construct very deep networks. IndRNNs achieve better performance on various tasks than the traditional RNN, LSTM and the popular Transformer.
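The Hadamard-product recurrence described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes the update h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), where u is a per-neuron recurrent weight vector. The function names (`indrnn_step`, `clip_recurrent`, `run_layer`) and the clipping bound are hypothetical choices made for illustration; the abstract only states that the recurrent weights are "regulated", and this is not the authors' implementation.

```python
def indrnn_step(x, h_prev, W, u, b):
    """One IndRNN time step (sketch): h = relu(W x + u * h_prev + b).

    u * h_prev is elementwise (a Hadamard product), so neuron i only
    sees its own previous state h_prev[i], never another neuron's.
    """
    h = []
    for i in range(len(u)):
        pre = sum(W[i][j] * x[j] for j in range(len(x)))  # input term W x
        pre += u[i] * h_prev[i]                           # independent recurrence
        pre += b[i]
        h.append(max(0.0, pre))                           # ReLU (non-saturated)
    return h


def clip_recurrent(u, bound):
    """Constrain each recurrent weight to [-bound, bound].

    Assumption: simple clipping stands in for the "regulated recurrent
    weights" mentioned in the abstract; the bound is a free parameter here.
    """
    return [max(-bound, min(bound, ui)) for ui in u]


def run_layer(xs, W, u, b):
    """Run one IndRNN layer over a sequence xs, starting from a zero state."""
    h = [0.0] * len(u)
    for x in xs:
        h = indrnn_step(x, h, W, u, b)
    return h


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    b = [0.0, 0.0]
    xs = [[1.0, 1.0], [0.0, 0.0]]
    print(run_layer(xs, W, [0.5, 2.0], b))
    print(clip_recurrent([0.5, 2.0], 1.0))
```

Because each neuron has a single scalar recurrent weight, its state after an input-free step is just its previous state scaled by that weight. This is why bounding the magnitude of u gives direct control over how gradients grow or shrink through time in this sketch.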

updated: Wed Dec 09 2020 09:19:00 GMT+0000 (UTC)

published: Fri Oct 11 2019 09:43:49 GMT+0000 (UTC)

arXiv
