Deep RNN Framework for Visual Sequential Applications

Bo Pang; Kaiwen Zha; Hanwen Cao; Chen Shi; Cewu Lu

Efficiently extracting temporal and representation features plays a pivotal role in understanding visual sequence information. To address this, we propose a new recurrent neural framework that can be stacked deep effectively. Our deep RNN framework has two main novel designs. The first is a new RNN module, the Context Bridge Module (CBM), which splits the information flowing along the sequence (temporal direction) from the information flowing along depth (spatial representation direction); balancing these two directions makes deep stacks easier to train. The second is the Overlap Coherence Training Scheme, which reduces the training complexity of long visual sequential tasks when computing resources are limited. We provide empirical evidence that our deep RNN framework is easy to optimize and gains accuracy from increased depth on several visual sequence problems. On these tasks, we evaluate our deep RNN framework with 15 layers, 7 times deeper than conventional RNN networks, and it remains easy to train. Our deep framework achieves more than 11% relative improvement over shallow RNN models on Kinetics, UCF-101, and HMDB-51 for video classification. For auxiliary annotation, replacing the shallow RNN part of Polygon-RNN with our 15-layer deep CBM improves performance by 14.7%. For video future prediction, our deep RNN improves the state-of-the-art shallow model's performance by 2.4% on PSNR and SSIM. The code and trained models are published with this paper: https://github.com/BoPang1996/Deep-RNN-Framework.
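
The abstract does not spell out how the Overlap Coherence Training Scheme works. As a rough illustration only, and assuming it trains on short clips that share some frames with their neighbours, the index bookkeeping for such clips might look like the sketch below. The function name `overlapping_clips` and its parameters are hypothetical and are not taken from the authors' repository.

```python
from typing import List, Tuple


def overlapping_clips(seq_len: int, clip_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) that cover range(seq_len).

    Hypothetical helper: consecutive clips share `overlap` frames, except that
    the final clip may share more because it is placed flush with the end.
    """
    if clip_len <= 0 or not 0 <= overlap < clip_len:
        raise ValueError("need clip_len > 0 and 0 <= overlap < clip_len")
    if seq_len <= clip_len:
        # The whole sequence fits in a single clip.
        return [(0, seq_len)]
    stride = clip_len - overlap
    starts = list(range(0, seq_len - clip_len + 1, stride))
    # Add a final clip flush with the end so no trailing frames are dropped.
    if starts[-1] + clip_len < seq_len:
        starts.append(seq_len - clip_len)
    return [(s, s + clip_len) for s in starts]


# A 20-frame sequence cut into 8-frame clips that share 2 frames.
clips = overlapping_clips(seq_len=20, clip_len=8, overlap=2)
print(clips)  # [(0, 8), (6, 14), (12, 20)]
```

Under this assumption, each clip is short enough to backpropagate through on limited hardware, and the shared frames (here 6 to 7 and 12 to 13) are where a coherence term could compare the outputs of neighbouring clips. The actual sampling strategy and loss are defined in the paper and code, not here.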

updated: Fri Oct 25 2019 03:55:16 GMT+0000 (UTC)

published: Sun Nov 25 2018 06:34:29 GMT+0000 (UTC)

arXiv
