Kronecker CP Decomposition with Fast Multiplication for Compressing RNNs

Dingheng Wang; Bijiao Wu; Guangshe Zhao; Man Yao; Hengnu Chen; Lei Deng; Tianyi Yan; Guoqi Li

Recurrent neural networks (RNNs) are powerful for tasks involving sequential data, such as natural language processing and video recognition. However, modern RNNs, including long short-term memory (LSTM) and gated recurrent unit (GRU) networks, have complex topologies and high space and computation complexity, so compressing them has become an active and promising research topic in recent years. Among the many compression methods, tensor decomposition, e.g., tensor train (TT), block term (BT), tensor ring (TR) and hierarchical Tucker (HT), is particularly attractive because it can achieve very high compression ratios. Nevertheless, none of these tensor decomposition formats provides both space and computation efficiency. In this paper, we consider compressing RNNs based on a novel Kronecker CANDECOMP/PARAFAC (KCP) decomposition, which is derived from Kronecker tensor (KT) decomposition, and we propose two fast algorithms for multiplying the input by the tensor-decomposed weight. Our experiments on the UCF11, YouTube Celebrities Face and UCF50 datasets verify that the proposed KCP-RNNs achieve accuracy comparable to that of RNNs in other tensor-decomposed formats, and that low-rank KCP can reach a compression ratio of up to 278,219x. More importantly, KCP-RNNs are efficient in both space and computation complexity compared with other tensor-decomposed RNNs of similar rank. In addition, we find that KCP has the best potential for parallel computing to accelerate the calculations in neural networks.
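To make the idea of multiplying an input by a Kronecker-structured weight more concrete, the following is a minimal sketch in Python with NumPy. It is not the KCP algorithm from the paper, whose details are not given in the abstract. It only illustrates the standard two-factor identity (A ⊗ B) x = vec(A X Bᵀ), applied to a weight assumed to be a sum of a few Kronecker products. The function names `kron_sum_matvec`, `dense_weight` and `param_counts`, and all shapes, are hypothetical and chosen for illustration.

```python
import numpy as np


def kron_sum_matvec(factors, x):
    """Compute (sum_r A_r kron B_r) @ x without building the dense weight.

    factors: list of (A, B) pairs; every A has shape (m1, n1) and every B
             has shape (m2, n2). This two-factor form is an assumption made
             for illustration, not the paper's KCP format.
    x:       input vector of length n1 * n2.
    """
    m1, n1 = factors[0][0].shape
    m2, n2 = factors[0][1].shape
    X = x.reshape(n1, n2)          # row-major reshape of the input
    Y = np.zeros((m1, m2))
    for A, B in factors:
        # (A kron B) @ x equals (A @ X @ B.T) flattened in row-major order
        Y += A @ X @ B.T
    return Y.ravel()


def dense_weight(factors):
    """Build the full weight explicitly; used only to check the fast path."""
    return sum(np.kron(A, B) for A, B in factors)


def param_counts(m1, n1, m2, n2, rank):
    """Return (dense parameter count, factored parameter count)."""
    dense = (m1 * m2) * (n1 * n2)
    factored = rank * (m1 * n1 + m2 * n2)
    return dense, factored


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    factors = [(rng.standard_normal((8, 8)), rng.standard_normal((16, 16)))
               for _ in range(2)]
    x = rng.standard_normal(8 * 16)
    fast = kron_sum_matvec(factors, x)
    slow = dense_weight(factors) @ x
    print("results match:", np.allclose(fast, slow))
    print("dense vs. factored parameters:", param_counts(8, 8, 16, 16, 2))
```

In this sketch the dense weight is never formed on the fast path, so both storage and arithmetic scale with the factor sizes rather than with their product. That is the general kind of space and computation saving the abstract refers to, although the paper's own algorithms and complexity figures may differ.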

updated: Fri Sep 24 2021 12:19:16 GMT+0000 (UTC)

published: Fri Aug 21 2020 07:29:45 GMT+0000 (UTC)

arXiv
