Compact Model Training by Low-Rank Projection with Energy Transfer

Kailing Guo; Zhenquan Lin; Xiaofen Xing; Fang Liu; Xiangmin Xu

Low-rankness plays an important role in traditional machine learning but is less popular in deep learning. Most previous low-rank network compression methods compress a network by approximating a pre-trained model and then re-training it. However, the optimal solution in Euclidean space may be quite different from the one on the low-rank manifold, so a well pre-trained model is not a good initialization for a model with a low-rank constraint. As a result, the performance of low-rank compressed networks degrades significantly. Compared to other network compression methods such as pruning, low-rank methods have attracted less attention in recent years. In this paper, we devise a new training method, low-rank projection with energy transfer (LRPET), that trains low-rank compressed networks from scratch and achieves competitive performance. First, we propose to alternate stochastic gradient descent training with projection onto the low-rank manifold, which asymptotically approaches the optimal solution on the low-rank manifold. Compared to re-training a compact model, this makes full use of the model capacity, because the solution space is relaxed back to Euclidean space after each projection. Second, the reduction in matrix energy (the sum of squares of the singular values) caused by projection is compensated by energy transfer: we uniformly transfer the energy of the pruned singular values to the remaining ones. We show theoretically that energy transfer eases the tendency toward vanishing gradients caused by projection. Comprehensive experiments on CIFAR-10 and ImageNet show that our method is superior to other low-rank compression methods and also outperforms recent state-of-the-art pruning methods.
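
The two ideas in the abstract (alternating gradient steps with low-rank projection, and energy transfer) can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: projection onto the rank-r set is done by truncated SVD; "uniformly transfer" is read as giving each of the r retained singular values an equal share of the pruned energy, so total matrix energy (which equals the squared Frobenius norm) is preserved; and a toy quadratic loss with full-batch gradient steps stands in for SGD on a network. The names `project_low_rank`, `train_with_projection`, and the projection `period` are hypothetical.

```python
import numpy as np


def matrix_energy(w: np.ndarray) -> float:
    """Matrix energy: sum of squared singular values (= squared Frobenius norm)."""
    return float(np.sum(np.linalg.svd(w, compute_uv=False) ** 2))


def project_low_rank(w: np.ndarray, rank: int, transfer_energy: bool = True) -> np.ndarray:
    """Project w onto matrices of rank <= `rank` via truncated SVD.

    With transfer_energy=True, the energy of the pruned singular values is
    split equally among the retained ones (an ASSUMED reading of "uniformly
    transfer"), so the projected matrix keeps the original total energy.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    kept = s[:rank].copy()
    if transfer_energy:
        pruned_energy = np.sum(s[rank:] ** 2)  # 0.0 if nothing is pruned
        kept = np.sqrt(kept ** 2 + pruned_energy / rank)
    # Scale the first `rank` left singular vectors and recombine.
    return (u[:, :rank] * kept) @ vt[:rank, :]


def train_with_projection(grad_fn, w, rank, lr=0.1, steps=100, period=10):
    """Alternate unconstrained gradient steps (Euclidean space) with a
    projection onto the low-rank set every `period` steps (HYPOTHETICAL
    schedule); a final projection makes the returned weights low-rank."""
    for step in range(1, steps + 1):
        w = w - lr * grad_fn(w)
        if step % period == 0:
            w = project_low_rank(w, rank)
    return project_low_rank(w, rank)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((8, 6))
    # Toy loss 0.5 * ||W - target||_F^2, whose gradient is W - target.
    w = train_with_projection(lambda w: w - target, np.zeros((8, 6)), rank=2)
    print("rank:", np.linalg.matrix_rank(w))
    print("energy with transfer:", matrix_energy(project_low_rank(target, 2)))
    print("energy without transfer:", matrix_energy(project_low_rank(target, 2, False)))
    print("original energy:", matrix_energy(target))
```

Between projections the weights move freely in Euclidean space, which is the relaxation the abstract contrasts with re-training a fixed compact model. Without energy transfer the projected matrix always loses the pruned energy; with the assumed equal-share rule the retained singular values grow just enough to restore it.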

updated: Tue Apr 12 2022 06:53:25 GMT+0000 (UTC)

published: Tue Apr 12 2022 06:53:25 GMT+0000 (UTC)

arXiv