Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Sheng Lin; Wei Jiang; Wei Wang; Kaidi Xu; Yanzhi Wang; Shan Liu; Songnan Li

Compressing deep neural network (DNN) models to reduce their storage and computation requirements is essential for practical applications, especially on resource-limited devices. Although previous unstructured and structured weight pruning methods can remove a reasonable number of model parameters, they rarely accelerate inference in practice, either because unstructured sparsity is poorly compatible with hardware or because structurally pruned networks reach only a low sparsity rate. Aiming to reduce both storage and computation while preserving the original task performance, we propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration. The weight coefficients of a selected micro-structured block are unified to reduce the storage and computation of the block without changing the neuron connections. When all unified coefficients are set to zero, this reduces to micro-structured pruning as a special case, in which the neuron connections (and hence their storage and computation) are removed completely. In addition, we develop an effective training framework based on the alternating direction method of multipliers (ADMM), which converts our complex constrained optimization problem into separately solvable subproblems. By iteratively optimizing the subproblems, the desired micro-structure can be ensured with a high compression ratio and low performance degradation. We extensively evaluate our method using a variety of benchmark models and datasets for different applications. Experimental results demonstrate state-of-the-art performance.
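
The abstract does not spell out the unification rule or the ADMM subproblems, so the following NumPy sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes that "unifying" a block means giving every coefficient in the block one shared magnitude (the mean absolute value) while keeping each sign, that blocks are chosen by smallest squared error, and that the training loss is replaced by a toy quadratic so the weight update has a closed form. The names `unify_blocks`, `admm_toy`, `ratio`, and `rho` are hypothetical.

```python
import numpy as np

def unify_blocks(W, block=(2, 2), ratio=0.5, prune=False):
    """Unify (or prune) the `ratio` fraction of blocks that change least.

    Assumption: a unified block shares one magnitude (mean |w|) and keeps
    the original signs; with prune=True the shared value is zero.
    """
    rows, cols = W.shape
    bh, bw = block
    assert rows % bh == 0 and cols % bw == 0, "shape must be divisible by block"
    # View W as a grid of blocks with shape (R, C, bh, bw).
    blocks = W.reshape(rows // bh, bh, cols // bw, bw).transpose(0, 2, 1, 3)
    shared = np.abs(blocks).mean(axis=(2, 3), keepdims=True)
    unified = np.zeros_like(blocks) if prune else np.sign(blocks) * shared
    # Squared error introduced by unifying each block.
    loss = ((blocks - unified) ** 2).sum(axis=(2, 3))
    k = int(round(ratio * loss.size))
    out = blocks.copy()
    if k > 0:
        idx = np.argsort(loss, axis=None)[:k]      # k cheapest blocks
        r, c = np.unravel_index(idx, loss.shape)
        out[r, c] = unified[r, c]
    # Undo the block view to recover the original layout.
    return out.transpose(0, 2, 1, 3).reshape(rows, cols)

def admm_toy(T, rho=1.0, iters=50, **proj_kwargs):
    """Generic ADMM splitting for: min 0.5*||W - T||^2  s.t.  W is block-structured.

    The quadratic loss is a stand-in for a real training loss; in practice
    the W-step would be a few epochs of gradient descent.
    """
    W = T.copy()
    Z = unify_blocks(W, **proj_kwargs)   # structured copy of the weights
    U = np.zeros_like(W)                 # scaled dual variable
    for _ in range(iters):
        # W-step: minimize loss + (rho/2)*||W - Z + U||^2 (closed form here).
        W = (T + rho * (Z - U)) / (1.0 + rho)
        # Z-step: project W + U onto the structured set.
        Z = unify_blocks(W + U, **proj_kwargs)
        # Dual update.
        U = U + W - Z
    return W, Z
```

Calling `unify_blocks(W, ratio=1.0)` unifies every block, while `prune=True` zeroes the selected blocks, which mirrors the pruning special case described above. The split into a loss-driven weight step and a projection step is the standard ADMM pattern for constrained weight structures; the paper's actual block shapes, selection rule, and subproblems may differ.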

updated: Wed Jun 16 2021 16:43:08 GMT+0000 (UTC)

published: Tue Jun 15 2021 17:22:59 GMT+0000 (UTC)

arXiv
