PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

Xiaolong Ma; Fu-Ming Guo; Wei Niu; Xue Lin; Jian Tang; Kaisheng Ma; Bin Ren; Yanzhi Wang

Model compression techniques for Deep Neural Networks (DNNs) are widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. Current pruning methods fall into two mainstream families that represent the two extremes of pruning regularity: non-structured, fine-grained pruning achieves high sparsity and accuracy but is not hardware friendly, whereas structured, coarse-grained pruning exploits hardware-efficient structures but suffers an accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, which adds a new sparsity dimension: fine-grained pruning patterns inside coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), generated by intra-convolution kernel pruning, and connectivity sparsity, generated by inter-convolution kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases the pruning rate while maintaining a balanced workload in filter computation. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework that executes PCONV models in real time without compromising accuracy, which prior work could not achieve. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 39.2x, 11.4x, and 6.3x, respectively, and with no accuracy loss. Mobile devices can thus achieve real-time inference on large-scale DNNs.
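The two sparsity types described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the pattern set or how patterns and kernels are selected. The sketch assumes 3x3 kernels stored as flat lists of 9 weights, patterns that keep 4 weights including the centre, and simple magnitude-based selection; all function names are hypothetical.

```python
# Illustrative sketch only. Pattern family, selection rules, and all names
# are assumptions made for this example, not details from the paper.
from itertools import combinations


def candidate_patterns(keep=4):
    """Assumed pattern family: 3x3 masks keeping the centre (index 4)
    plus (keep - 1) other positions."""
    others = [i for i in range(9) if i != 4]
    return [frozenset((4,) + c) for c in combinations(others, keep - 1)]


def apply_pattern(kernel, patterns):
    """Intra-kernel pruning: keep the pattern that preserves the most
    squared weight magnitude and zero every other position."""
    best = max(patterns, key=lambda p: sum(kernel[i] ** 2 for i in p))
    return [w if i in best else 0.0 for i, w in enumerate(kernel)]


def prune_connectivity(filt, kernels_to_keep):
    """Inter-kernel pruning: zero whole kernels with the smallest norm.
    Every filter keeps the same number of kernels, so per-filter
    workloads stay balanced."""
    norms = [sum(w * w for w in k) for k in filt]
    order = sorted(range(len(filt)), key=lambda i: norms[i], reverse=True)
    keep = set(order[:kernels_to_keep])
    return [k if i in keep else [0.0] * 9 for i, k in enumerate(filt)]


def pconv_like_prune(layer, kernels_to_keep, keep=4):
    """layer[f][c] is the flat 3x3 kernel of filter f for input channel c."""
    patterns = candidate_patterns(keep)
    pruned = []
    for filt in layer:
        filt = prune_connectivity(filt, kernels_to_keep)
        pruned.append([apply_pattern(k, patterns) for k in filt])
    return pruned
```

Under these assumptions, calling `pconv_like_prune(layer, kernels_to_keep=2)` on a layer with four input channels leaves each filter with two surviving kernels of four weights each. That regular, per-filter-balanced layout is the kind of structure a compiler could exploit.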

updated: Wed Mar 04 2020 19:39:06 GMT+0000 (UTC)

published: Fri Sep 06 2019 03:58:29 GMT+0000 (UTC)

arXiv
