Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Tailin Liang; John Glossner; Lei Wang; Shaobo Shi; Xiaotong Zhang

Deep neural networks have been applied in many applications and exhibit extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. Network compression can often be realized with little loss of accuracy. In some cases, accuracy may even improve. This paper provides a survey on two types of network compression: pruning and quantization. Pruning can be categorized as static if it is performed offline or dynamic if it is performed at run-time. We compare pruning techniques and describe criteria used to remove redundant computations. We discuss trade-offs in element-wise, channel-wise, shape-wise, filter-wise, layer-wise, and even network-wise pruning. Quantization reduces computations by reducing the precision of the datatype. Weights, biases, and activations are typically quantized to 8-bit integers, although lower bit-width implementations, including binary neural networks, are also discussed. Pruning and quantization can be used independently or combined. We compare current techniques, analyze their strengths and weaknesses, present compressed network accuracy results on a number of frameworks, and provide practical guidance for compressing networks.
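
As a rough illustration of the two ideas named in the abstract, the sketch below applies static element-wise pruning followed by 8-bit quantization to a short list of weights. It is a minimal sketch under stated assumptions, not the survey's method: the magnitude criterion, the symmetric linear scaling scheme, the function names (`prune_by_magnitude`, `quantize_int8`, `dequantize`), and the sample weights are all chosen here for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Static element-wise pruning: zero the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove (an assumed criterion;
    the survey compares many others).
    """
    k = int(len(weights) * sparsity)  # number of weights to zero out
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(values):
    """Symmetric linear quantization to signed 8-bit integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # real value per integer step
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map 8-bit integers back to approximate real values."""
    return [x * scale for x in q]


# Hypothetical weights, used only to exercise the functions.
weights = [0.02, -1.27, 0.5, -0.01, 0.9, 0.003]
pruned = prune_by_magnitude(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

In this sketch pruning runs once, offline, which corresponds to the static case; the two steps are independent, so either can be applied without the other. The rounding error of each restored value is bounded by half of `scale`.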

updated: Tue Jun 15 2021 07:15:40 GMT+0000 (UTC)

published: Sun Jan 24 2021 08:21:04 GMT+0000 (UTC)

arXiv
