A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami; Sehoon Kim; Zhen Dong; Zhewei Yao; Michael W. Mahoney; Kurt Keutzer

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
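The abstract frames quantization as distributing continuous real values over a fixed discrete set so that few bits are needed while accuracy is preserved. The following is a minimal sketch of one common scheme, uniform affine (asymmetric) quantization with a single per-tensor scale and zero point. The function names `quantize_uniform` and `dequantize_uniform`, and the choice of the min/max of the input (extended to include 0.0) as the clipping range, are illustrative assumptions and not the survey's prescribed method. Inputs are assumed to be a non-empty list of floats.

```python
def quantize_uniform(values, num_bits=4):
    """Map real values onto the integers 0 .. 2**num_bits - 1 (hypothetical helper)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Assumed clipping range: the data's min/max, widened so that 0.0 is inside it.
    r_min = min(min(values), 0.0)
    r_max = max(max(values), 0.0)
    # Scale: width of one quantization step in real units.
    scale = (r_max - r_min) / (qmax - qmin) if r_max > r_min else 1.0
    # Zero point: the integer code that represents the real value 0.0 exactly.
    zero_point = round(qmin - r_min / scale)
    # Round to the nearest step, shift by the zero point, and clip into range.
    codes = [min(max(round(v / scale) + zero_point, qmin), qmax) for v in values]
    return codes, scale, zero_point


def dequantize_uniform(codes, scale, zero_point):
    """Recover approximate real values from integer codes."""
    return [(q - zero_point) * scale for q in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.5, 2.0]
    codes, scale, zp = quantize_uniform(weights, num_bits=4)
    approx = dequantize_uniform(codes, scale, zp)
    for w, q, a in zip(weights, codes, approx):
        print(f"{w:+.3f} -> code {q:2d} -> {a:+.3f}")
```

For inputs inside the clipping range, each reconstructed value differs from the original by at most half a step (`scale / 2`), which is the accuracy cost traded for using only `num_bits` bits per value. Note that this sketch keeps each code in an ordinary Python `int`; realizing the memory savings the abstract describes would additionally require packing codes densely (for example, two 4-bit codes per byte) and running integer arithmetic on them.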

updated: Thu Mar 25 2021 06:57:11 GMT+0000 (UTC)

published: Thu Mar 25 2021 06:57:11 GMT+0000 (UTC)

arXiv
