Pareto-Optimal Quantized ResNet Is Mostly 4-bit

AmirAli Abdolrashidi; Lisa Wang; Shivani Agrawal; Jonathan Malmaud; Oleg Rybakov; Chas Leichner; Lukasz Lew

Quantization has become a popular technique to compress neural networks and reduce compute cost, but most prior work studies quantization without changing the network size. Many real-world applications of neural networks have compute cost and memory budgets, which can be traded off against model quality by changing the number of parameters. In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves. Our results suggest that for each bfloat16 ResNet model, there are quantized models with lower cost and higher accuracy; in other words, the bfloat16 compute cost-quality tradeoff curve is Pareto-dominated by the 4-bit and 8-bit curves, with models primarily quantized to 4-bit yielding the best Pareto curve. Furthermore, we achieve state-of-the-art results on ImageNet for 4-bit ResNet-50 with quantization-aware training, obtaining a top-1 evaluation accuracy of 77.09%. We demonstrate the regularizing effect of quantization by measuring the generalization gap. The quantization method we used is optimized for practicality: it requires little tuning and is designed with hardware capabilities in mind. Our work motivates further research into optimal numeric formats for quantization, as well as the development of machine learning accelerators supporting these formats. As part of this work, we contribute a quantization library written in JAX, which is open-sourced at https://github.com/google-research/google-research/tree/master/aqt.
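The notion of one cost-quality curve being Pareto-dominated by another can be made concrete with a small sketch. The code below is not taken from the paper or from the AQT library; the `Model` type, the `dominates` and `pareto_front` helpers, and all numbers are hypothetical and chosen only for illustration. A model dominates another when it is no more expensive and no less accurate, and strictly better on at least one of the two axes.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical record for one trained model (not a type from the AQT library)."""
    name: str
    cost: float      # inference compute cost, arbitrary units (placeholder)
    accuracy: float  # top-1 accuracy in percent (placeholder)


def dominates(a: Model, b: Model) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.cost <= b.cost and a.accuracy >= b.accuracy
    strictly_better = a.cost < b.cost or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models not dominated by any other model, sorted by cost."""
    front = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(front, key=lambda m: m.cost)


# Placeholder points, invented for illustration only (NOT results from the paper).
models = [
    Model("wide-lowbit", cost=1.0, accuracy=75.0),
    Model("narrow-highbit", cost=2.0, accuracy=74.0),
    Model("wide-highbit", cost=4.0, accuracy=76.0),
]

front_names = [m.name for m in pareto_front(models)]
print(front_names)
```

In this toy setting, "narrow-highbit" drops off the front because "wide-lowbit" is both cheaper and more accurate, which mirrors the shape of the claim in the abstract without reproducing any of its measurements.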

updated: Fri May 07 2021 23:28:37 GMT+0000 (UTC)

published: Fri May 07 2021 23:28:37 GMT+0000 (UTC)

arXiv
