PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems

Qing Jin; Zhiyu Chen; Jian Ren; Yanyu Li; Yanzhi Wang; Kaiyuan Yang

Processing-in-memory (PIM), an increasingly studied neuromorphic hardware, promises orders-of-magnitude improvements in energy and throughput for deep learning inference. By leveraging massively parallel and efficient analog computing inside memories, PIM circumvents the data-movement bottlenecks of conventional digital hardware. However, an extra quantization step (i.e., PIM quantization), typically with limited resolution due to hardware constraints, is required to convert the analog computing results into the digital domain. Meanwhile, non-ideal effects are prevalent in PIM quantization because of the imperfect analog-to-digital interface, which further compromises inference accuracy. In this paper, we propose a method for training quantized networks that incorporates PIM quantization, which is ubiquitous in all PIM systems. Specifically, we propose a PIM quantization-aware training (PIM-QAT) algorithm and, by analyzing the training dynamics, introduce rescaling techniques during backward and forward propagation to facilitate training convergence. We also propose two techniques, namely batch normalization (BN) calibration and adjusted precision training, to suppress the adverse effects of non-ideal linearity and stochastic thermal noise in real PIM chips. Our method is validated on three mainstream PIM decomposition schemes, and physically on a prototype chip. Compared with directly deploying conventionally trained quantized models on PIM systems, which fails because it does not account for this extra quantization step, our method provides significant improvement. It also achieves inference accuracy on PIM systems comparable to that of conventionally quantized models on digital hardware, across the CIFAR10 and CIFAR100 datasets, using various network depths of the most popular network topology.
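To make the "extra quantization step" concrete, here is a minimal Python sketch of a PIM-style dot product. It is an illustration under stated assumptions, not the authors' implementation or their training algorithm. It assumes the dot product is split into chunks of `array_rows` elements (a hypothetical crossbar height). Each chunk's analog partial sum may receive Gaussian noise as a simple stand-in for thermal noise. The partial sum is then digitized by an idealized uniform `adc_bits`-bit ADC with a hypothetical symmetric range `full_scale`, and the results are accumulated digitally. The names `adc_quantize`, `pim_dot`, `array_rows`, `adc_bits` and `full_scale` are all hypothetical.

```python
import random


def adc_quantize(value, adc_bits, full_scale):
    """Idealized uniform ADC (assumption): saturate to [-full_scale, full_scale],
    then round to a symmetric grid with 2**(adc_bits - 1) - 1 positive levels."""
    if adc_bits < 2:
        raise ValueError("this sketch needs adc_bits >= 2")
    q_max = 2 ** (adc_bits - 1) - 1          # largest positive output code
    step = full_scale / q_max                # analog value represented by one code
    clamped = max(-full_scale, min(full_scale, value))
    return round(clamped / step) * step      # de-quantized digital value


def pim_dot(weights, inputs, array_rows, adc_bits, full_scale,
            noise_std=0.0, rng=None):
    """Dot product computed chunk by chunk, as on a hypothetical crossbar
    with `array_rows` rows; each analog partial sum is digitized separately."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    if rng is None:
        rng = random.Random(0)
    total = 0.0
    for start in range(0, len(weights), array_rows):
        # Analog accumulation inside one array (assumed exact here).
        partial = sum(w * x for w, x in zip(weights[start:start + array_rows],
                                            inputs[start:start + array_rows]))
        if noise_std > 0.0:
            # Gaussian noise as a simple stand-in for thermal noise.
            partial += rng.gauss(0.0, noise_std)
        # PIM quantization: the extra, low-resolution ADC step.
        total += adc_quantize(partial, adc_bits, full_scale)
    # Digitized partial results are accumulated in the digital domain.
    return total


weights = [1, -1, 1, 1, -1, 1, 1, 1]
inputs = [1, 1, 1, 1, 1, 1, 1, 1]
print(sum(w * x for w, x in zip(weights, inputs)))  # 4 (digital reference)
print(pim_dot(weights, inputs, array_rows=4, adc_bits=3, full_scale=3.0))  # 4.0
print(pim_dot(weights, inputs, array_rows=4, adc_bits=2, full_scale=3.0))  # 6.0
```

With a 3-bit ADC over [-3, 3], each partial sum of 2 is represented exactly and the digital result is reproduced. With a 2-bit ADC, each partial sum is rounded up to 3, so the output becomes 6. A quantized model trained without modeling this step never encounters such errors during training, which is the mismatch the abstract says PIM-QAT is designed to address.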

updated: Sun Sep 18 2022 17:51:55 GMT+0000 (UTC)

published: Sun Sep 18 2022 17:51:55 GMT+0000 (UTC)

arXiv
