MetaGrad: Adaptive Gradient Quantization with Hypernetworks

Kaixin Xu; Alina Hui Xiu Lee; Ziyuan Zhao; Zhe Wang; Min Wu; Weisi Lin

A popular line of work in network compression is quantization-aware training (QAT), which accelerates the forward pass during neural network training and inference. However, little prior effort has gone into quantizing and accelerating the backward pass during training, even though it accounts for around half of the training time. This can be partly attributed to the fact that, unlike in the QAT setting, the errors of low-precision gradients in the backward pass cannot be amortized by the training objective. In this work, we propose to solve this problem by incorporating the gradients into the computation graph of the next training iteration via a hypernetwork. Experiments on the CIFAR-10 dataset with different CNN architectures demonstrate that our hypernetwork-based approach effectively reduces the negative effect of gradient quantization noise and successfully quantizes the gradients to INT4 with an accuracy drop of only 0.64 for VGG-16 on CIFAR-10.
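To make "quantizing the gradients to INT4" concrete, the following is a minimal sketch of generic symmetric per-tensor 4-bit quantization applied to a gradient array. It is not the method from the paper: the abstract does not specify the quantizer or the hypernetwork, so the function names (`quantize_int4`, `dequantize`), the per-tensor scale, the symmetric range [-7, 7], and the optional stochastic rounding are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_int4(grad, rng=None):
    """Hypothetical helper: symmetric per-tensor quantization to integers in [-7, 7].

    Returns the integer codes (stored in int8, since NumPy has no 4-bit dtype)
    and the scale needed to map them back to floating point.
    """
    max_abs = float(np.max(np.abs(grad)))
    if max_abs == 0.0:
        # An all-zero gradient needs no scaling.
        return np.zeros(grad.shape, dtype=np.int8), 1.0
    scale = max_abs / 7.0
    scaled = grad / scale
    if rng is None:
        # Deterministic round-to-nearest.
        q = np.rint(scaled)
    else:
        # Stochastic rounding (an assumption, not taken from the source):
        # round up with probability equal to the fractional part.
        floor = np.floor(scaled)
        q = floor + (rng.random(grad.shape) < (scaled - floor))
    q = np.clip(q, -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to an approximate float32 gradient."""
    return q.astype(np.float32) * scale

grad = np.array([0.7, -0.35, 0.0, 0.1], dtype=np.float32)
q, scale = quantize_int4(grad)
noise = dequantize(q, scale) - grad  # the quantization noise on this gradient
```

With round-to-nearest, each entry of `noise` is bounded in magnitude by half the scale. This is the kind of gradient quantization error that the abstract says the training objective cannot amortize on its own.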

updated: Sat Mar 04 2023 07:26:34 GMT+0000 (UTC)

published: Sat Mar 04 2023 07:26:34 GMT+0000 (UTC)

arXiv
