Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations

Bohan Zhuang; Jing Liu; Mingkui Tan; Lingqiao Liu; Ian Reid; Chunhua Shen

This paper tackles the problem of training a deep convolutional neural network with both low-bitwidth weights and activations. Optimizing a low-precision network is very challenging because the quantizer is non-differentiable, which may result in substantial accuracy loss. To address this, we propose three practical approaches: (i) progressive quantization; (ii) stochastic precision; and (iii) joint knowledge distillation to improve network training. First, for progressive quantization, we propose two schemes to progressively find good local minima. In the first scheme, we optimize a network with quantized weights and only subsequently quantize the activations, in contrast to traditional methods that optimize both simultaneously. In the second scheme, we gradually decrease the bitwidth from high precision to low precision during training. Second, to alleviate the excessive training burden of these multi-round training stages, we further propose a one-stage stochastic precision strategy that randomly samples and quantizes sub-networks while keeping the other parts in full precision. Finally, we adopt a novel learning scheme to jointly train a full-precision model alongside the low-precision one. By doing so, the full-precision model provides hints that guide the training of the low-precision model and significantly improve its performance. Extensive experiments on various datasets (e.g., CIFAR-100, ImageNet) show the effectiveness of the proposed methods.
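
As a rough illustration of the ideas named in the abstract, the sketch below shows a k-bit quantizer, a progressive bitwidth schedule, and a per-iteration stochastic precision mask. The abstract does not specify any of these details, so everything here is an assumption made for illustration: the uniform quantizer on [0, 1], the halving schedule, the per-block sampling probability, and all function and parameter names are hypothetical and are not taken from the paper.

```python
import math
import random


def quantize_uniform(x, bits):
    """Assumed quantizer: clip x to [0, 1], then snap it to the nearest
    of 2**bits evenly spaced levels. The paper's quantizer may differ."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    # floor(v + 0.5) rounds half up, avoiding Python's banker's rounding.
    return math.floor(x * levels + 0.5) / levels


def bitwidth_schedule(start_bits, target_bits):
    """Hypothetical progressive schedule: halve the bitwidth at each
    training stage until the target bitwidth is reached."""
    bits = start_bits
    stages = [bits]
    while bits > target_bits:
        bits = max(bits // 2, target_bits)
        stages.append(bits)
    return stages


def sample_quantized_blocks(num_blocks, keep_fp_prob, rng):
    """Hypothetical stochastic precision mask: True means the block is
    quantized this iteration, False means it stays in full precision."""
    return [rng.random() >= keep_fp_prob for _ in range(num_blocks)]


if __name__ == "__main__":
    rng = random.Random(0)
    for bits in bitwidth_schedule(32, 2):
        mask = sample_quantized_blocks(4, keep_fp_prob=0.5, rng=rng)
        print(bits, mask, quantize_uniform(0.4, min(bits, 8)))
```

The rounding step in `quantize_uniform` has zero gradient almost everywhere, which is the non-differentiability the abstract refers to. How gradients are passed through it during training is not stated in the abstract and is left out of this sketch.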

updated: Fri Jun 04 2021 00:26:53 GMT+0000 (UTC)

published: Sat Aug 10 2019 11:48:55 GMT+0000 (UTC)

arXiv
