FrostNet: Towards Quantization-Aware Network Architecture Search

Taehoon Kim; YoungJoon Yoo; Jihoon Yang

INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices, because it reduces memory and computational resource usage. Analyzing the quantized performance of existing mobile-target network architectures raises the question of how much the network architecture itself matters for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure for finding a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performance. We first propose two critical but straightforward optimization methods that enable quantization-aware training (QAT): floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating gradient-based NAS with StatAssist and GradBoost, we discovered a quantization-efficient network building block, the Frost bottleneck. Furthermore, we used the Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantized performance compared to other mobile-target networks while maintaining competitive FLOAT32 performance. When quantized, our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency, owing to a higher latency reduction rate (65% on average).
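For readers unfamiliar with the term, the sketch below illustrates what INT8 quantization of a tensor generally involves: mapping floating-point values onto 256 integer levels through a scale and a zero point. It is a generic affine quantization scheme written for illustration only, not the StatAssist or GradBoost procedure from the paper; the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of a list of floats to signed 8-bit ints.

    Illustrative only: a generic scheme, not the method proposed in the paper.
    """
    # Widen the range so that 0.0 is always exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # One quantization step; fall back to 1.0 if every value is zero.
    scale = (hi - lo) / 255.0 or 1.0
    # Integer code that represents the real value 0.0.
    zero_point = round(-128 - lo / scale)
    # Round to the nearest level and clamp into the INT8 range [-128, 127].
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Map INT8 codes back to approximate floating-point values."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    codes, scale, zero_point = quantize_int8(weights)
    restored = dequantize_int8(codes, scale, zero_point)
    for w, c, r in zip(weights, codes, restored):
        print(f"float={w:+.4f}  int8={c:+4d}  restored={r:+.4f}")
```

The difference between each original value and its restored value is the rounding error that quantization introduces; quantization-aware training, as mentioned in the abstract, aims to make a network robust to this error during training.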

updated: Mon Nov 30 2020 10:09:33 GMT+0000 (UTC)

published: Wed Jun 17 2020 06:40:43 GMT+0000 (UTC)

arXiv
