SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

Li Lyna Zhang; Xudong Wang; Jiahang Xu; Quanlu Zhang; Yujing Wang; Yuqing Yang; Ningxin Zheng; Ting Cao; Mao Yang

Combining Neural Architecture Search (NAS) with quantization has proven successful for automatically designing low-FLOPs INT8 quantized neural networks (QNNs). However, directly applying NAS to design accurate QNN models that also achieve low latency on real-world devices leads to inferior performance. In this work, we find that the poor INT8 latency stems from quantization-unfriendly design choices: the operators and configurations (e.g., channel widths) in prior-art search spaces vary widely in quantization efficiency and can slow down INT8 inference. To address this challenge, we propose SpaceEvo, an automatic method for designing a dedicated, quantization-friendly search space for each target hardware. The key idea of SpaceEvo is to automatically search for hardware-preferred operators and configurations to construct the search space, guided by a metric called the Q-T score, which quantifies how quantization-friendly a candidate search space is. We further train a quantized-for-all supernet over the discovered search space, so that the searched models can be deployed directly without extra retraining or quantization. Our discovered models establish new SOTA INT8 quantized accuracy under various latency constraints, achieving up to 10.1% higher accuracy on ImageNet than prior-art CNNs at the same latency. Extensive experiments on diverse edge devices demonstrate that SpaceEvo consistently outperforms existing manually designed search spaces, running up to 2.5x faster at the same accuracy.
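The idea of searching over search spaces under the guidance of a score can be illustrated with a toy evolutionary loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the Q-T score, so `placeholder_score` is a made-up stand-in, the efficiency numbers in `OP_EFFICIENCY` are invented, the rule that widths divisible by 16 are preferred is an assumption, and a candidate search space is simplified to one (operator, width) choice per stage.

```python
import random

# Hypothetical relative INT8 efficiency per operator on some target device.
# These values are invented for illustration; they are not measurements.
OP_EFFICIENCY = {"conv3x3": 1.0, "dwconv3x3": 0.6, "dwconv5x5": 0.5, "se_block": 0.3}
WIDTH_CHOICES = [16, 24, 32, 48, 64]


def width_efficiency(width):
    # Assumption: widths that are multiples of 16 suit INT8 kernels better.
    return 1.0 if width % 16 == 0 else 0.7


def placeholder_score(space):
    """Stand-in for a quantization-friendliness metric (NOT the paper's Q-T score)."""
    return sum(OP_EFFICIENCY[op] * width_efficiency(w) for op, w in space) / len(space)


def random_space(num_stages, rng):
    # A toy "search space": one (operator, width) choice per stage.
    return [(rng.choice(list(OP_EFFICIENCY)), rng.choice(WIDTH_CHOICES))
            for _ in range(num_stages)]


def mutate(space, rng):
    # Resample the choice of one randomly selected stage.
    child = list(space)
    i = rng.randrange(len(child))
    child[i] = (rng.choice(list(OP_EFFICIENCY)), rng.choice(WIDTH_CHOICES))
    return child


def evolve_space(num_stages=4, population=16, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_space(num_stages, rng) for _ in range(population)]
    for _ in range(generations):
        # Keep the top half unchanged (elitism) and refill with mutated copies.
        pop.sort(key=placeholder_score, reverse=True)
        parents = pop[: population // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population - len(parents))]
        pop = parents + children
    return max(pop, key=placeholder_score)


if __name__ == "__main__":
    best = evolve_space()
    print(best, placeholder_score(best))
```

Because the top half of each generation is carried over unchanged, the best score never decreases across generations. In a real setting the placeholder would be replaced by a metric that reflects both the measured INT8 latency on the target device and the quantized accuracy.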

updated: Wed Mar 15 2023 01:41:21 GMT+0000 (UTC)

published: Wed Mar 15 2023 01:41:21 GMT+0000 (UTC)

arXiv
