FQ-ViT: Fully Quantized Vision Transformer without Retraining

Yang Lin; Tianyu Zhang; Peiqin Sun; Zheng Li; Shuchang Zhou

Network quantization significantly reduces model inference complexity and is widely used in real-world deployments. However, most existing quantization methods have been developed and tested mainly on Convolutional Neural Networks (CNNs), and they suffer severe degradation when applied to Transformer-based architectures. In this work, we present a systematic method to reduce the performance degradation and inference complexity of quantized Transformers. In particular, we propose Powers-of-Two Scale (PTS) to handle the severe inter-channel variation of LayerNorm inputs in a hardware-friendly way. In addition, we propose Log-Int-Softmax (LIS), which preserves the extremely non-uniform distribution of attention maps while simplifying inference through 4-bit quantization and the BitShift operator. Comprehensive experiments on various Transformer-based architectures and benchmarks show that our methods outperform previous work while using even lower bit-widths for attention maps. For instance, we reach 85.17% Top-1 accuracy with ViT-L on ImageNet and 51.4 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve comparable accuracy degradation (~1%) on fully quantized Vision Transformers. Code is available at https://github.com/linyang-zhh/FQ-ViT.
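The abstract does not spell out how 4-bit quantization and BitShift fit together, so the following is only a minimal sketch of the general idea of log2 quantization, not the authors' Log-Int-Softmax. It assumes that an attention probability p is stored as a 4-bit exponent q with p ≈ 2^(-q), so that multiplying an integer value by p reduces to a shift. The function names `log2_quantize` and `shift_weighted_sum` are hypothetical and do not come from the paper or its repository.

```python
import math


def log2_quantize(p, bits=4):
    """Map a probability p in [0, 1] to an integer exponent q with p ~= 2**(-q).

    Illustrative assumption: q is clipped to the unsigned `bits`-bit range,
    so probabilities below 2**(-qmax) all collapse onto qmax.
    """
    qmax = (1 << bits) - 1
    if p <= 0.0:
        return qmax  # log2(0) is undefined; use the smallest representable weight
    q = round(-math.log2(p))
    return max(0, min(qmax, q))


def shift_weighted_sum(q_row, v_col, bits=4):
    """Compute sum_i 2**(-q_i) * v_i, scaled by 2**qmax, with shifts and adds only.

    Since 2**(-q) * 2**qmax == 1 << (qmax - q), each product becomes a left
    shift of the integer value v_i. The caller divides by 2**qmax (or folds
    that factor into a later scale) to recover the real-valued result.
    """
    qmax = (1 << bits) - 1
    return sum(v << (qmax - q) for q, v in zip(q_row, v_col))


# One row of an attention map and one column of integer (already quantized) values.
attn_row = [0.5, 0.25, 0.25]
values = [4, 8, -8]

exponents = [log2_quantize(p) for p in attn_row]
scaled = shift_weighted_sum(exponents, values)
result = scaled / (1 << 15)  # undo the 2**qmax scaling for comparison
```

Here the probabilities are exact powers of two, so the shift-based sum matches the floating-point product 0.5*4 + 0.25*8 + 0.25*(-8) exactly; for general probabilities the rounding of the exponent introduces an approximation error.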

updated: Sat Nov 27 2021 06:20:53 GMT+0000 (UTC)

published: Sat Nov 27 2021 06:20:53 GMT+0000 (UTC)

arXiv
