FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Yang Lin; Tianyu Zhang; Peiqin Sun; Zheng Li; Shuchang Zhou

Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods were developed mainly for Convolutional Neural Networks (CNNs) and suffer severe degradation when applied to fully quantized vision transformers. In this work, we show that many of these difficulties arise from serious inter-channel variation in LayerNorm inputs, and we present Power-of-Two Factor (PTF), a systematic method to reduce the performance degradation and inference complexity of fully quantized vision transformers. In addition, observing an extremely non-uniform distribution in attention maps, we propose Log-Int-Softmax (LIS) to preserve that distribution and simplify inference by using 4-bit quantization and the BitShift operator. Comprehensive experiments on various transformer-based architectures and benchmarks show that our Fully Quantized Vision Transformer (FQ-ViT) outperforms previous works while using an even lower bit-width on attention maps. For instance, we reach 84.89% top-1 accuracy with ViT-L on ImageNet and 50.8 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve near-lossless accuracy (about 1% degradation) on fully quantized vision transformers. The code is available at https://github.com/megvii-research/FQ-ViT.
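The two ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, which lives in the repository linked above. The function names (ptf_quantize, log2_quantize_attention, shift_matmul), the symmetric signed quantization range, the maximum shift of 3, and the error-minimizing choice of the per-channel factor are all assumptions made for illustration. The softmax itself is computed in floating point here, so the sketch covers only the log2 quantization and bit-shift part of LIS, not an integer-only softmax.

```python
import numpy as np

def ptf_quantize(x, bits=8, max_shift=3):
    """Quantize x (tokens, channels) with one shared scale and a
    per-channel power-of-two factor 2**alpha (illustrative assumption)."""
    qmax = 2 ** (bits - 1) - 1
    # Shared base scale: the widest channel fits when alpha == max_shift.
    base_scale = np.abs(x).max() / (qmax * 2 ** max_shift)
    shifts = np.arange(max_shift + 1)
    scales = base_scale * 2.0 ** shifts                      # (K+1,)
    # Try every candidate factor and keep the one with least error per channel.
    q_all = np.clip(np.round(x[None] / scales[:, None, None]), -qmax - 1, qmax)
    err = ((q_all * scales[:, None, None] - x[None]) ** 2).sum(axis=1)
    alpha = err.argmin(axis=0)                               # (channels,)
    ch_scale = base_scale * 2.0 ** alpha
    q = np.clip(np.round(x / ch_scale), -qmax - 1, qmax).astype(np.int32)
    return q, base_scale, alpha

def log2_quantize_attention(attn, bits=4):
    """Store each attention probability as an exponent e with attn ~ 2**-e."""
    qmax = 2 ** bits - 1
    # Floor tiny probabilities so that log2 never sees zero.
    e = np.round(-np.log2(np.maximum(attn, 2.0 ** -qmax)))
    return np.clip(e, 0, qmax).astype(np.int32)

def shift_matmul(attn_exp, v_int, bits=4):
    """Compute (2**-e) @ v with left shifts only; result is scaled by 2**qmax."""
    qmax = 2 ** bits - 1
    v64 = v_int.astype(np.int64)
    # (queries, keys, 1) shift amounts applied to (1, keys, dim) values.
    shifted = v64[None, :, :] << (qmax - attn_exp)[:, :, None]
    return shifted.sum(axis=1)

rng = np.random.default_rng(0)
# LayerNorm-like input whose channels differ widely in magnitude.
x = rng.normal(size=(16, 8)) * np.array([0.01, 0.1, 1, 10, 0.05, 0.5, 5, 50])
q, s, alpha = ptf_quantize(x)
x_hat = q * s * 2.0 ** alpha          # dequantized approximation of x

logits = rng.normal(size=(4, 6))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
e = log2_quantize_attention(attn)     # 4-bit exponents in [0, 15]
v = rng.integers(-128, 128, size=(6, 5))
out = shift_matmul(e, v)              # integer result, scaled by 2**15
```

Because every channel shares the base scale and differs only by a power of two, the per-channel factor can be applied with a shift instead of a multiplication, and choosing alpha equal to max_shift for all channels reduces the scheme to ordinary layer-wise quantization.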

updated: Wed May 18 2022 05:49:22 GMT+0000 (UTC)

published: Sat Nov 27 2021 06:20:53 GMT+0000 (UTC)

arXiv
