Post-Training Quantization for Vision Transformer

Zhenhua Liu; Yunhe Wang; Kai Han; Siwei Ma; Wen Gao

Transformers have recently achieved remarkable performance on a variety of computer vision applications. Compared with mainstream convolutional neural networks, vision transformers often rely on sophisticated architectures to extract powerful feature representations, which makes them more difficult to deploy on mobile devices. In this paper, we present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers. The quantization task can be regarded as finding the optimal low-bit quantization intervals for weights and inputs, respectively. To preserve the functionality of the attention mechanism, we introduce a ranking loss into the conventional quantization objective, which aims to keep the relative order of the self-attention results after quantization. Moreover, we thoroughly analyze the relationship between the quantization loss of different layers and feature diversity, and explore a mixed-precision quantization scheme that exploits the nuclear norm of each attention map and output feature. The effectiveness of the proposed method is verified on several benchmark models and datasets, where it outperforms state-of-the-art post-training quantization algorithms. For instance, we obtain 81.29% top-1 accuracy with the DeiT-B model on the ImageNet dataset using about 8-bit quantization.
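The abstract does not give the formulas, so the following is only a rough sketch of two ideas it names: uniform quantization with a chosen interval, and a pairwise ranking penalty that grows when quantization changes the relative order of attention scores. The function names `quantize` and `ranking_loss`, the hinge form of the penalty, and the `margin` parameter are assumptions made for illustration, not the authors' implementation.

```python
def quantize(values, interval, bits=8):
    """Uniform symmetric quantization (illustrative assumption):
    snap each value to a multiple of `interval`, then clip to the signed b-bit range."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    out = []
    for v in values:
        q = round(v / interval)          # nearest integer level
        q = max(qmin, min(qmax, q))      # clip to representable levels
        out.append(q * interval)         # map back to real values
    return out


def ranking_loss(original, quantized, margin=0.0):
    """Pairwise hinge penalty (hypothetical form): adds a cost for every pair
    whose order in `quantized` disagrees with its order in `original`."""
    loss = 0.0
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            if original[i] == original[j]:
                continue                 # no order to preserve for ties
            sign = 1.0 if original[i] > original[j] else -1.0
            loss += max(0.0, margin - sign * (quantized[i] - quantized[j]))
    return loss


# Example: one row of attention scores, quantized with a coarse interval.
attention_row = [0.10, 0.12, 0.50, 0.28]
quantized_row = quantize(attention_row, interval=0.25, bits=4)
penalty = ranking_loss(attention_row, quantized_row)
```

In a search over candidate intervals, a penalty of this kind could be added to an ordinary reconstruction error so that intervals which scramble the attention ordering are disfavored. How the paper weights the two terms is not stated in the abstract.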

updated: Sun Jun 27 2021 06:27:22 GMT+0000 (UTC)

published: Sun Jun 27 2021 06:27:22 GMT+0000 (UTC)

arXiv
