Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction

Jemin Lee; Yongin Kwon; Jeman Park; Misun Yu; Sihyeong Park; Hwanjun Song

Vision transformers (ViTs) have recently superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, their high computational requirements hinder widespread deployment. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers and use optimized attention computation with linear complexity. Post-training quantization (PTQ) has also been proposed as a means of reducing computational demands. On mobile devices, achieving optimal acceleration for ViTs requires combining quantization techniques with efficient hybrid transformer structures. However, no prior work has applied quantization to efficient hybrid transformers. In this paper, we find that applying existing PTQ methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, which we attribute to the following four challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters (<5M). To overcome these challenges, we propose a new post-training quantization method, the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). It outperforms existing PTQ methods (EasyQuant, FQ-ViT, and PTQ4ViT) by a significant margin, with an average improvement of 8.32% for 8-bit and 26.02% for 6-bit quantization. We plan to release our code at https://github.com/Q-HyViT.
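The abstract names zero-point overflow as a challenge but does not define it. As a rough illustration only, the sketch below shows one common way such an overflow can arise in asymmetric uniform quantization: when an activation range does not contain zero, the computed zero point falls outside the integer grid, and clamping it shifts the representable range. All function names and the example range [1.0, 5.0] are hypothetical and chosen for illustration; this is not the authors' method.

```python
def asymmetric_qparams(x_min, x_max, n_bits=8):
    """Scale and unclamped zero point for unsigned asymmetric quantization."""
    q_max = 2 ** n_bits - 1
    scale = (x_max - x_min) / q_max
    zero_point = round(-x_min / scale)  # may fall outside [0, q_max]
    return scale, zero_point, q_max


def quantize(x, scale, zero_point, q_max):
    """Map a real value to the integer grid [0, q_max]."""
    q = round(x / scale) + zero_point
    return min(max(q, 0), q_max)


def dequantize(q, scale, zero_point):
    """Map an integer code back to a real value."""
    return (q - zero_point) * scale


def demo():
    # Assumed activation range that excludes zero (illustrative values).
    scale, zp, q_max = asymmetric_qparams(1.0, 5.0)

    # Zero point stored as-is (needs a wider integer type than uint8).
    q_ok = quantize(5.0, scale, zp, q_max)
    err_ok = abs(dequantize(q_ok, scale, zp) - 5.0)

    # Zero point clamped into [0, q_max], as a uint8 field would force.
    zp_clamped = min(max(zp, 0), q_max)
    q_bad = quantize(5.0, scale, zp_clamped, q_max)
    err_bad = abs(dequantize(q_bad, scale, zp_clamped) - 5.0)

    return zp, zp_clamped, err_ok, err_bad


if __name__ == "__main__":
    zp, zp_clamped, err_ok, err_bad = demo()
    print(f"zero point: {zp} (clamped: {zp_clamped})")
    print(f"error at x=5.0, unclamped zero point: {err_ok:.4f}")
    print(f"error at x=5.0, clamped zero point:   {err_bad:.4f}")
```

In this toy setting the unclamped zero point is negative, so storing it in an unsigned 8-bit field forces a clamp, and values near the top of the range saturate. Whether the paper's definition matches this reading would need to be checked against the full text.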

updated: Sat Aug 19 2023 05:47:45 GMT+0000 (UTC)

published: Wed Mar 22 2023 13:41:22 GMT+0000 (UTC)

arXiv
