Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction

Jemin Lee; Yongin Kwon; Jeman Park; Misun Yu; Hwanjun Song

Vision transformers (ViTs) have recently replaced convolutional neural network models in numerous tasks, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread deployment. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers and optimize attention computation for linear complexity. Post-training quantization (PTQ) has also been proposed as a means of reducing computational demands. Combining quantization techniques with efficient hybrid transformer structures is crucial for maximizing the acceleration of vision transformers on mobile devices, yet no prior work has applied quantization to efficient hybrid transformers. In this paper, we first find that naively applying existing PTQ methods for ViTs to efficient hybrid transformers causes a drastic accuracy drop, owing to four challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters (<5M). To overcome these challenges, we propose a new post-training quantization method. It is the first to quantize efficient hybrid vision transformers (MobileViTv1 and MobileViTv2), and it outperforms existing PTQ methods (EasyQuant, FQ-ViT, and PTQ4ViT) by a significant margin (an average improvement of 7.75%). We plan to release our code at https://github.com/Q-HyViT.
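
To make challenge (ii) more concrete, the following is a minimal sketch of generic asymmetric (affine) 8-bit quantization, showing one way a zero point can fall outside the representable integer range. It is an illustration under assumptions, not the method proposed in the paper: the function names `affine_qparams` and `zero_point_in_range` are hypothetical, and the abstract does not state the paper's exact definition of zero-point overflow.

```python
# Generic affine quantization parameters (illustrative only, not the paper's method).
# Assumption: unsigned n-bit integers, with the scale and zero point derived from
# the observed min/max of a tensor.

def affine_qparams(x_min: float, x_max: float, n_bits: int = 8):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [0, 2**n_bits - 1]."""
    q_min, q_max = 0, 2 ** n_bits - 1
    scale = (x_max - x_min) / (q_max - q_min)
    # The zero point is the integer that represents the real value 0.0.
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def zero_point_in_range(zero_point: int, n_bits: int = 8) -> bool:
    """True if the zero point is representable as an unsigned n-bit integer."""
    return 0 <= zero_point <= 2 ** n_bits - 1


# A range that contains zero gives a representable zero point.
scale_ok, zp_ok = affine_qparams(-1.0, 3.0)

# A range that lies far from zero pushes the zero point out of [0, 255].
scale_bad, zp_bad = affine_qparams(2.0, 3.0)

print(zp_ok, zero_point_in_range(zp_ok))
print(zp_bad, zero_point_in_range(zp_bad))
```

In this sketch, clamping the out-of-range zero point back into [0, 255] would shift every quantized value and saturate much of the tensor, which is one plausible reason a naive PTQ pipeline loses accuracy on such layers.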

updated: Wed Mar 22 2023 13:41:22 GMT+0000 (UTC)

published: Wed Mar 22 2023 13:41:22 GMT+0000 (UTC)

arXiv
