VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer

Mengshu Sun; Haoyu Ma; Guoliang Kang; Yifan Jiang; Tianlong Chen; Xiaolong Ma; Zhangyang Wang; Yanzhi Wang

Transformer architectures with attention mechanisms have achieved success in Natural Language Processing (NLP), and Vision Transformers (ViTs) have recently extended their application to a variety of vision tasks. Although ViTs achieve high performance, their large model size and high computational complexity hinder deployment on edge devices. To achieve high throughput on hardware while preserving model accuracy, we propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized ViTs with binary weights and low-precision activations. Given the model structure and the desired frame rate, VAQF automatically outputs the required quantization precision for activations as well as the optimized parameter settings of the accelerator that fulfill the hardware requirements. The implementations are developed with Vivado High-Level Synthesis (HLS) on the Xilinx ZCU102 FPGA board. Evaluation results with the DeiT-base model indicate that a frame rate requirement of 24 frames per second (FPS) is satisfied with 8-bit activation quantization, and a target of 30 FPS is met with 6-bit activation quantization. To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework that, given the target frame rate, guides the quantization strategy on the software side and the accelerator implementation on the hardware side. The compilation time cost is very small compared with quantization training, and the generated accelerators demonstrate the capability of achieving real-time execution for state-of-the-art ViT models on FPGAs.
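
The precision-selection step described in the abstract (a target frame rate goes in, an activation bit-width comes out) can be illustrated with a minimal sketch. This is not VAQF's actual procedure, which the abstract does not detail. The function `select_activation_bits` and the `estimate_fps` callback are hypothetical names introduced here. The sketch assumes throughput can be estimated per bit-width and that higher precision is preferred whenever it meets the target. The only numbers used are the two data points reported in the abstract, treated as lower bounds on achievable frame rate.

```python
from typing import Callable, Iterable, Optional


def select_activation_bits(
    target_fps: float,
    estimate_fps: Callable[[int], Optional[float]],
    candidate_bits: Iterable[int],
) -> Optional[int]:
    """Return the highest activation bit-width whose estimated FPS meets the target.

    `estimate_fps` is a hypothetical stand-in for a hardware performance
    model; it returns None when no estimate exists for a bit-width.
    """
    # Try the most precise candidates first, since higher precision
    # is assumed to be better for accuracy.
    for bits in sorted(candidate_bits, reverse=True):
        fps = estimate_fps(bits)
        if fps is not None and fps >= target_fps:
            return bits
    return None  # no candidate precision satisfies the frame rate target


# Data points taken from the abstract (DeiT-base on ZCU102): the 24 FPS
# target is met at 8 bits and the 30 FPS target at 6 bits. They are used
# here as lower bounds, not as measured throughput.
reported_fps = {8: 24.0, 6: 30.0}

bits_for_24 = select_activation_bits(24, reported_fps.get, reported_fps.keys())
bits_for_30 = select_activation_bits(30, reported_fps.get, reported_fps.keys())
bits_for_60 = select_activation_bits(60, reported_fps.get, reported_fps.keys())
```

With these inputs the search picks 8 bits for the 24 FPS target and 6 bits for the 30 FPS target, and returns `None` for a target that neither data point covers. A real performance model would replace the lookup table.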

updated: Mon Jan 17 2022 20:27:52 GMT+0000 (UTC)

published: Mon Jan 17 2022 20:27:52 GMT+0000 (UTC)

arXiv
