Row-wise Accelerator for Vision Transformer

Hong-Yi Wang; Tian-Sheuan Chang

Following their success in natural language processing, transformers for vision applications have attracted significant attention in recent years due to their excellent performance. However, existing deep learning hardware accelerators for vision cannot execute this structure efficiently because of significant differences in model architecture. This paper therefore proposes a hardware accelerator for vision transformers with row-wise scheduling, which decomposes the major operations in vision transformers into a single dot-product primitive for unified and efficient execution. Furthermore, sharing weights across columns enables data reuse and reduces memory usage. The implementation in TSMC 40nm CMOS technology requires only a 262K gate count and a 149KB SRAM buffer to achieve 403.2 GOPS throughput at a 600MHz clock frequency.
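The abstract's central idea is that every major vision-transformer operation can be mapped onto one dot-product primitive and scheduled one row at a time. The Python sketch below illustrates that idea in software under stated assumptions. It is a hypothetical model, not the authors' hardware design. The names `dot`, `rowwise_matmul`, `attention`, `ops_per_cycle` and `macs_per_cycle` are illustrative. The column reuse in `rowwise_matmul` only loosely mimics the column-wise weight sharing the abstract mentions. The throughput helper assumes that one multiply-accumulate (MAC) counts as two operations, a convention the abstract does not state.

```python
"""Hypothetical software model of row-wise, dot-product-only ViT execution.

Illustrative sketch only; it is not the accelerator described in the paper.
"""
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    """The single shared primitive: inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dot() needs vectors of equal length")
    return sum(x * y for x, y in zip(a, b))


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def rowwise_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute a @ b one output row at a time using only dot().

    The columns of b (e.g. weights) are extracted once and reused for every
    row of a; this loosely mimics column-wise weight sharing (an assumption).
    """
    b_cols = transpose(b)
    return [[dot(row, col) for col in b_cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head self-attention in which every matrix product is row-wise."""
    q = rowwise_matmul(x, wq)
    k = rowwise_matmul(x, wk)
    v = rowwise_matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Q @ K^T: each score ends up as dot(query row, key row)
    scores = rowwise_matmul(q, transpose(k))
    probs = [softmax([s * scale for s in r]) for r in scores]
    return rowwise_matmul(probs, v)


def ops_per_cycle(gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle for a given throughput."""
    return gops * 1e3 / clock_mhz


def macs_per_cycle(gops: float, clock_mhz: float) -> float:
    """Assumes 1 multiply-accumulate = 2 operations (not stated in the abstract)."""
    return ops_per_cycle(gops, clock_mhz) / 2


if __name__ == "__main__":
    print(rowwise_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
    print(ops_per_cycle(403.2, 600), macs_per_cycle(403.2, 600))
```

Under the two-operations-per-MAC assumption, 403.2 GOPS at 600 MHz works out to 672 operations, or 336 MACs, per cycle. The paper's actual datapath width may differ.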

updated: Mon May 09 2022 01:47:44 GMT+0000 (UTC)

published: Mon May 09 2022 01:47:44 GMT+0000 (UTC)

arXiv
