A Fast Training-Free Compression Framework for Vision Transformers

Jung Hwan Heo; Arash Fayyazi; Mahdi Nazemi; Massoud Pedram

Token pruning has emerged as an effective way to speed up inference in large Transformer models. However, prior work on accelerating Vision Transformer (ViT) models requires either training from scratch or fine-tuning with additional parameters, which prevents simple plug-and-play deployment. To avoid high training costs at the deployment stage, we present a fast training-free compression framework enabled by (i) a dense feature extractor in the initial layers; (ii) a sharpness-minimized model, which is more compressible; and (iii) a local-global token merger that can exploit spatial relationships at various contexts. We applied our framework to various ViT and DeiT models and achieved up to a 2x reduction in FLOPs and a 1.8x speedup in inference throughput with <1% accuracy loss, while requiring training times two orders of magnitude shorter than existing approaches. Code will be available at https://github.com/johnheo/fast-compress-vit
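The abstract does not give a cost model, so the following is only a rough illustrative sketch of why keeping fewer tokens in later layers lowers compute. It uses the standard multiply-accumulate (MAC) estimate for one Transformer encoder block and compares a full-token model against one that keeps all tokens in the first few layers and fewer afterwards. The function names, the DeiT-S-like dimensions (12 layers, width 384, 197 tokens), and the token schedule are assumptions chosen for illustration; they are not taken from the paper or its code.

```python
def block_macs(num_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate MACs for one Transformer encoder block.

    Counts the Q, K, V and output projections (4 * N * d^2), the two MLP
    linear layers (2 * mlp_ratio * N * d^2), and the two attention matrix
    products (2 * N^2 * d). Norms, softmax, and biases are ignored.
    """
    linear = (4 + 2 * mlp_ratio) * num_tokens * dim * dim
    attention = 2 * num_tokens * num_tokens * dim
    return linear + attention


def model_macs(tokens_per_layer: list[int], dim: int) -> int:
    """Sum block costs for a per-layer token schedule."""
    return sum(block_macs(n, dim) for n in tokens_per_layer)


# Hypothetical schedule: all 197 tokens in the first 3 layers (a dense
# early feature extractor), then roughly half the tokens in the rest.
DIM = 384
baseline = model_macs([197] * 12, DIM)
reduced = model_macs([197] * 3 + [99] * 9, DIM)
reduction = baseline / reduced

print(f"baseline MACs: {baseline:,}")
print(f"reduced MACs:  {reduced:,}")
print(f"reduction:     {reduction:.2f}x")
```

Because the linear terms scale with the token count and the attention terms scale with its square, the saving depends on both how many tokens are merged and how early in the network the merging starts.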

updated: Sat Mar 04 2023 05:34:25 GMT+0000 (UTC)

published: Sat Mar 04 2023 05:34:25 GMT+0000 (UTC)

arXiv
