COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Jinqi Xiao; Miao Yin; Yu Gong; Xiao Zang; Jian Ren; Bo Yuan

Attention-based vision models, such as the Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational costs, calling for efficient model compression solutions. To date, pruning ViTs has been well studied, while other compression strategies that have been widely applied in CNN compression, e.g., model factorization, are little explored in the context of ViT compression. This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. Based on a new insight into the multi-head attention layer, we develop a highly efficient ViT compression solution that outperforms state-of-the-art pruning methods. For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach achieves 0.45% and 0.76% higher top-1 accuracy, respectively, even with fewer parameters. Our findings can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to 2.6× speedup) and lower extra storage cost (up to 1927.5× reduction) than existing works.

updated: Fri May 26 2023 19:50:00 GMT+0000 (UTC)

published: Fri May 26 2023 19:50:00 GMT+0000 (UTC)

arXiv
