oViT: An Accurate Second-Order Pruning Framework for Vision Transformers

Denis Kuznedelev; Eldar Kurtic; Elias Frantar; Dan Alistarh


Models from the Vision Transformer (ViT) family have recently provided breakthrough results across image classification tasks such as ImageNet. Yet, they still face barriers to deployment, notably the fact that their accuracy can be severely impacted by compression techniques such as pruning. In this paper, we take a step towards addressing this issue by introducing Optimal ViT Surgeon (oViT), a new state-of-the-art method for the weight sparsification of Vision Transformer (ViT) models. At the technical level, oViT introduces a new weight pruning algorithm which leverages second-order information, specifically adapted to be both highly accurate and efficient in the context of ViTs. We complement this accurate one-shot pruner with an in-depth investigation of gradual pruning, augmentation, and recovery schedules for ViTs, which we show to be critical for successful ViT compression. We validate our method via extensive experiments on classical ViT and DeiT models, as well as on newer variants, such as XCiT, EfficientFormer and Swin. Moreover, our results are even relevant to recently proposed, highly accurate ResNets. Our results show for the first time that ViT-family models can in fact be pruned to high sparsity levels (e.g. ≥75%) with low impact on accuracy (≤1% relative drop), and that our approach outperforms prior methods by significant margins at high sparsities. In addition, we show that our method is compatible with structured pruning methods and quantization, and that it can lead to significant speedups on a sparsity-aware inference engine.
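The abstract does not spell out the pruning rule itself. As a point of reference only, the sketch below shows the classic Optimal Brain Surgeon (OBS) criterion that second-order weight pruners build on: the saliency of weight q is w_q^2 / (2 [H^-1]_qq), and removing it shifts the remaining weights by -(w_q / [H^-1]_qq) H^-1 e_q. It is NOT oViT's algorithm, which the abstract describes as specifically adapted for accuracy and efficiency on ViTs. It assumes a small, dense weight vector, an explicit inverse Hessian approximated by the empirical Fisher of per-sample gradients, and a dampening value chosen arbitrarily. The function names `empirical_fisher` and `obs_prune` are hypothetical.

```python
import numpy as np


def empirical_fisher(grads, damp=1e-4):
    """Empirical Fisher approximation of the Hessian (assumption: dense, small d).

    grads has shape (m, d): one flattened gradient per sample.
    Returns damp * I + (1/m) * sum_i g_i g_i^T.
    """
    m, d = grads.shape
    return damp * np.eye(d) + grads.T @ grads / m


def obs_prune(w, H_inv, num_prune):
    """Greedily zero out num_prune weights with the classic OBS rule.

    Each step removes the weight with the lowest saliency w_q^2 / (2 [H^-1]_qq),
    compensates the surviving weights, and eliminates q from H^-1.
    Returns the updated weights and a boolean mask of surviving weights.
    """
    w = w.astype(float).copy()
    H_inv = H_inv.astype(float).copy()
    mask = np.ones_like(w, dtype=bool)
    for _ in range(num_prune):
        diag = np.diag(H_inv)
        # Saliency is only defined for weights that are still alive.
        scores = np.full_like(w, np.inf)
        scores[mask] = w[mask] ** 2 / (2.0 * diag[mask])
        q = int(np.argmin(scores))
        # Optimal update of all weights given the constraint w_q = 0.
        w -= (w[q] / H_inv[q, q]) * H_inv[:, q]
        mask[q] = False
        w[~mask] = 0.0  # clear round-off on pruned entries
        # Gaussian-elimination step: H^-1 restricted to the surviving weights.
        H_inv -= np.outer(H_inv[:, q], H_inv[q, :]) / H_inv[q, q]
    return w, mask
```

With an identity Hessian the rule reduces to plain magnitude pruning, which makes a convenient sanity check; with a correlated Hessian the surviving weights are adjusted to absorb part of the removed weight's contribution:

```python
w, mask = obs_prune(np.array([3.0, -0.5, 1.0, 0.1]), np.eye(4), 2)
# w == [3.0, 0.0, 1.0, 0.0]
H = np.array([[2.0, 1.0], [1.0, 2.0]])
w2, _ = obs_prune(np.array([1.0, 0.1]), np.linalg.inv(H), 1)
# w2 == [1.05, 0.0]: the surviving weight is shifted to compensate
```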

updated: Fri Oct 14 2022 12:19:09 GMT+0000 (UTC)

published: Fri Oct 14 2022 12:19:09 GMT+0000 (UTC)

arXiv
