ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers

Yutong Xie; Jianpeng Zhang; Yong Xia; Anton van den Hengel; Qi Wu

Although Transformers have successfully transitioned from their language-modelling origins to image-based applications, their quadratic computational complexity remains a challenge, particularly for dense prediction. In this paper we propose a content-based sparse attention method as an alternative to dense self-attention, aiming to reduce computational complexity while retaining the ability to model long-range dependencies. Specifically, we cluster and then aggregate key and value tokens as a content-based way of reducing the total token count. The resulting clustered-token sequence retains the semantic diversity of the original signal but can be processed at a lower computational cost. We further extend the clustering-guided attention from single-scale to multi-scale, which is conducive to dense prediction tasks. We label the proposed Transformer architecture ClusTR and demonstrate that it achieves state-of-the-art performance on various vision tasks at lower computational cost and with fewer parameters. For instance, our ClusTR small model with 22.7M parameters achieves 83.2% Top-1 accuracy on ImageNet. Source code and ImageNet models will be made publicly available.
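The idea of clustering and then aggregating key and value tokens can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means is assumed here as a stand-in, and the function names `cluster_tokens` and `clustered_attention` are hypothetical. The sketch is single-head and single-scale, with no learned projections.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cluster_tokens(keys, values, num_clusters, num_iters=10, seed=0):
    """Group keys with plain k-means (an assumed stand-in), then average
    the keys and values that fall into each cluster."""
    rng = np.random.default_rng(seed)
    n = keys.shape[0]
    centroids = keys[rng.choice(n, size=num_clusters, replace=False)].copy()

    def assign_to_nearest(cents):
        # Squared Euclidean distance from every key to every centroid: (n, m).
        dists = ((keys[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    for _ in range(num_iters):
        assign = assign_to_nearest(centroids)
        for c in range(num_clusters):
            members = keys[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)

    assign = assign_to_nearest(centroids)
    one_hot = np.eye(num_clusters)[assign]   # (n, m) membership matrix
    counts = one_hot.sum(axis=0)             # tokens per cluster
    keep = counts > 0                        # drop empty clusters
    agg_keys = (one_hot.T @ keys)[keep] / counts[keep][:, None]
    agg_values = (one_hot.T @ values)[keep] / counts[keep][:, None]
    return agg_keys, agg_values


def clustered_attention(queries, keys, values, num_clusters):
    """Every query attends to at most num_clusters aggregated tokens
    instead of all n key/value tokens."""
    agg_keys, agg_values = cluster_tokens(keys, values, num_clusters)
    scores = queries @ agg_keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ agg_values


# Example: 196 tokens (a 14 x 14 patch grid) with 64 channels.
tokens = np.random.default_rng(0).standard_normal((196, 64))
out = clustered_attention(tokens, tokens, tokens, num_clusters=16)
```

With n query tokens and m clusters, the score matrix in this sketch has shape n x m rather than n x n, which is where the saving over dense self-attention comes from when m is much smaller than n. Because the clusters are formed from token content rather than fixed spatial windows, each query can still draw on tokens from anywhere in the image.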

updated: Sun Aug 28 2022 04:18:27 GMT+0000 (UTC)

published: Sun Aug 28 2022 04:18:27 GMT+0000 (UTC)

arXiv
