A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

Hongkang Li; Meng Wang; Sijia Liu; Pin-yu Chen

Vision Transformers (ViTs) with self-attention modules have recently achieved great empirical success in many vision tasks. However, because of non-convex interactions across layers, theoretical analysis of their learning and generalization remains largely elusive. Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. We characterize the sample complexity required to achieve zero generalization error. Our sample complexity bound is positively correlated with the inverse of the fraction of label-relevant tokens, the token noise level, and the initial model error. We also prove that training with stochastic gradient descent (SGD) leads to a sparse attention map, which formally verifies the general intuition about the success of attention. Moreover, this paper indicates that proper token sparsification can improve test performance by removing label-irrelevant and/or noisy tokens, including spurious correlations. Empirical experiments on synthetic data and the CIFAR-10 dataset justify our theoretical results and show that they generalize to deeper ViTs.
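To make the "shallow ViT" concrete, here is a minimal pure-Python sketch of a forward pass. It assumes one common way to write such a model: a single softmax self-attention layer whose output for each query token goes through a two-layer perceptron (a matrix `W_O`, a ReLU, then a linear readout `a`), with the per-token outputs averaged. The parameter names, the mean pooling, the absence of a 1/sqrt(d) scaling, and the hypothetical `keep` argument used to mimic token sparsification are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability.
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shallow_vit(
    tokens: List[Vector],
    W_Q: Matrix, W_K: Matrix, W_V: Matrix, W_O: Matrix,
    a: Vector,
    keep: Optional[Sequence[int]] = None,
) -> Tuple[float, List[Vector]]:
    """Assumed shallow ViT: one self-attention layer + two-layer perceptron.

    `keep` (hypothetical) lists the token indices retained after
    sparsification; dropped tokens act as neither queries nor keys.
    Returns the scalar output and the attention map (one row per query).
    """
    idx = list(range(len(tokens))) if keep is None else list(keep)
    if not idx:
        raise ValueError("at least one token must be kept")
    xs = [tokens[i] for i in idx]
    q = [matvec(W_Q, x) for x in xs]
    k = [matvec(W_K, x) for x in xs]
    v = [matvec(W_V, x) for x in xs]

    outputs, attn_map = [], []
    for ql in q:
        # Attention weights of query l over all kept tokens.
        weights = softmax([dot(ks, ql) for ks in k])
        attn_map.append(weights)
        # Attention-weighted sum of value vectors.
        mixed = [sum(w * vs[j] for w, vs in zip(weights, v))
                 for j in range(len(v[0]))]
        # Two-layer perceptron: W_O, ReLU, then linear readout a.
        hidden = [max(0.0, h) for h in matvec(W_O, mixed)]
        outputs.append(dot(a, hidden))
    # Average the per-token outputs (assumed pooling).
    return sum(outputs) / len(outputs), attn_map


if __name__ == "__main__":
    I = [[1.0, 0.0], [0.0, 1.0]]
    toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    y_full, _ = shallow_vit(toks, I, I, I, I, [1.0, -1.0])
    y_sparse, _ = shallow_vit(toks, I, I, I, I, [1.0, -1.0], keep=[0, 2])
    print(y_full, y_sparse)  # y_sparse → 1.0
```

In the toy run, the second token pulls the output away from the value contributed by the other two; dropping it via `keep=[0, 2]` removes it from both keys and queries, which is the mechanical sense in which sparsification removes a token's influence on the prediction. Each row of the returned attention map sums to 1, so a row concentrated on a few entries corresponds to the kind of sparse attention the abstract describes.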

updated: Sun Nov 12 2023 04:36:45 GMT+0000 (UTC)

published: Sun Feb 12 2023 22:12:35 GMT+0000 (UTC)

arXiv
