Understanding Self-attention Mechanism via Dynamical System Perspective

Zhongzhan Huang; Mingfu Liang; Jinghui Qin; Shanshan Zhong; Liang Lin

The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence and has successfully boosted the performance of different models. However, current explanations of this mechanism are mainly based on intuition and experience, and direct modeling of how the SAM helps performance is still lacking. To mitigate this issue, in this paper, based on the dynamical system perspective of the residual neural network, we first show that the intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs). Thus, the ability of an NN to measure SP at the feature level is necessary to obtain high performance and is an important factor in the difficulty of training NNs. Similar to the adaptive step-size method, which is effective in solving stiff ODEs, we show that the SAM is also a stiffness-aware step-size adaptor that can enhance the model's representational ability to measure intrinsic SP by refining the estimation of stiffness information and generating adaptive attention values. This provides a new understanding of why and how the SAM can benefit model performance. This novel perspective can also explain the lottery ticket hypothesis in SAM, guide the design of new quantitative metrics of representational ability, and inspire a new theory-inspired approach, StepNet. Extensive experiments on several popular benchmarks demonstrate that StepNet can extract fine-grained stiffness information and measure SP accurately, leading to significant improvements in various visual tasks.
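The analogy between a residual update and an ODE step can be made concrete with a toy example. The sketch below is not StepNet and is not taken from the paper. It assumes a scalar stiff test equation dx/dt = -lam * x, reads a plain residual update as a fixed-step forward Euler step, and reads an attention-like gate as a per-step scaling of that step. The function names, the stiffness estimate |f(x)| / |x|, and the gate formula are all hypothetical choices made for illustration.

```python
def f(x, lam=50.0):
    """Right-hand side of the stiff test ODE dx/dt = -lam * x (illustrative choice)."""
    return -lam * x


def fixed_step_update(x0, h, n_steps):
    """Residual-style update x_{t+1} = x_t + h * f(x_t), i.e. forward Euler with a fixed step."""
    x = x0
    for _ in range(n_steps):
        x = x + h * f(x)
    return x


def gated_step_update(x0, h, n_steps):
    """Attention-style update x_{t+1} = x_t + a_t * h * f(x_t).

    The gate a_t in (0, 1] is a hypothetical stand-in for an attention value:
    it shrinks the effective step when the local stiffness estimate is large.
    """
    x = x0
    for _ in range(n_steps):
        fx = f(x)
        # Crude local stiffness estimate (assumption): |f(x)| / |x|.
        stiffness = abs(fx) / abs(x) if x != 0.0 else 0.0
        # Gate that scales the step down as h * stiffness grows (assumption).
        a = 1.0 / (1.0 + h * stiffness)
        x = x + a * h * fx
    return x


if __name__ == "__main__":
    # With lam = 50 and h = 0.1 the fixed step is too large for this stiff problem.
    print("fixed step:", fixed_step_update(1.0, 0.1, 20))
    print("gated step:", gated_step_update(1.0, 0.1, 20))
```

With these settings the fixed-step iterate grows in magnitude at every step, while the gated iterate decays toward zero like the true solution. For this linear test equation the chosen gate happens to reproduce the backward Euler update, which is one reason it stays stable. The networks and attention modules studied in the paper are far richer than this scalar case.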

updated: Sat Aug 19 2023 08:17:41 GMT+0000 (UTC)

published: Sat Aug 19 2023 08:17:41 GMT+0000 (UTC)

arXiv
