Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems

Dan Navon; Alex M. Bronstein

Vision Transformers are widely used in a variety of vision tasks. Meanwhile, another line of work, starting with the MLP-Mixer, tries to achieve similar performance using MLP-based architectures. Interestingly, these MLP-based architectures have so far not been adapted to NLP tasks, and they have also failed to achieve state-of-the-art performance in vision tasks. In this paper, we analyze the expressive power of MLP-based architectures in modeling dependencies between multiple different inputs simultaneously, and show an exponential gap between the attention and the MLP-based mechanisms. Our results suggest a theoretical explanation for the inability of MLPs to compete with attention-based mechanisms in NLP problems. They also suggest that the performance gap in vision tasks may be due to the relative weakness of MLPs in modeling dependencies between multiple different locations, and that combining smart input permutations with MLP architectures may not be enough on its own to close the performance gap.
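As a rough illustration of the two token-mixing mechanisms the abstract compares, the sketch below contrasts how each decides which tokens to combine. It is not the paper's construction or proof. It assumes a single attention head with identity query/key projections, and it reduces the MLP-Mixer token-mixing step to one fixed linear map; the function names and the matrix `w_token` are hypothetical, chosen only for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mixing_weights(tokens):
    """Mixing weights of a simplified single-head self-attention.

    Assumption: queries and keys are the tokens themselves (identity
    projections), so the score for tokens i and j is a scaled dot product.
    The resulting weights are a function of the input.
    """
    d = len(tokens[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    return [softmax(row) for row in scores]


def mlp_mixing_weights(tokens, w_token):
    """Mixing weights of a simplified (linear) token-mixing layer.

    Assumption: the token-mixing MLP is reduced to one learned matrix.
    That matrix is fixed after training, so it ignores the input.
    """
    return w_token


def mix(weights, tokens):
    """Apply token-mixing weights: output_i = sum_j weights[i][j] * tokens[j]."""
    d = len(tokens[0])
    return [
        [sum(w[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        for w in weights
    ]


# Two toy inputs with two tokens of dimension two each.
x1 = [[1.0, 0.0], [0.0, 1.0]]
x2 = [[2.0, 0.0], [0.0, 1.0]]
w_token = [[0.5, 0.5], [0.5, 0.5]]  # hypothetical learned matrix

print(attention_mixing_weights(x1) == attention_mixing_weights(x2))
print(mlp_mixing_weights(x1, w_token) == mlp_mixing_weights(x2, w_token))
print(mix(mlp_mixing_weights(x1, w_token), x1))
```

In this toy setting the attention weights change when the input changes, while the MLP-style weights stay the same for every input. The sketch only shows that structural difference; it says nothing quantitative about the exponential gap the paper analyzes.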

updated: Sat Nov 12 2022 14:40:21 GMT+0000 (UTC)

published: Wed Aug 17 2022 09:59:22 GMT+0000 (UTC)

arXiv
