iMixer: hierarchical Hopfield network implies an invertible, implicit and iterative MLP-Mixer

Toshihiro Ota; Masato Taki

In the last few years, the success of Transformers in computer vision has stimulated the discovery of many alternative models that compete with Transformers, such as the MLP-Mixer. Despite their weak inductive bias, these models have achieved performance comparable to well-studied convolutional neural networks. Recent studies on modern Hopfield networks suggest a correspondence between certain energy-based associative memory models and Transformers or the MLP-Mixer, and shed some light on the theoretical background of Transformer-type architecture design. In this paper we generalize the correspondence to the recently introduced hierarchical Hopfield network, and find iMixer, a novel generalization of the MLP-Mixer model. Unlike ordinary feedforward neural networks, iMixer involves MLP layers that propagate forward from the output side to the input side. We characterize the module as an example of an invertible, implicit, and iterative mixing module. We evaluate the model performance with various datasets on image classification tasks, and find that iMixer achieves reasonable improvements over the baseline vanilla MLP-Mixer. The results imply that the correspondence between Hopfield networks and Mixer models serves as a principle for understanding a broader class of Transformer-like architecture designs.
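The abstract does not spell out how a mixing module can be invertible, implicit, and iterative at once. One common construction with these three properties, used here purely as an illustration and not taken from the paper, is a residual map F(X) = X + g(X) whose inverse is evaluated by fixed-point iteration; this converges when g is a contraction. In the sketch below, the function names, the tanh MLP, the tensor sizes, and the spectral-norm target of 0.7 are all assumptions made for the example.

```python
import numpy as np

def scale_to_spectral_norm(w, target):
    # Rescale w so that its largest singular value equals `target`.
    return w * (target / np.linalg.norm(w, ord=2))

def make_token_mlp(num_tokens, hidden, seed=0, target=0.7):
    # Hypothetical token-mixing MLP g(X) = W2 tanh(W1 X), acting along the
    # token axis. With both weights scaled to spectral norm 0.7 and tanh
    # being 1-Lipschitz, g has Lipschitz constant at most 0.49 < 1.
    rng = np.random.default_rng(seed)
    w1 = scale_to_spectral_norm(rng.standard_normal((hidden, num_tokens)), target)
    w2 = scale_to_spectral_norm(rng.standard_normal((num_tokens, hidden)), target)

    def g(x):
        return w2 @ np.tanh(w1 @ x)

    return g

def forward(g, x):
    # Ordinary residual direction: Y = X + g(X).
    return x + g(x)

def inverse(g, y, num_iters=50):
    # Implicit direction: solve Y = X + g(X) for X by iterating
    # X <- Y - g(X), which converges because g is a contraction.
    x = y.copy()
    for _ in range(num_iters):
        x = y - g(x)
    return x

# Assumed sizes for illustration: 8 tokens, 4 channels.
g = make_token_mlp(num_tokens=8, hidden=16)
x = np.random.default_rng(1).standard_normal((8, 4))
y = forward(g, x)
x_rec = inverse(g, y)
max_err = float(np.max(np.abs(x - x_rec)))
print(f"max reconstruction error: {max_err:.2e}")
```

In this reading, the module's output is defined implicitly as the solution of an equation, computed iteratively, and the map is invertible by construction. The abstract alone does not confirm that iMixer uses exactly this scheme.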

updated: Tue Apr 25 2023 18:00:08 GMT+0000 (UTC)

published: Tue Apr 25 2023 18:00:08 GMT+0000 (UTC)

arXiv
