DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Ziyu Wang; Wenhao Jiang; Yiming Zhu; Li Yuan; Yibing Song; Wei Liu

Recently, MLP-like vision models have achieved promising performance on mainstream visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can give deep recognition models good representation power. However, existing MLP-like models fuse tokens through static fusion operations, which cannot adapt to the contents of the tokens being mixed. As a result, the customary information fusion procedures are not effective enough. To address this, this paper presents an efficient MLP-like network architecture, dubbed DynaMixer, that relies on dynamic information fusion. Critically, we propose a procedure, on which the DynaMixer model relies, to dynamically generate mixing matrices by leveraging the contents of all the tokens to be mixed. To reduce the time complexity and improve the robustness, a dimensionality reduction technique and a multi-segment fusion mechanism are adopted. Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing existing MLP-like models of similar capacity. The code is available at https://github.com/ziyuwwang/DynaMixer.
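To make the idea of content-dependent mixing concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the repository linked above for that). It assumes one plausible reading of the abstract: each channel segment is first reduced to a small dimension, the reduced tokens are flattened and linearly projected to an N x N matrix, and a row-wise softmax turns that into the mixing matrix applied to the tokens. The function name `dynamic_mix`, the weight shapes, and the use of softmax are all assumptions made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_mix(x, w_reduce, w_gen, num_segments):
    """Mix tokens with matrices generated from the tokens themselves.

    x        : (n, d) array of n tokens with d channels
    w_reduce : (num_segments, d // num_segments, r) reduction weights (hypothetical)
    w_gen    : (num_segments, n * r, n * n) generator weights (hypothetical)
    """
    n, d = x.shape
    assert d % num_segments == 0, "channels must split evenly into segments"
    seg = d // num_segments
    out = np.empty_like(x)
    for s in range(num_segments):
        xs = x[:, s * seg:(s + 1) * seg]         # (n, seg) channel segment
        reduced = xs @ w_reduce[s]               # (n, r) dimensionality reduction
        logits = reduced.reshape(-1) @ w_gen[s]  # (n * n,) depends on all tokens
        mix = softmax(logits.reshape(n, n))      # each row sums to 1
        out[:, s * seg:(s + 1) * seg] = mix @ xs # content-dependent token mixing
    return out


rng = np.random.default_rng(0)
n, d, r, num_segments = 4, 8, 2, 2
x = rng.normal(size=(n, d))
w_reduce = rng.normal(size=(num_segments, d // num_segments, r))
w_gen = rng.normal(size=(num_segments, n * r, n * n))
y = dynamic_mix(x, w_reduce, w_gen, num_segments)
```

Unlike a static token-mixing MLP, whose mixing weights are fixed after training, the matrix `mix` here changes with every input `x`. The reduction to `r` channels keeps the generator weight at `n * r` by `n * n` entries rather than `n * seg` by `n * n`, which is the kind of saving the abstract attributes to dimensionality reduction.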

updated: Sat Jun 18 2022 03:38:04 GMT+0000 (UTC)

published: Fri Jan 28 2022 12:43:14 GMT+0000 (UTC)

arXiv
