ResMLP: Feedforward networks for image classification with data-efficient training

Hugo Touvron; Piotr Bojanowski; Mathilde Caron; Matthieu Cord; Alaaeldin El-Nouby; Edouard Grave; Gautier Izacard; Armand Joulin; Gabriel Synnaeve; Jakob Verbeek; Hervé Jégou

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification. It is a simple residual network that alternates (i) a linear layer in which image patches interact, independently and identically across channels, and (ii) a two-layer feed-forward network in which channels interact independently per patch. When trained with a modern training strategy using heavy data-augmentation and optionally distillation, it attains surprisingly good accuracy/complexity trade-offs on ImageNet. We also train ResMLP models in a self-supervised setup, to further remove priors from employing a labelled dataset. Finally, by adapting our model to machine translation we achieve surprisingly good results. We share pre-trained models and our code based on the Timm library.
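The two alternating sublayers described in the abstract can be sketched in plain Python as a single residual block. This is a minimal illustration under stated assumptions, not the authors' implementation: the GELU nonlinearity, the absence of any normalization or bias terms, the random initialization, and all names (`resmlp_block`, `a`, `w1`, `w2`) are hypothetical choices made for the example. The input `x` is a list of patch embeddings, one row per patch and one column per channel.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def gelu(v):
    """Exact GELU. The choice of nonlinearity is an assumption of this sketch."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian weights, used for illustration only."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def resmlp_block(x, a, w1, w2):
    """One residual block.

    x:  (num_patches, dim) patch embeddings
    a:  (num_patches, num_patches) cross-patch linear layer
    w1: (dim, hidden) and w2: (hidden, dim) per-patch feed-forward weights
    """
    # (i) Patches interact: the same matrix `a` mixes every channel (column of x).
    z = add(x, matmul(a, x))
    # (ii) Channels interact: each patch (row of z) goes through the same 2-layer MLP.
    hidden = [[gelu(v) for v in row] for row in matmul(z, w1)]
    return add(z, matmul(hidden, w2))


if __name__ == "__main__":
    rng = random.Random(0)
    num_patches, dim, hidden_dim = 4, 3, 6
    x = random_matrix(num_patches, dim, rng, scale=1.0)
    a = random_matrix(num_patches, num_patches, rng)
    w1 = random_matrix(dim, hidden_dim, rng)
    w2 = random_matrix(hidden_dim, dim, rng)
    y = resmlp_block(x, a, w1, w2)
    print(len(y), len(y[0]))  # output keeps the (num_patches, dim) shape
```

Because both sublayers are residual, the block preserves the input shape and reduces to the identity when all weights are zero, so blocks of this form can be stacked to any depth.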

updated: Thu Jun 10 2021 16:06:13 GMT+0000 (UTC)

published: Fri May 07 2021 17:31:44 GMT+0000 (UTC)

arXiv
