SplitMixer: Fat Trimmed From MLP-like Models

Ali Borji; Sikun Lin

We present SplitMixer, a simple and lightweight isotropic MLP-like architecture for visual recognition. It interleaves two types of convolutional operations that mix information across spatial locations (spatial mixing) and across channels (channel mixing). Spatial mixing applies two depthwise 1D kernels in sequence instead of a single 2D kernel. Channel mixing splits the channels into overlapping or non-overlapping segments, with or without shared parameters, and applies either our proposed channel mixing approaches or 3D convolution to each segment. Depending on these design choices, a number of SplitMixer variants can be constructed to balance accuracy, parameter count, and speed. We show, both theoretically and experimentally, that SplitMixer performs on par with state-of-the-art MLP-like models while using significantly fewer parameters and FLOPs. For example, without strong data augmentation and optimization, SplitMixer achieves around 94% accuracy on CIFAR-10 with only 0.28M parameters, while ConvMixer achieves the same accuracy with about 0.6M parameters. The well-known MLP-Mixer achieves 85.45% with 17.1M parameters. On the CIFAR-100 dataset, SplitMixer achieves around 73% accuracy, on par with ConvMixer, but with about 52% fewer parameters and FLOPs. We hope that our results spark further research towards more efficient vision architectures and facilitate the development of MLP-like models. Code is available at https://github.com/aliborji/splitmixer.
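To give a rough sense of where the parameter savings described in the abstract could come from, the following sketch counts weights for the two mixing steps. It is an illustration under stated assumptions, not the authors' implementation: the function names, the hidden width of 256, the kernel size of 5, and the choice of two non-overlapping, non-shared segments are all hypothetical, and biases and normalization layers are ignored.

```python
# Back-of-the-envelope weight counts for one mixing block.
# All names and the example sizes below are illustrative assumptions.

def spatial_params_2d(channels: int, k: int) -> int:
    """Depthwise k x k kernel: one k*k filter per channel."""
    return channels * k * k


def spatial_params_two_1d(channels: int, k: int) -> int:
    """Two depthwise 1D kernels (1 x k, then k x 1) per channel."""
    return 2 * channels * k


def channel_params_full(channels: int) -> int:
    """Pointwise (1 x 1) convolution mixing all channels with each other."""
    return channels * channels


def channel_params_split(channels: int, segments: int) -> int:
    """Pointwise mixing applied separately inside each non-overlapping
    segment, with no parameter sharing between segments (an assumption)."""
    if channels % segments != 0:
        raise ValueError("channels must be divisible by segments")
    seg = channels // segments
    return segments * seg * seg


if __name__ == "__main__":
    h, k, s = 256, 5, 2  # hypothetical width, kernel size, segment count
    print("spatial 2D      :", spatial_params_2d(h, k))
    print("spatial 2 x 1D  :", spatial_params_two_1d(h, k))
    print("channel full    :", channel_params_full(h))
    print("channel split   :", channel_params_split(h, s))
```

Under these assumptions, replacing a k x k depthwise kernel with two 1D kernels scales the spatial weights by 2/k, and mixing within s independent segments scales the channel weights by 1/s. The variants in the paper (overlapping segments, shared parameters, 3D convolution) would change these counts.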

updated: Mon Jul 25 2022 17:04:19 GMT+0000 (UTC)

published: Thu Jul 21 2022 01:37:07 GMT+0000 (UTC)

arXiv
