WaveMix-Lite: A Resource-efficient Neural Network for Image Analysis

Pranav Jeevan; Kavitha Viswanathan; Amit Sethi

WaveMix-Lite：画像分析のためのリソース効率の高いニューラルネットワーク

ニューラルネットワークの画像分析タスクを一般化する機能の向上は、パラメーターとレイヤーの数、データセットサイズ、トレーニングとテストの計算、およびGPURAMの増加を犠牲にしてもたらされました。新しいアーキテクチャ（WaveMix-Lite）を紹介します。これは、必要なリソースを減らしながら、現代のトランスフォーマーや畳み込みニューラルネットワーク（CNN）と同等に一般化できます。 WaveMix-Liteは、2D離散ウェーブレット変換を使用して、ピクセルからの空間情報を効率的に混合します。 WaveMix-Liteは、トランスフォーマーやCNNとは異なり、大幅なアーキテクチャの変更を必要とせずに、画像分類やセマンティックセグメンテーションなどの複数のビジョンタスクに使用できる、用途が広くスケーラブルなアーキテクチャフレームワークのようです。単一のGPUでトレーニングしながら、いくつかの精度ベンチマークを満たすか超えることができます。たとえば、5つのEMNISTデータセットで最先端の精度を達成し、ImageNet-1K（64×64画像）のCNNとトランスフォーマーを上回り、Cityscapes検証セットで75.32％のmIoUを達成し、1つ未満を使用します。 -5番目の数のパラメーターと同等のCNNまたはトランスフォーマーのGPURAMの半分。私たちの実験は、ニューラルアーキテクチャの畳み込み要素が画像のシフト不変性特性を利用する一方で、新しいタイプのレイヤー（ウェーブレット変換など）は、スケール不変性やオブジェクトの有限空間範囲など、画像の追加の特性を利用できることを示しています。

Gains in the ability to generalize on image analysis tasks for neural networks have come at the cost of increased number of parameters and layers, dataset sizes, training and test computations, and GPU RAM. We introduce a new architecture -- WaveMix-Lite -- that can generalize on par with contemporary transformers and convolutional neural networks (CNNs) while needing fewer resources. WaveMix-Lite uses 2D-discrete wavelet transform to efficiently mix spatial information from pixels. WaveMix-Lite seems to be a versatile and scalable architectural framework that can be used for multiple vision tasks, such as image classification and semantic segmentation, without requiring significant architectural changes, unlike transformers and CNNs. It is able to meet or exceed several accuracy benchmarks while training on a single GPU. For instance, it achieves state-of-the-art accuracy on five EMNIST datasets, outperforms CNNs and transformers in ImageNet-1K (64×64 images), and achieves an mIoU of 75.32 % on Cityscapes validation set, while using less than one-fifth the number parameters and half the GPU RAM of comparable CNNs or transformers. Our experiments show that while the convolutional elements of neural architectures exploit the shift-invariance property of images, new types of layers (e.g., wavelet transform) can exploit additional properties of images, such as scale-invariance and finite spatial extents of objects.

updated: Wed Jun 01 2022 17:09:58 GMT+0000 (UTC)

published: Sat May 28 2022 09:08:50 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト