WaveMix: A Resource-efficient Neural Network for Image Analysis

Pranav Jeevan; Kavitha Viswanathan; Anandu A S; Amit Sethi

To allow image analysis in resource-constrained scenarios without compromising generalizability, we introduce WaveMix, a novel and flexible neural framework that reduces GPU RAM (memory) and compute (latency) compared to CNNs and transformers. In addition to using convolutional layers that exploit shift-invariant image statistics, the proposed framework uses multi-level two-dimensional discrete wavelet transform (2D-DWT) modules to exploit scale-invariance and edge sparseness, which gives it the following advantages. First, the fixed weights of the wavelet modules do not add to the parameter count while reorganizing information based on these image priors. Second, the wavelet modules scale the spatial extents of feature maps by integral powers of 1/2 × 1/2, which reduces the memory and latency required for forward and backward passes. Finally, a multi-level 2D-DWT expands the receptive field per layer more quickly than pooling (which we do not use), making it a more effective spatial token mixer. WaveMix also generalizes better than other token-mixing models, such as ConvMixer, MLP-Mixer, PoolFormer, random filters, and Fourier basis, because the wavelet transform is much better suited for image decomposition and spatial token mixing. WaveMix is a flexible model that can perform well on multiple image tasks without needing architectural modifications. WaveMix achieves a semantic segmentation mIoU of 83% on the Cityscapes validation set, outperforming transformer and CNN-based architectures. We also demonstrate the advantages of WaveMix for classification on multiple datasets and show that WaveMix establishes new state-of-the-art results on the Places-365, EMNIST, and iNAT-mini datasets.
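The two structural claims about the wavelet modules (fixed weights, and spatial extents shrinking by 1/2 × 1/2 per level) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not name a wavelet family, so the orthonormal Haar wavelet is assumed here, and the function names `haar_dwt2` and `multilevel_shapes` are hypothetical.

```python
import numpy as np


def haar_dwt2(x):
    """One level of an orthonormal 2D Haar DWT on an (H, W) array.

    Assumes H and W are even. The only "weights" are the fixed +/- 1/2
    coefficients below, so nothing here is learned.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-pass in both directions
    col_diff = (a - b + c - d) / 2.0     # difference between adjacent columns
    row_diff = (a + b - c - d) / 2.0     # difference between adjacent rows
    diag_diff = (a - b - c + d) / 2.0    # diagonal difference
    return approx, col_diff, row_diff, diag_diff


def multilevel_shapes(x, levels):
    """Apply the transform repeatedly to the approximation band and
    record the spatial shape after each level."""
    shapes = []
    approx = x
    for _ in range(levels):
        approx, _, _, _ = haar_dwt2(approx)
        shapes.append(approx.shape)
    return shapes


x = np.arange(64, dtype=np.float64).reshape(8, 8)
approx, col_diff, row_diff, diag_diff = haar_dwt2(x)
energy_in = float(np.sum(x ** 2))
energy_out = float(sum(np.sum(band ** 2) for band in (approx, col_diff, row_diff, diag_diff)))
print(approx.shape)
print(multilevel_shapes(x, 3))
print(np.isclose(energy_in, energy_out))
```

Each level halves both spatial dimensions while splitting the input into four sub-bands, so the information is reorganized rather than discarded; under the orthonormal Haar assumption the total energy of the four sub-bands equals that of the input.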

updated: Thu Jan 19 2023 00:05:12 GMT+0000 (UTC)

published: Sat May 28 2022 09:08:50 GMT+0000 (UTC)

arXiv
