ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization

Jintao Guo; Na Wang; Lei Qi; Yinghuan Shi

Domain generalization (DG) aims to learn a model from multiple source domains that generalizes well to unseen target domains without re-training. Most existing DG works are based on convolutional neural networks (CNNs). However, the local operation of the convolution kernel makes the model focus too much on local representations (e.g., texture), which makes it inherently more prone to overfitting the source domains and hampers its generalization ability. Recently, several MLP-based methods have achieved promising results in supervised learning tasks by learning global interactions among different patches of the image. Inspired by this, we first analyze the difference between CNN and MLP methods in DG and find that MLP methods exhibit better generalization ability because they capture global representations (e.g., structure) better than CNN methods. Then, based on a recent lightweight MLP method, we obtain a strong baseline that outperforms most state-of-the-art CNN-based methods. The baseline learns global structure representations with a filter that suppresses structure-irrelevant information in the frequency space. Moreover, we propose a dynAmic LOw-Frequency spectrum Transform (ALOFT) that perturbs local texture features while preserving global structure features, thus enabling the filter to remove structure-irrelevant information sufficiently. Extensive experiments on four benchmarks demonstrate that our method achieves substantial performance improvements with a small number of parameters compared to state-of-the-art CNN-based DG methods. Our code is available at https://github.com/lingeringlight/ALOFT/.
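
The abstract does not spell out how the low-frequency spectrum is perturbed, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation (see the repository linked above for that). It assumes a feature map of shape (C, H, W), a centered low-frequency window whose size is set by a hypothetical `ratio` parameter, and multiplicative Gaussian noise on the amplitude controlled by a hypothetical `noise_std` parameter. The function names are illustrative as well.

```python
import numpy as np


def low_frequency_mask(h, w, ratio):
    """Boolean (h, w) mask selecting a window centered on the DC component
    of an fftshift-ed spectrum. `ratio` is a hypothetical knob."""
    cy, cx = h // 2, w // 2
    ry, rx = int(h * ratio) // 2, int(w * ratio) // 2
    mask = np.zeros((h, w), dtype=bool)
    mask[cy - ry:cy + ry + 1, cx - rx:cx + rx + 1] = True
    return mask


def perturb_low_frequency(x, ratio=0.25, noise_std=0.5, rng=None):
    """Randomly rescale the low-frequency amplitude of a real (C, H, W) array.

    Assumption: multiplicative Gaussian noise on the amplitude; the phase
    and all components outside the low-frequency window are left untouched.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = x.shape

    # 2D FFT over the spatial axes, shifted so low frequencies sit in the center.
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))
    amp, phase = np.abs(spec), np.angle(spec)

    # Per-element scale factors, clipped so amplitudes stay non-negative.
    scale = np.clip(1.0 + noise_std * rng.standard_normal(amp.shape), 0.0, None)
    mask = low_frequency_mask(h, w, ratio)
    amp = np.where(mask, amp * scale, amp)

    # Recombine amplitude and phase, then return to the spatial domain.
    spec = amp * np.exp(1j * phase)
    out = np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1))
    # Independent noise breaks Hermitian symmetry, so keep only the real part.
    return out.real


if __name__ == "__main__":
    features = np.random.default_rng(0).standard_normal((3, 8, 8))
    perturbed = perturb_low_frequency(features, rng=np.random.default_rng(1))
    print(perturbed.shape)
```

In a training loop, a transform of this kind would typically be applied only during training and switched off at test time; whether ALOFT does so, and at which layers, is not stated in the abstract.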

updated: Fri Mar 31 2023 11:55:55 GMT+0000 (UTC)

published: Tue Mar 21 2023 08:36:34 GMT+0000 (UTC)

arXiv
