Meta Style Adversarial Training for Cross-Domain Few-Shot Learning

Yuqian Fu; Yu Xie; Yanwei Fu; Yu-Gang Jiang

Cross-Domain Few-Shot Learning (CD-FSL) is a recently emerging task that tackles few-shot learning across different domains. It aims to transfer prior knowledge learned on a source dataset to novel target datasets. The task is made especially challenging by the large domain gap between datasets. Critically, this domain gap comes from changes in visual style, and wave-SAN empirically shows that spanning the style distribution of the source data helps alleviate the issue. However, wave-SAN simply swaps the styles of two images. Such a vanilla operation makes the generated styles "real" and "easy": they still fall within the original set of source styles. Thus, inspired by vanilla adversarial learning, we propose a novel model-agnostic meta Style Adversarial training (StyleAdv) method, together with a novel style adversarial attack method, for CD-FSL. In particular, our style attack method synthesizes adversarial styles that are both "virtual" and "hard" for model training. This is achieved by perturbing the original style with the signed style gradients. By continually attacking styles and forcing the model to recognize these challenging adversarial styles, the model gradually becomes robust to visual styles, which boosts its generalization ability on novel target datasets. Besides the typical CNN-based backbone, we also apply StyleAdv to a large-scale pretrained vision transformer. Extensive experiments on eight different target datasets show the effectiveness of our method. Whether built upon ResNet or ViT, we achieve a new state of the art for CD-FSL. Code and models will be released.
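
The abstract states only that adversarial styles are obtained by perturbing the original style with the signed style gradients. The following is a minimal sketch of that idea, not the authors' implementation. It rests on several assumptions that do not come from the abstract: the "style" of a feature channel is taken to be its mean and standard deviation, the task loss is a hypothetical toy logistic loss (`toy_loss`), the gradient is approximated by finite differences rather than backpropagation, and the step size `eps` is an arbitrary illustrative value.

```python
import math

EPS_STD = 1e-6  # numerical floor inside the square root


def channel_style(channel):
    """Return (mean, std) of one feature channel, used here as its 'style' (assumption)."""
    n = len(channel)
    mu = sum(channel) / n
    var = sum((v - mu) ** 2 for v in channel) / n
    return mu, math.sqrt(var + EPS_STD)


def restyle(channel, new_mu, new_sigma):
    """Normalize a channel, then re-apply a different mean and std."""
    mu, sigma = channel_style(channel)
    return [new_sigma * (v - mu) / sigma + new_mu for v in channel]


def toy_loss(features, weights, label):
    """Hypothetical task loss: logistic loss on average-pooled ReLU features."""
    pooled = [sum(max(v, 0.0) for v in ch) / len(ch) for ch in features]
    logit = sum(w * p for w, p in zip(weights, pooled))
    margin = logit if label == 1 else -logit
    return math.log1p(math.exp(-margin))


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def style_attack(features, weights, label, eps=0.1, h=1e-4):
    """Move every style statistic by eps along the sign of its loss gradient."""
    # Flattened style vector: [mu_0, sigma_0, mu_1, sigma_1, ...]
    flat = [s for ch in features for s in channel_style(ch)]

    def loss_at(style):
        restyled = [restyle(ch, style[2 * i], style[2 * i + 1])
                    for i, ch in enumerate(features)]
        return toy_loss(restyled, weights, label)

    adv = []
    for k, value in enumerate(flat):
        up, down = flat[:], flat[:]
        up[k] += h
        down[k] -= h
        grad = (loss_at(up) - loss_at(down)) / (2 * h)  # central difference
        adv.append(value + eps * sign(grad))  # signed-gradient step on the style

    adv_features = [restyle(ch, adv[2 * i], adv[2 * i + 1])
                    for i, ch in enumerate(features)]
    return adv_features, loss_at(flat), loss_at(adv)


# Two toy feature channels with four activations each (made-up values).
features = [[0.2, 1.0, -0.5, 0.7], [1.5, -0.3, 0.4, 0.0]]
weights = [0.8, -0.6]
adv_features, clean_loss, adv_loss = style_attack(features, weights, label=1)
print(clean_loss, adv_loss)
```

In this toy setting the perturbed style yields a higher loss than the original one, which is the sense in which such a style is "hard". Because each statistic is shifted rather than copied from another image, the result need not match the style of any real source image, which is one way to read "virtual". A practical version would presumably compute the style gradients with automatic differentiation inside the backbone instead of finite differences.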

updated: Sat Feb 18 2023 11:54:37 GMT+0000 (UTC)

published: Sat Feb 18 2023 11:54:37 GMT+0000 (UTC)

arXiv
