Self-Distilled Vision Transformer for Domain Generalization

Maryam Sultana; Muzammal Naseer; Muhammad Haris Khan; Salman Khan; Fahad Shahbaz Khan

In the recent past, several domain generalization (DG) methods have been proposed and have shown encouraging performance; however, almost all of them build on convolutional neural networks (CNNs). There has been little to no progress in studying the DG performance of vision transformers (ViTs), which are challenging the supremacy of CNNs on standard benchmarks that are often built on the i.i.d. assumption. This renders the real-world deployment of ViTs doubtful. In this paper, we explore ViTs for addressing the DG problem. Similar to CNNs, ViTs also struggle in out-of-distribution scenarios, and the main culprit is overfitting to source domains. Inspired by the modular architecture of ViTs, we propose a simple DG approach for ViTs, coined self-distillation for ViTs. It reduces overfitting to source domains by easing the learning of the input-output mapping through curating non-zero entropy supervisory signals for intermediate transformer blocks. Further, it does not introduce any new parameters and can be seamlessly plugged into the modular composition of different ViTs. We empirically demonstrate notable performance gains with different DG baselines and various ViT backbones on five challenging datasets. Moreover, we report favorable performance against recent state-of-the-art DG methods. Our code, along with pre-trained models, is publicly available at: https://github.com/maryam089/SDViT
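The abstract does not spell out the training objective, so the following is only a minimal sketch of one plausible reading, written in plain Python without gradients. It assumes (not confirmed by the abstract) that the class token of each intermediate block is passed through the same final classification head, which is why no new parameters are needed; that one intermediate block is sampled at random per training step; and that the final head's temperature-softened prediction serves as the non-zero entropy soft target for that block through a KL term. The names `shared_head` and `self_distillation_loss`, the temperature of 3.0, and the equal loss weights with the Hinton-style T² factor are all hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def shared_head(weights: Matrix, bias: Vector, cls_token: Sequence[float]) -> Vector:
    # Hypothetical linear classifier reused for every block: logits = W x + b.
    return [sum(w * x for w, x in zip(row, cls_token)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    # Shannon entropy in nats; a one-hot label has entropy 0.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Standard cross-entropy against a hard (one-hot) label.
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); assumes q has no zero entries where p > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def self_distillation_loss(block_cls_tokens: Sequence[Sequence[float]],
                           weights: Matrix, bias: Vector, label: int,
                           rng: random.Random,
                           temperature: float = 3.0) -> Tuple[float, int]:
    # block_cls_tokens[i] is the class token after transformer block i;
    # the last entry is the final block, the rest are intermediate blocks.
    final_logits = shared_head(weights, bias, block_cls_tokens[-1])
    k = rng.randrange(len(block_cls_tokens) - 1)  # random intermediate block
    inter_logits = shared_head(weights, bias, block_cls_tokens[k])

    # Soft target from the final head has non-zero entropy, unlike the label.
    # In an autodiff framework this teacher distribution would be detached.
    teacher = softmax(final_logits, temperature)
    student = softmax(inter_logits, temperature)

    # Assumed weighting: equal CE terms plus T^2-scaled KL distillation term.
    loss = (cross_entropy(final_logits, label)
            + cross_entropy(inter_logits, label)
            + temperature ** 2 * kl_divergence(teacher, student))
    return loss, k


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(12)]  # 12 blocks, dim 4
    weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 classes
    bias = [0.0, 0.0, 0.0]
    loss, k = self_distillation_loss(tokens, weights, bias, label=1, rng=rng)
    print(f"sampled block {k}, loss {loss:.4f}")
    print("teacher entropy:", entropy(softmax(shared_head(weights, bias, tokens[-1]), 3.0)))
```

Reusing `shared_head` for every block is what keeps the parameter count unchanged in this sketch, and sampling a single intermediate block per step limits the extra cost to one additional head evaluation. The exact block sampling scheme, temperature, and loss weights used by SDViT should be checked against the released code.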

updated: Mon Jul 25 2022 17:57:05 GMT+0000 (UTC)

published: Mon Jul 25 2022 17:57:05 GMT+0000 (UTC)

arXiv