Domain Generalisation with Bidirectional Encoder Representations from Vision Transformers

Hamza Riaz; Alan F. Smeaton

Domain generalisation involves pooling knowledge from one or more source domains into a single model that can generalise to unseen target domains. A persistent challenge for deep learning models in domain generalisation is that, at test time, they encounter data distributions that differ from those they were trained on. Here we perform domain generalisation on out-of-distribution (OOD) vision benchmarks using vision transformers. We first examine four vision transformer architectures on out-of-distribution data: ViT, LeViT, DeiT, and BEIT. The bidirectional encoder representation from image transformers (BEIT) architecture performs best, so we use it in further experiments on three benchmarks: PACS, Office-Home and DomainNet. Our results show significant improvements in validation and test accuracy, and our implementation substantially narrows the gap between within-distribution and OOD performance.
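The setting above pools several source domains into one model and then measures accuracy on a domain never seen during training. It is commonly evaluated with a leave-one-domain-out protocol. The minimal sketch below assumes that protocol, because the abstract does not state which split scheme was used. It uses the four standard PACS domain names. The helpers `leave_one_domain_out` and `generalisation_gap` are hypothetical illustrations, not code from this work.

```python
from typing import List, Tuple

# The four standard PACS domains.
PACS_DOMAINS = ["photo", "art_painting", "cartoon", "sketch"]


def leave_one_domain_out(domains: List[str]) -> List[Tuple[List[str], str]]:
    """Hypothetical helper: hold out each domain once as the unseen target.

    Returns (source_domains, target_domain) pairs. The sources are pooled
    for training, and the target is used only for OOD testing.
    """
    splits = []
    for target in domains:
        sources = [d for d in domains if d != target]
        splits.append((sources, target))
    return splits


def generalisation_gap(in_dist_acc: float, ood_acc: float) -> float:
    """Hypothetical helper: accuracy drop from within-distribution
    validation (source domains) to the held-out target domain."""
    return in_dist_acc - ood_acc


if __name__ == "__main__":
    for sources, target in leave_one_domain_out(PACS_DOMAINS):
        print(f"train on {sources} -> test on {target}")
```

Each run trains on three domains and tests on the fourth. A smaller `generalisation_gap` averaged over the four runs would indicate that a model transfers better to unseen domains.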

updated: Sun Jul 16 2023 17:50:37 GMT+0000 (UTC)

published: Sun Jul 16 2023 17:50:37 GMT+0000 (UTC)

arXiv