StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization

Shirsha Bose; Enrico Fini; Ankit Jha; Mainak Singha; Biplab Banerjee; Elisa Ricci

Large-scale foundation models (e.g., CLIP) have shown promising zero-shot generalization performance on downstream tasks by leveraging carefully designed language prompts. However, despite their success, most prompt learning techniques tend to underperform in the presence of domain shift. Our study addresses this problem and, to improve CLIP's generalization ability across domains, proposes StyLIP, a novel approach for Domain Generalization (DG) based on a domain-agnostic prompt learning strategy. In the absence of explicit domain knowledge, we aim to disentangle the visual style and the content information extracted from the pre-trained CLIP in the prompts so they can be effortlessly adapted to novel domains during inference. Furthermore, we consider a set of style projectors that learn the prompt tokens directly from multi-scale style features; the generated prompt embeddings are later fused with the multi-scale visual features learned through a content projector. The projectors are trained contrastively while CLIP's vision and text encoders are kept frozen. We present extensive experiments in five different DG settings on multiple benchmarks, demonstrating that StyLIP consistently outperforms the relevant state-of-the-art methods.
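The pipeline described in the abstract (multi-scale style features, one style projector per scale, contrastive training) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the abstract does not say how style features are computed, so the sketch uses per-channel mean and standard deviation (AdaIN-style statistics); the projectors are plain linear maps; and the names `channel_stats`, `make_projector`, `style_prompt_tokens`, and `contrastive_loss` are hypothetical. The frozen CLIP encoders and the content projector are omitted, and the feature maps are random stand-ins.

```python
import math
import random


def channel_stats(feature_map):
    """Style vector for one scale: per-channel mean and std.

    Assumption: AdaIN-style statistics; the abstract does not specify this.
    feature_map is a list of channels, each a flat list of activations.
    """
    means, stds = [], []
    for channel in feature_map:
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        means.append(mu)
        stds.append(math.sqrt(var))
    return means + stds


def make_projector(in_dim, out_dim, rng):
    """Hypothetical linear style projector: small random weights, zero bias."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def project(vec, projector):
    """Apply a linear projector (weights, bias) to a vector."""
    weights, bias = projector
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def style_prompt_tokens(feature_maps, projectors):
    """One prompt token per scale, computed from that scale's style vector."""
    return [project(channel_stats(fm), p) for fm, p in zip(feature_maps, projectors)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(image_emb, class_text_embs, target, temperature=0.07):
    """Cross-entropy over temperature-scaled cosine similarities."""
    logits = [cosine(image_emb, t) / temperature for t in class_text_embs]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_z - logits[target]


rng = random.Random(0)
# Stand-in feature maps at two scales: 4 channels x 9 positions, 8 channels x 4 positions.
feature_maps = [
    [[rng.random() for _ in range(9)] for _ in range(4)],
    [[rng.random() for _ in range(4)] for _ in range(8)],
]
token_dim = 6
# Each style vector has 2 * (number of channels) entries: means followed by stds.
projectors = [make_projector(2 * len(fm), token_dim, rng) for fm in feature_maps]
tokens = style_prompt_tokens(feature_maps, projectors)
```

In this sketch each scale contributes one token of size `token_dim`, so the prompt length equals the number of scales. In a real setting, only the projector weights would receive gradients from `contrastive_loss`, in line with the abstract's statement that the CLIP encoders stay frozen.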

updated: Sat Feb 18 2023 07:36:16 GMT+0000 (UTC)

published: Sat Feb 18 2023 07:36:16 GMT+0000 (UTC)

arXiv
