Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains

Xin Zhang; Shixiang Shane Gu; Yutaka Matsuo; Yusuke Iwasawa

Domain generalization (DG) is a difficult transfer learning problem that aims to learn a model that generalizes to unseen domains. Recent foundation models (FMs) are robust to many distribution shifts and should therefore substantially improve DG performance. In this work, we study generic ways to adapt CLIP, a vision-language foundation model, to DG problems in image classification. Although empirical risk minimization (ERM) achieves much higher accuracy on standard DG benchmarks when given larger backbones and training datasets, fine-tuning FMs is impractical in many real-world situations. We propose Domain Prompt Learning (DPL), a novel approach that performs domain inference in the form of conditional prompt generation. DPL achieves a significant accuracy improvement while training only a lightweight prompt generator (a three-layer MLP), whose parameter count is on the same scale as that of the classification projector used in prior DG literature. Combining DPL with CLIP yields surprisingly strong performance, raising the accuracy of zero-shot CLIP from 73.7% to 79.3% (a gain of 5.6 percentage points) on several standard datasets, namely PACS, VLCS, OfficeHome, and TerraIncognita. We hope the simplicity and success of our approach lead to broader adoption and analysis of foundation models in the domain generalization field. Our code is available at https://github.com/shogi880/DPLCLIP.
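The abstract describes the prompt generator only as a three-layer MLP that is trained while CLIP itself is left untouched. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. Everything specific in it is an assumption made for illustration: the names `PromptGenerator` and `domain_prompt`, the toy layer sizes, the ReLU activations, and the choice to average per-image outputs over a batch from one domain to obtain a single domain prompt. The image features are random stand-ins for what a frozen CLIP image encoder would produce.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Return (weights, bias) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a dense layer to a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class PromptGenerator:
    """Hypothetical three-layer MLP: image feature -> prompt vector."""

    def __init__(self, feat_dim, hidden_dim, prompt_dim, seed=0):
        rng = random.Random(seed)
        self.layers = [
            make_linear(feat_dim, hidden_dim, rng),
            make_linear(hidden_dim, hidden_dim, rng),
            make_linear(hidden_dim, prompt_dim, rng),
        ]

    def __call__(self, feature):
        h = relu(linear(self.layers[0], feature))
        h = relu(linear(self.layers[1], h))
        return linear(self.layers[2], h)  # no activation on the output

    def num_parameters(self):
        """Count trainable weights and biases across the three layers."""
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)


def domain_prompt(generator, batch_features):
    """Assumed pooling step: average per-image prompts over one domain's batch."""
    prompts = [generator(f) for f in batch_features]
    n = len(prompts)
    return [sum(column) / n for column in zip(*prompts)]


# Toy usage: 4 stand-in "image features" of dimension 8 from one domain.
rng = random.Random(1)
batch = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
gen = PromptGenerator(feat_dim=8, hidden_dim=16, prompt_dim=8)
prompt = domain_prompt(gen, batch)
print(len(prompt), gen.num_parameters())
```

In a real setup the resulting prompt vector would condition the text side of CLIP, and only the generator's parameters would receive gradients. How the prompt is combined with the class-name embeddings is not specified in the abstract and is left out of this sketch.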

updated: Wed Aug 17 2022 07:11:05 GMT+0000 (UTC)

published: Thu Nov 25 2021 00:25:54 GMT+0000 (UTC)

arXiv
