Cross-domain Federated Adaptive Prompt Tuning for CLIP

Shangchao Su; Mingzhao Yang; Bin Li; Xiangyang Xue

Federated learning (FL) allows multiple parties to collaboratively train a global model without disclosing their data. Existing research often requires all model parameters to participate in the training procedure. However, with the advent of powerful pre-trained models, it becomes possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose FedAPT, a federated adaptive prompt tuning algorithm for cross-domain federated image classification that builds on the vision-language pre-trained model CLIP and leverages its strong representation ability in FL. Compared with direct federated prompt tuning, our core idea is to adaptively unlock domain-specific knowledge for each test sample in order to provide it with a personalized prompt. To implement this idea, we design an adaptive prompt tuning module, which consists of a global prompt, an adaptive network, and a set of keys. The server randomly generates a set of keys and assigns a unique key to each client. Then all clients cooperatively train the global adaptive network and global prompt with their local datasets and the frozen keys. Ultimately, the aggregated global model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets. The results show that FedAPT can achieve better performance with less than 10% of the number of parameters of the fully trained model, and that the global model performs well in different client domains simultaneously.
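
The abstract describes the protocol only at a high level. Below is a toy, pure-Python sketch of that flow, built on several assumptions that do not come from the paper. The "adaptive network" is a single hypothetical linear map W. Local training just pulls each client's domain features toward that client's frozen key; the real CLIP classification objective is omitted, so the global prompt is not trained here. Aggregation is plain FedAvg. The personalized prompt is the global prompt plus a softmax-weighted sum of the keys. All function names (server_generate_keys, local_update, fedavg, personalized_prompt) are hypothetical. The sketch only illustrates how random per-client keys, a shared adaptive network, and a global prompt could interact. It is not FedAPT's actual architecture.

```python
import math
import random

DIM = 3  # hypothetical size of image features, keys, and prompt vectors


def server_generate_keys(num_clients, seed=0):
    """Server: draw one random unit-norm key per client; keys stay frozen."""
    rng = random.Random(seed)
    keys = []
    for _ in range(num_clients):
        v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        norm = math.sqrt(sum(a * a for a in v))
        keys.append([a / norm for a in v])
    return keys


def matvec(W, x):
    return [sum(W[i][j] * x[j] for j in range(DIM)) for i in range(DIM)]


def local_update(W, key, features, lr=0.05, epochs=20):
    """Client: gradient descent on mean ||W x - key||^2 over local features.

    Toy objective (an assumption): map this client's domain features
    onto its own frozen key. The real method trains with CLIP instead.
    """
    W = [row[:] for row in W]  # work on a copy of the global weights
    for _ in range(epochs):
        grad = [[0.0] * DIM for _ in range(DIM)]
        for x in features:
            err = [o - k for o, k in zip(matvec(W, x), key)]
            for i in range(DIM):
                for j in range(DIM):
                    grad[i][j] += 2.0 * err[i] * x[j] / len(features)
        W = [[W[i][j] - lr * grad[i][j] for j in range(DIM)] for i in range(DIM)]
    return W


def fedavg(models, sizes):
    """Server: average client weights, weighted by local dataset size."""
    total = sum(sizes)
    return [[sum(m[i][j] * n for m, n in zip(models, sizes)) / total
             for j in range(DIM)] for i in range(DIM)]


def personalized_prompt(global_prompt, W, keys, x):
    """Inference (assumed form): softmax over key similarities shifts the global prompt."""
    d = matvec(W, x)  # domain feature of the test sample
    scores = [sum(a * b for a, b in zip(d, k)) for k in keys]
    m = max(scores)
    exp = [math.exp(s - m) for s in scores]
    alphas = [e / sum(exp) for e in exp]
    prompt = [g + sum(a * k[i] for a, k in zip(alphas, keys))
              for i, g in enumerate(global_prompt)]
    return prompt, alphas


# Usage: two clients whose (toy) image features come from different domains.
keys = server_generate_keys(num_clients=2)
client_data = [[[1.0, 0.0, 0.0]] * 2, [[0.0, 1.0, 0.0]] * 2]
W = [[0.0] * DIM for _ in range(DIM)]
for _ in range(30):  # communication rounds
    local = [local_update(W, keys[c], client_data[c]) for c in range(2)]
    W = fedavg(local, [len(d) for d in client_data])

global_prompt = [0.0] * DIM  # left untrained in this toy
prompt, alphas = personalized_prompt(global_prompt, W, keys, [1.0, 0.0, 0.0])
```

In this toy, a test sample from client 0's domain gives the most weight to client 0's key. Its prompt therefore shifts toward that domain, even though W was only ever aggregated globally.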

updated: Tue Nov 15 2022 03:10:05 GMT+0000 (UTC)

published: Tue Nov 15 2022 03:10:05 GMT+0000 (UTC)

arXiv