Prompt Generation Networks for Input-based Adaptation of Frozen Vision Transformers

Jochem Loedeman; Maarten C. Stol; Tengda Han; Yuki M. Asano

With the introduction of the transformer architecture in computer vision, increasing model scale has been demonstrated as a clear path to achieving performance and robustness gains. However, with model parameter counts reaching the billions, classical finetuning approaches are becoming increasingly limiting, and even infeasible when models are hosted as inference APIs, as in NLP. To this end, visual prompt learning, whereby a model is adapted by learning additional inputs, has emerged as a potential solution for adapting frozen and cloud-hosted models: during inference, it requires neither access to the internals of the model's forward pass nor any post-processing. In this work, we propose the Prompt Generation Network (PGN), which generates high-performing, input-dependent prompts by sampling from an end-to-end learned library of tokens. We further introduce the "prompt inversion" trick, with which PGNs can be efficiently trained in a latent space but deployed as strictly input-only prompts for inference. We show that the PGN is effective in adapting pre-trained models to various new datasets: it surpasses previous methods by a large margin on 12/12 datasets and even outperforms full finetuning on 5/12, while requiring 100x fewer parameters.
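The idea of generating input-dependent prompts from a learned token library can be illustrated with a minimal sketch. The abstract does not specify how the sampling is done, so the code below assumes one plausible reading: a small network maps input features to logits over the library, and each prompt is a softmax-weighted mixture of library tokens. All names (`generate_prompts`, `prepend_prompts`, `weights`, `library`) and all sizes are hypothetical, the "network" is a single linear layer with random values standing in for learned parameters, and the frozen transformer itself is not modelled.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_prompts(features, weights, library, num_prompts):
    """Hypothetical prompt generator (an assumption, not the paper's exact method).

    features: input-dependent feature vector, length F
    weights:  stand-in for a learned linear layer, shape [num_prompts][K][F]
    library:  stand-in for the learned token library, shape [K][D]
    Returns num_prompts prompt tokens, each of length D.
    """
    library_size = len(library)
    dim = len(library[0])
    prompts = []
    for p in range(num_prompts):
        # One logit per library entry, computed from the input features.
        logits = [
            sum(w * f for w, f in zip(weights[p][k], features))
            for k in range(library_size)
        ]
        probs = softmax(logits)
        # Soft selection: a convex combination of the library tokens.
        prompt = [
            sum(probs[k] * library[k][d] for k in range(library_size))
            for d in range(dim)
        ]
        prompts.append(prompt)
    return prompts


def prepend_prompts(prompts, patch_tokens):
    """Form the token sequence that would be passed to the frozen model."""
    return prompts + patch_tokens


random.seed(0)
feature_dim, token_dim, library_size, num_prompts = 4, 8, 16, 3

# Random values stand in for parameters that would be learned end to end.
library = [[random.gauss(0, 1) for _ in range(token_dim)]
           for _ in range(library_size)]
weights = [[[random.gauss(0, 1) for _ in range(feature_dim)]
            for _ in range(library_size)]
           for _ in range(num_prompts)]

features = [random.gauss(0, 1) for _ in range(feature_dim)]
patch_tokens = [[0.0] * token_dim for _ in range(5)]  # placeholder image tokens

prompts = generate_prompts(features, weights, library, num_prompts)
sequence = prepend_prompts(prompts, patch_tokens)
print(len(sequence), len(sequence[0]))
```

In this sketch only `library` and `weights` would be trained, which is in the spirit of the small parameter budget the abstract reports; the pretrained model would only ever receive `sequence` as input. The "prompt inversion" step, which moves such latent-space prompts to the input side for deployment, is not shown here.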

updated: Wed Apr 19 2023 15:48:45 GMT+0000 (UTC)

published: Wed Oct 12 2022 17:59:58 GMT+0000 (UTC)

arXiv
