A Good Prompt Is Worth Millions of Parameters? Low-resource Prompt-based Learning for Vision-Language Models

Woojeong Jin; Yu Cheng; Yelong Shen; Weizhu Chen; Xiang Ren

Large pretrained vision-language (VL) models can learn a new task from a handful of examples or generalize to a new task without fine-tuning. However, these gigantic VL models are hard to deploy in real-world applications because of their impractically large size and slow inference speed. In this work, we propose FewVLM, a few-shot prompt-based learner for vision-language tasks. We pretrain a sequence-to-sequence Transformer model with both prefix language modeling (PrefixLM) and masked language modeling (MaskedLM), and introduce simple prompts to improve zero-shot and few-shot performance on VQA and image captioning. Experimental results on five VQA and captioning datasets show that FewVLM outperforms Frozen, which is 31 times larger, by 18.2 percentage points on zero-shot VQAv2 and achieves results comparable to PICa, a 246× larger model. We observe that (1) prompts significantly affect zero-shot performance but only marginally affect few-shot performance, (2) MaskedLM helps few-shot VQA tasks while PrefixLM boosts captioning performance, and (3) performance significantly increases when training set size is small.
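The two pretraining objectives named in the abstract differ in how a text sequence is split into encoder input and decoder target. The following is a minimal text-only sketch of that difference, not the authors' implementation: the function names, the `<mask_i>` sentinel format, the whitespace tokenization, and the example caption are all assumptions made for illustration, and the image features that a VL model would also feed to the encoder are omitted.

```python
from typing import List, Sequence, Tuple


def prefix_lm_pair(tokens: List[str], split: int) -> Tuple[List[str], List[str]]:
    """PrefixLM: the encoder sees a prefix, the decoder generates the rest."""
    if not 0 < split < len(tokens):
        raise ValueError("split must leave a non-empty prefix and continuation")
    return tokens[:split], tokens[split:]


def masked_lm_pair(
    tokens: List[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """MaskedLM (span-corruption style).

    Each half-open span [start, end) is replaced by one sentinel in the
    source; the target lists every sentinel followed by the tokens it hides.
    The "<mask_i>" sentinel naming is hypothetical.
    """
    source: List[str] = []
    target: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        if start < cursor or start >= end or end > len(tokens):
            raise ValueError("spans must be non-empty, in range, and non-overlapping")
        sentinel = f"<mask_{i}>"
        source.extend(tokens[cursor:start])  # keep the visible tokens
        source.append(sentinel)              # hide the span behind a sentinel
        target.append(sentinel)
        target.extend(tokens[start:end])     # the decoder must recover the span
        cursor = end
    source.extend(tokens[cursor:])
    return source, target


# Hypothetical caption, tokenized on whitespace for simplicity.
caption = "a dog catches a frisbee in the park".split()

prefix_src, prefix_tgt = prefix_lm_pair(caption, split=3)
masked_src, masked_tgt = masked_lm_pair(caption, spans=[(1, 2), (4, 5)])

print(" ".join(prefix_src), "->", " ".join(prefix_tgt))
print(" ".join(masked_src), "->", " ".join(masked_tgt))
```

In this sketch, PrefixLM asks the decoder to continue a caption, which resembles the captioning task, while MaskedLM asks it to fill in short missing spans, which resembles producing a short answer. That is one plausible reading of observation (2), offered here as an interpretation rather than a claim from the abstract.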

updated: Sat Oct 16 2021 06:07:59 GMT+0000 (UTC)

published: Sat Oct 16 2021 06:07:59 GMT+0000 (UTC)

arXiv
