MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

Oscar Mañas; Pau Rodriguez; Saba Ahmadi; Aida Nematzadeh; Yash Goyal; Aishwarya Agrawal

Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation spaces of unimodal models using aligned image-text data, and can generalize to unseen VL tasks from just a few in-context examples. The small number of trainable parameters makes MAPL effective at low-data and in-domain learning. Moreover, MAPL's modularity enables easy extension to other pre-trained models. Extensive experiments on several visual question answering and image captioning benchmarks show that MAPL achieves superior or competitive performance compared to similar methods while training orders of magnitude fewer parameters. MAPL can be trained in just a few hours using modest computational resources and public datasets. We release our code and pre-trained model weights at https://github.com/mair-lab/mapl.
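
The abstract describes the mapping only at a high level. The following stdlib-only Python sketch illustrates the general idea under stated assumptions; it does not reproduce the actual MAPL mapping network from the released code. The assumptions are a single linear projection as the trainable mapping, toy dimensions, and a hypothetical `toy_embed` function standing in for the frozen language model's embedding table. The hypothetical `LinearMapper` turns one frozen image-feature vector into `k` "visual prefix" embeddings in the language model's input space. The hypothetical `build_few_shot_sequence` then interleaves those prefixes with text embeddings to form a few-shot prompt from in-context examples.

```python
import functools
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


class LinearMapper:
    """Hypothetical lightweight mapping (an assumption, not MAPL's actual design).

    Projects one frozen image-feature vector of size d_img into k "visual prefix"
    embeddings of size d_lm. Only these weights would be trained; the vision
    encoder and the language model stay frozen.
    """

    def __init__(self, d_img: int, d_lm: int, k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.d_img, self.d_lm, self.k = d_img, d_lm, k
        out_dim = k * d_lm
        # Small random init for the (out_dim x d_img) weight matrix, zero bias.
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(d_img)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def num_trainable_params(self) -> int:
        return len(self.weight) * self.d_img + len(self.bias)

    def __call__(self, image_feat: Sequence[float]) -> List[Vector]:
        if len(image_feat) != self.d_img:
            raise ValueError(f"expected {self.d_img} features, got {len(image_feat)}")
        flat = [
            sum(w * x for w, x in zip(row, image_feat)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        # Split the flat output into k prefix vectors of size d_lm.
        return [flat[i * self.d_lm:(i + 1) * self.d_lm] for i in range(self.k)]


def toy_embed(text: str, d_lm: int) -> List[Vector]:
    """Hypothetical stand-in for a frozen LM embedding table: one vector per whitespace token."""
    return [[float(len(tok))] * d_lm for tok in text.split()]


def build_few_shot_sequence(
    examples: Sequence[Tuple[Sequence[float], str]],
    query_feat: Sequence[float],
    query_text: str,
    mapper: LinearMapper,
    embed: Callable[[str], List[Vector]],
) -> List[Vector]:
    """Interleave [visual prefix, text] for each in-context example, then the query."""
    seq: List[Vector] = []
    for feat, text in examples:
        seq.extend(mapper(feat))
        seq.extend(embed(text))
    seq.extend(mapper(query_feat))
    seq.extend(embed(query_text))
    # In a real system this sequence would be fed to the frozen LM, which continues the text.
    return seq


if __name__ == "__main__":
    mapper = LinearMapper(d_img=8, d_lm=4, k=2)
    embed = functools.partial(toy_embed, d_lm=mapper.d_lm)
    shots = [([0.1] * 8, "Q: color? A: red"), ([0.2] * 8, "Q: color? A: blue")]
    seq = build_few_shot_sequence(shots, [0.3] * 8, "Q: shape? A:", mapper, embed)
    print(mapper.num_trainable_params(), len(seq))
```

Running it prints `72 17`: the toy mapper has 8 x 8 weights plus 8 biases, and the prompt holds two shots of 2 prefix vectors and 4 token vectors each, followed by 2 prefix vectors and 3 token vectors for the query. The mapper is the only component with parameters to update, which mirrors the abstract's point that the unimodal models are reused frozen.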

updated: Wed Mar 15 2023 00:02:06 GMT+0000 (UTC)

published: Thu Oct 13 2022 17:02:23 GMT+0000 (UTC)

arXiv