UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

Haoyu Lu; Yuqi Huo; Guoxing Yang; Zhiwu Lu; Wei Zhan; Masayoshi Tomizuka; Mingyu Ding

Large-scale pre-trained vision-language models have shown promising transferability to a wide range of downstream tasks. As these foundation models grow in size and the number of downstream tasks increases, the standard full fine-tuning paradigm becomes unsustainable because of its heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation of pre-trained vision-language models. Specifically, adapters are distributed across the different modalities and their interactions, and the total number of tunable parameters is reduced by partial weight sharing. This unified, knowledge-sharing design yields powerful cross-modal representations that benefit various downstream tasks while tuning only 1.0%-2.0% of the pre-trained model's parameters. Extensive experiments on six cross-modal downstream benchmarks (covering video-text retrieval, image-text retrieval, VideoQA, and VQA) show that, in most cases, UniAdapter not only outperforms state-of-the-art methods but also beats the full fine-tuning strategy. In particular, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% of the model parameters, outperforming the latest competitors by 2.0%. The code and models are available at https://github.com/RERV/UniAdapter.
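
To make the effect of partial weight sharing concrete, the following is a minimal parameter-counting sketch, not the authors' implementation. It assumes a standard bottleneck adapter (a down-projection followed by an up-projection, both with biases) and assumes that the down-projection is the part shared across adapters; the abstract does not state which weights are shared. The hidden size, bottleneck width, and number of adapters are hypothetical values chosen for illustration only.

```python
def adapter_params(d_model: int, bottleneck: int) -> tuple[int, int]:
    """Return (down, up) parameter counts for one bottleneck adapter, biases included."""
    down = d_model * bottleneck + bottleneck  # down-projection weight + bias
    up = bottleneck * d_model + d_model       # up-projection weight + bias
    return down, up


def total_tunable(d_model: int, bottleneck: int, n_adapters: int, share_down: bool) -> int:
    """Count tunable parameters for n_adapters adapters in one layer.

    If share_down is True, a single down-projection is reused by every adapter
    (an assumed form of partial weight sharing); each adapter keeps its own
    up-projection.
    """
    down, up = adapter_params(d_model, bottleneck)
    n_down = 1 if share_down else n_adapters
    return n_down * down + n_adapters * up


# Hypothetical sizes: one visual, one textual, and one cross-modal adapter.
D_MODEL, BOTTLENECK, N_ADAPTERS = 768, 128, 3

independent = total_tunable(D_MODEL, BOTTLENECK, N_ADAPTERS, share_down=False)
shared = total_tunable(D_MODEL, BOTTLENECK, N_ADAPTERS, share_down=True)

print(f"independent adapters: {independent} parameters")
print(f"shared down-projection: {shared} parameters")
print(f"ratio: {shared / independent:.3f}")
```

Under these assumed sizes, sharing one projection across three adapters removes roughly a third of the adapter parameters per layer; the actual savings depend on the model and on the sharing scheme used.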

updated: Sun May 21 2023 17:50:30 GMT+0000 (UTC)

published: Mon Feb 13 2023 18:59:10 GMT+0000 (UTC)

arXiv
