π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Chengyue Wu; Teng Wang; Yixiao Ge; Zeyu Lu; Ruisong Zhou; Ying Shan; Ping Luo

Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited in transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning (π-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph that reveals task relationships. π-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning through multi-task prediction followed by interpolation, and it is compatible with diverse types of parameter-efficient experts, such as prompts and adapters. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that π-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods in both full-shot and low-shot regimes. The task graph also enables an in-depth, interpretable analysis of task transferability across modalities. The code will be available at https://github.com/TencentARC/pi-Tuning.
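The predict-then-interpolate idea in the abstract can be illustrated with a small sketch. This is NOT the authors' implementation: the abstract does not say how task similarity is computed or how interpolation weights are chosen. Here we assume that each task is represented by a hypothetical embedding vector in a shared space, that similarity is cosine similarity, that the top-k most similar experts are kept, and that their flattened parameters (for example, adapter or prompt weights) are merged with softmax-normalized weights. All function names and task names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two task embeddings (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_similarities(
    target_emb: Sequence[float], expert_embs: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """'Predict' step: score every source expert against the target task.

    The scores over all task pairs would form the edges of a task graph.
    """
    return {name: cosine_similarity(target_emb, emb) for name, emb in expert_embs.items()}


def interpolation_weights(
    sims: Dict[str, float], top_k: int, temperature: float = 1.0
) -> Dict[str, float]:
    """Keep the top_k most similar experts and softmax their scores (assumed scheme)."""
    chosen = sorted(sims, key=sims.get, reverse=True)[:top_k]
    max_sim = max(sims[n] for n in chosen)  # subtract max for numerical stability
    exps = {n: math.exp((sims[n] - max_sim) / temperature) for n in chosen}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


def interpolate_experts(
    expert_params: Dict[str, List[float]], weights: Dict[str, float]
) -> List[float]:
    """'Interpolate' step: weighted sum of flattened expert parameters."""
    dim = len(next(iter(expert_params.values())))
    merged = [0.0] * dim
    for name, w in weights.items():
        for i, p in enumerate(expert_params[name]):
            merged[i] += w * p
    return merged


if __name__ == "__main__":
    # Toy 2-D task embeddings and 2-parameter "experts" (purely illustrative).
    embs = {"vqa": [1.0, 0.0], "caption": [0.8, 0.6], "sst2": [0.0, 1.0]}
    params = {"vqa": [1.0, 1.0], "caption": [0.0, 2.0], "sst2": [5.0, 5.0]}
    sims = predict_similarities([1.0, 0.0], embs)
    weights = interpolation_weights(sims, top_k=2)
    print(weights)
    print(interpolate_experts(params, weights))
```

In this toy run the target embedding matches the hypothetical "vqa" task, so "vqa" and "caption" are selected and "sst2" is ignored. A fixed softmax keeps the sketch simple; a real system could instead learn the interpolation weights on the target task.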

updated: Wed May 17 2023 14:53:17 GMT+0000 (UTC)

published: Thu Apr 27 2023 17:49:54 GMT+0000 (UTC)

arXiv