π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation

Chengyue Wu; Teng Wang; Yixiao Ge; Zeyu Lu; Ruisong Zhou; Ying Shan; Ping Luo

Foundation models have achieved great advances in multi-task learning with a unified interface for unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning (π-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph of task relationships. π-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning through multi-task prediction followed by interpolation, and it is compatible with diverse types of parameter-efficient experts, such as prompts and adapters. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that π-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods in both full-shot and low-shot regimes. The task graph also enables an in-depth, interpretable analysis of task transferability across modalities.
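
The predict-then-interpolate idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how task similarity is computed or how interpolation weights are chosen, so everything below is an assumption made for illustration: tasks are represented by hypothetical embedding vectors, similarity is plain cosine similarity, and the expert parameters of the top-k neighbours are averaged with weights proportional to the supplied scores. The function names and task names are invented here and do not come from the paper.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_similar_tasks(
    target_emb: List[float],
    task_embs: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """'Predict' step (assumed form): rank source tasks by cosine
    similarity to the target task and keep the k most similar."""
    scores = {name: cosine(target_emb, emb) for name, emb in task_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def interpolate_experts(
    experts: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """'Interpolate' step (assumed form): weighted average of expert
    parameter vectors, with weights normalised to sum to one.
    Assumes all weights are positive."""
    total = sum(weights.values())
    size = len(next(iter(experts.values())))
    merged = [0.0] * size
    for name, w in weights.items():
        for i, p in enumerate(experts[name]):
            merged[i] += (w / total) * p
    return merged


# Toy example: three hypothetical source tasks with 2-D embeddings and
# 2-parameter "experts" (stand-ins for adapter or prompt weights).
task_embs = {
    "vqa": [1.0, 0.0],
    "captioning": [0.8, 0.6],
    "sentiment": [0.0, 1.0],
}
experts = {
    "vqa": [1.0, 1.0],
    "captioning": [3.0, -1.0],
    "sentiment": [5.0, 5.0],
}
target_emb = [1.0, 0.2]

neighbours = predict_similar_tasks(target_emb, task_embs, k=2)
init_params = interpolate_experts(experts, dict(neighbours))
print([name for name, _ in neighbours])
print(init_params)
```

In this sketch the interpolated vector would serve as the starting point for training the target task's expert. Whether the weights are fixed from similarity scores, as here, or learned during tuning is a design choice the abstract leaves open.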

updated: Fri Apr 28 2023 02:10:31 GMT+0000 (UTC)

published: Thu Apr 27 2023 17:49:54 GMT+0000 (UTC)

arXiv
