MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Jingjing Jiang; Nanning Zheng

Recently, finetuning pretrained vision-language models (VLMs) has become a prevailing paradigm for achieving state-of-the-art performance in visual question answering (VQA). However, as VLMs scale, tuning all model parameters for a specific task in low-resource settings becomes computationally expensive, storage-inefficient, and prone to overfitting. Although current parameter-efficient tuning methods dramatically reduce the number of tunable parameters, a significant performance gap with full finetuning remains. In this paper, we propose MixPHM, a redundancy-aware parameter-efficient tuning method that outperforms full finetuning in low-resource VQA. Specifically, MixPHM is a lightweight module implemented by multiple PHM-experts in a mixture-of-experts manner. To reduce parameter redundancy, we reparameterize expert weights in a low-rank subspace and share part of the weights inside and across MixPHM. Moreover, based on our quantitative analysis of representation redundancy, we propose redundancy regularization, which helps MixPHM reduce task-irrelevant redundancy while promoting task-relevant correlation. Experiments conducted on VQA v2, GQA, and OK-VQA with different low-resource settings show that MixPHM outperforms state-of-the-art parameter-efficient methods and is the only one that consistently surpasses full finetuning.
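
The abstract names three ingredients: PHM-experts, a low-rank reparameterization of expert weights, and weight sharing. The following is a minimal, illustrative sketch of how these could fit together, not the authors' implementation. It assumes the common PHM formulation in which a weight matrix is a sum of Kronecker products, W = sum_i S_i ⊗ (T_i U_i), with the factors S_i shared across experts and one expert picked uniformly at random per forward pass. The dimensions, the uniform routing, and all function names are assumptions made for illustration; the abstract specifies none of them.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def mat_add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Small random matrix; the initialization scale is an arbitrary choice."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_expert(n, d_in, d_out, rank, rng):
    """Expert-specific low-rank factors T_i (d_in/n x rank), U_i (rank x d_out/n)."""
    T = [rand_matrix(d_in // n, rank, rng) for _ in range(n)]
    U = [rand_matrix(rank, d_out // n, rng) for _ in range(n)]
    return {"T": T, "U": U}


def phm_weight(S, T, U):
    """Assemble W = sum_i kron(S_i, T_i @ U_i), a d_in x d_out matrix."""
    W = None
    for S_i, T_i, U_i in zip(S, T, U):
        term = kron(S_i, matmul(T_i, U_i))
        W = term if W is None else mat_add(W, term)
    return W


def mix_forward(x, experts, shared_S, rng):
    """Route the input vector x through one randomly chosen expert (assumed routing)."""
    expert = rng.choice(experts)
    W = phm_weight(shared_S, expert["T"], expert["U"])
    return matmul([x], W)[0]


def count_params(shared_S, experts):
    """Count tunable scalars: shared S once, plus every expert's T and U."""
    total = sum(len(row) for S_i in shared_S for row in S_i)
    for expert in experts:
        for mats in (expert["T"], expert["U"]):
            total += sum(len(row) for m in mats for row in m)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    n, d_in, d_out, rank = 4, 64, 16, 1  # hypothetical sizes
    shared_S = [rand_matrix(n, n, rng) for _ in range(n)]
    experts = [make_expert(n, d_in, d_out, rank, rng) for _ in range(2)]
    y = mix_forward([1.0] * d_in, experts, shared_S, rng)
    print(len(y), count_params(shared_S, experts), d_in * d_out)
```

With these hypothetical sizes, two experts together hold fewer tunable scalars than a single dense 64 x 16 projection, which illustrates why Kronecker structure, low rank, and sharing each cut parameter redundancy.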

updated: Thu Mar 02 2023 13:28:50 GMT+0000 (UTC)

published: Thu Mar 02 2023 13:28:50 GMT+0000 (UTC)

arXiv
