Tuning Pre-trained Model via Moment Probing

Mingze Gao; Qilong Wang; Zhenyi Lin; Pengfei Zhu; Qinghua Hu; Jingbo Zhou

Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP), as a fundamental module, is involved in exploiting the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a few learnable parameters, and little work pays attention to the commonly used LP module. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Distinguished from LP, which builds a linear classification head based on the mean of final features (e.g., word tokens for ViT) or classification tokens, our MP performs a linear classifier on the feature distribution, which provides stronger representation ability by exploiting richer statistical information inherent in features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated by using first- and second-order moments of features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC^3) to compute second-order moments in an efficient and effective manner. Considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, namely MP_+. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at lower training cost, while our MP_+ achieves state-of-the-art performance.
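The contrast between LP and MP can be sketched in NumPy. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. LP pools token features into their mean (the first-order moment). The MP-style head here also appends a flattened second-order moment (covariance) before the linear classifier. The function names, the population (divide-by-N) covariance, and the per-head channel grouping in `second_moment` are our own assumptions. The grouping only mimics the idea of cutting the d x d cost of second-order statistics. It is not the paper's MHC^3, whose convolutional cross-covariance design is not reproduced here.

```python
import numpy as np


def first_moment(tokens):
    """Mean over tokens; tokens has shape (num_tokens, dim). This is what LP pools."""
    return tokens.mean(axis=0)


def second_moment(tokens, num_heads=1):
    """Per-head covariance of token features, flattened.

    Hypothetical simplification (NOT the paper's MHC^3): channels are split
    into `num_heads` equal groups and a (dim/h x dim/h) covariance is computed
    within each group, so the output has dim*dim/num_heads entries instead of
    dim*dim. With num_heads=1 this is the full population covariance.
    """
    n, d = tokens.shape
    if d % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    # (n, d) -> (n, h, d/h) -> (h, n, d/h): one channel group per head
    groups = centered.reshape(n, num_heads, d // num_heads).transpose(1, 0, 2)
    # Batched covariance per head, summing over tokens: (h, d/h, d/h)
    cov = np.einsum("hni,hnj->hij", groups, groups) / n
    return cov.reshape(-1)


def linear_probe_features(tokens):
    """LP-style representation: first-order moment only."""
    return first_moment(tokens)


def moment_probe_features(tokens, num_heads=1):
    """MP-style representation (sketch): first- and second-order moments concatenated."""
    return np.concatenate([first_moment(tokens), second_moment(tokens, num_heads)])


def linear_head(features, weight, bias):
    """Linear classifier: weight (num_classes, feat_dim), bias (num_classes,)."""
    return weight @ features + bias


# Usage: 197 tokens (as in a ViT-B/16 with a class token) of toy 8-dim features.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((197, 8))
mp = moment_probe_features(tokens, num_heads=2)
logits = linear_head(mp, rng.standard_normal((10, mp.size)) * 0.01, np.zeros(10))
```

With 8-dimensional features and num_heads=2, the MP feature has 8 + 2 x 4 x 4 = 40 entries versus 8 for LP; num_heads=1 gives the full 8 + 64 = 72. Because the classifier's weight count grows with the square of the feature dimension, an efficient way to compute second-order moments matters, which is the role the abstract assigns to MHC^3.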

updated: Tue Sep 05 2023 01:05:50 GMT+0000 (UTC)

published: Fri Jul 21 2023 04:15:02 GMT+0000 (UTC)

arXiv
