Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning

Zhongzhi Yu; Shang Wu; Yonggan Fu; Shunyao Zhang; Yingyan Lin

Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context because few-shot tuning data contains only limited features. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, and these features are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViTs in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD), which detects over-confident patches of foundation ViTs so that their over-fitting on the few-shot tuning data can potentially be alleviated, and (2) a Confusion-based Feature Infusion (CFI) module, which infuses easy-to-confuse features from the pretrained FViTs into the over-confident patches detected by the AOD in order to enhance feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: it achieves 0.04% to 32.91% higher accuracy than state-of-the-art (SOTA) data augmentation methods under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves 2.22% higher accuracy than SOTA data augmentation methods while using 50% less training data.
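The two enablers described above can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not specify how over-confidence is measured or how features are infused, so the sketch assumes (1) that the over-confident patch is simply the one with the highest attention score, (2) that the easy-to-confuse class is the non-true class to which a pretrained model assigns the highest probability, and (3) that infusion is a plain linear blend of a hint vector into the selected patch. All function names, parameters, and the blending rule are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def select_overconfident_patch(attn_scores: Sequence[float]) -> int:
    """Detection step (assumption): return the index of the patch
    with the highest attention score."""
    if not attn_scores:
        raise ValueError("attn_scores must be non-empty")
    return max(range(len(attn_scores)), key=lambda i: attn_scores[i])


def most_confusable_class(probs: Sequence[float], true_label: int) -> int:
    """Confusion step (assumption): return the non-true class that a
    pretrained model scores highest."""
    candidates = [c for c in range(len(probs)) if c != true_label]
    if not candidates:
        raise ValueError("need at least two classes")
    return max(candidates, key=lambda c: probs[c])


def infuse_patch(
    patches: Sequence[Sequence[float]],
    patch_idx: int,
    hint: Sequence[float],
    alpha: float = 0.5,
) -> List[List[float]]:
    """Infusion step (assumption): blend a hint vector into one patch,
    leaving every other patch unchanged. A linear blend stands in for
    whatever infusion mechanism the paper actually uses."""
    if len(hint) != len(patches[patch_idx]):
        raise ValueError("hint and patch must have the same length")
    out = [list(p) for p in patches]  # copy so the input is not mutated
    out[patch_idx] = [
        (1.0 - alpha) * p + alpha * h
        for p, h in zip(patches[patch_idx], hint)
    ]
    return out


if __name__ == "__main__":
    # Hypothetical per-patch attention scores and class probabilities.
    attn = [0.1, 0.7, 0.2]
    probs = [0.6, 0.3, 0.1]
    patches = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

    idx = select_overconfident_patch(attn)
    target = most_confusable_class(probs, true_label=0)
    # In a real pipeline the hint would be derived from the pretrained
    # model for class `target`; here it is a fixed placeholder vector.
    hint = [3.0, 5.0]
    print(idx, target, infuse_patch(patches, idx, hint))
```

Only the selected patch changes, which mirrors the abstract's emphasis on augmenting the over-fitted parts of a sample rather than the whole image.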

updated: Mon Jun 26 2023 06:01:14 GMT+0000 (UTC)

published: Tue Apr 25 2023 02:22:01 GMT+0000 (UTC)

arXiv
