Introspective Cross-Attention Probing for Lightweight Transfer of Pre-trained Models

Yonatan Dukler; Alessandro Achille; Hao Yang; Varsha Vivek; Luca Zancato; Ben Bowman; Avinash Ravichandran; Charless Fowlkes; Ashwin Swaminathan; Stefano Soatto

We propose InCA, a lightweight method for transfer learning that cross-attends to any activation layer of a pre-trained model. During training, InCA uses a single forward pass to extract multiple activations, which are passed to external cross-attention adapters that are trained from scratch and then combined or selected for downstream tasks. We show that, even when selecting a single top-scoring adapter, InCA achieves performance comparable to full fine-tuning, at a cost comparable to fine-tuning just the last layer. For example, with a cross-attention probe 1.3% the size of a pre-trained ViT-L/16 model, we achieve performance within 0.2% of the full fine-tuning paragon at 51% of the baseline's training cost, on average across 11 downstream classification tasks. Unlike other forms of efficient adaptation, InCA does not require backpropagating through the pre-trained model, thus leaving its execution unaltered at both training and inference. The versatility of InCA is best illustrated in fine-grained tasks, which may require information that is absent from the last layer but accessible in intermediate-layer activations. Since the backbone is fixed, InCA allows parallel ensembling as well as parallel execution of multiple tasks. InCA achieves state-of-the-art performance on the ImageNet-to-Sketch multi-task benchmark.
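The mechanism described in the abstract (one forward pass through a frozen backbone, one external cross-attention probe per extracted activation, then selection of a single top-scoring probe) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the toy `frozen_backbone`, the single-head `CrossAttentionProbe` with learned query vectors, and the `select_best` helper are all hypothetical names introduced here, and the probes are left untrained because the abstract does not specify the training procedure. In an actual setup only the probe parameters would be updated, with no gradient flowing into the backbone.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frozen_backbone(x, layer_weights):
    """Toy stand-in for a pre-trained model (assumption, not a real ViT).

    Runs a single forward pass and returns the activations of every layer.
    """
    activations = []
    h = x
    for w in layer_weights:
        h = np.tanh(h @ w)
        activations.append(h)
    return activations


class CrossAttentionProbe:
    """Hypothetical single-head probe: learned queries attend over activations."""

    def __init__(self, dim, num_queries, num_classes, rng):
        scale = dim ** -0.5
        self.dim = dim
        self.queries = rng.normal(0.0, scale, (num_queries, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_out = rng.normal(0.0, scale, (num_queries * dim, num_classes))

    def attention(self, acts):
        # acts: (batch, tokens, dim) -> attention weights (batch, queries, tokens)
        keys = acts @ self.w_k
        scores = np.einsum("qd,btd->bqt", self.queries, keys) / np.sqrt(self.dim)
        return softmax(scores, axis=-1)

    def __call__(self, acts):
        # Pool the activation tokens with cross-attention, then classify.
        values = acts @ self.w_v
        pooled = np.einsum("bqt,btd->bqd", self.attention(acts), values)
        return pooled.reshape(acts.shape[0], -1) @ self.w_out


def select_best(probes, activations, labels):
    # Score each probe on its own layer's activations and pick the best one.
    accs = [
        float((probe(acts).argmax(axis=1) == labels).mean())
        for probe, acts in zip(probes, activations)
    ]
    return int(np.argmax(accs)), accs


rng = np.random.default_rng(0)
batch, tokens, dim, num_classes = 8, 16, 32, 5
layer_weights = [rng.normal(0.0, dim ** -0.5, (dim, dim)) for _ in range(4)]
x = rng.normal(size=(batch, tokens, dim))
labels = rng.integers(0, num_classes, size=batch)

# One forward pass yields all activations; the backbone is never modified.
activations = frozen_backbone(x, layer_weights)
probes = [CrossAttentionProbe(dim, 2, num_classes, rng) for _ in activations]
logits = probes[0](activations[0])
best, accs = select_best(probes, activations, labels)
```

Because each probe only reads cached activations, the probes are independent of one another, which is what makes the parallel training, ensembling, and multi-task execution mentioned above possible in principle.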

updated: Tue Mar 07 2023 18:12:24 GMT+0000 (UTC)

published: Tue Mar 07 2023 18:12:24 GMT+0000 (UTC)

arXiv
