Fine-grained Retrieval Prompt Tuning

Shijie Wang; Jianlong Chang; Zhihui Wang; Haojie Li; Wanli Ouyang; Qi Tian

Fine-grained object retrieval aims to learn discriminative representations for retrieving visually similar objects. However, existing top-performing methods usually impose pairwise similarities on the semantic embedding space or design a localization sub-network, and continually fine-tune the entire model in limited-data scenarios, which leads to convergence to suboptimal solutions. In this paper, we develop Fine-grained Retrieval Prompt Tuning (FRPT), which steers a frozen pre-trained model to perform the fine-grained retrieval task from the perspectives of sample prompting and feature adaptation. Specifically, FRPT learns only a small number of parameters in the prompt and adaptation modules instead of fine-tuning the entire model, thereby avoiding the convergence to suboptimal solutions that full fine-tuning causes. Technically, a discriminative perturbation prompt (DPP) serves as the sample prompting process: it amplifies, and even exaggerates, some discriminative elements that contribute to category prediction via a content-aware inhomogeneous sampling operation. In this way, DPP brings the fine-grained retrieval task, aided by the perturbation prompts, close to the task solved during the original pre-training, and thereby preserves the generalization and discrimination of the representations extracted from input samples. In addition, a category-specific awareness head serves as feature adaptation: it uses category-guided instance normalization to remove the species discrepancies in the features extracted by the pre-trained model, so that the optimized features contain only the discrepancies among subcategories. Extensive experiments demonstrate that FRPT, with fewer learnable parameters, achieves state-of-the-art performance on three widely used fine-grained datasets.
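To make the instance normalization mentioned in the abstract concrete, here is a minimal sketch of plain instance normalization on a single sample's feature map, assuming a channels x height x width layout stored as nested Python lists. It is not the paper's category-guided variant: the abstract does not specify how category guidance enters, so that part is omitted. The function name `instance_normalize`, the `eps` value, and the toy input are all illustrative assumptions.

```python
import math


def instance_normalize(feature_map, eps=1e-5):
    """Normalize each channel of ONE sample to zero mean and unit variance.

    feature_map: list of channels, each a list of rows of floats (C x H x W).
    This is plain instance normalization; the category-guided step described
    in the abstract is NOT implemented here (its details are not given).
    """
    normalized = []
    for channel in feature_map:
        # Statistics are computed per channel over the spatial positions
        # of this single sample, not over a batch.
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.append([[(v - mean) * scale for v in row] for row in channel])
    return normalized


# Toy input (hypothetical): two 2x2 channels that differ only in scale.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 20.0], [30.0, 40.0]],
]
out = instance_normalize(features)
```

After normalization the two toy channels become nearly identical, because the per-channel offset and scale have been removed. This loosely mirrors the abstract's idea of stripping a shared discrepancy so that only the remaining differences are left in the features.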

updated: Mon Mar 06 2023 09:45:11 GMT+0000 (UTC)

published: Fri Jul 29 2022 04:10:04 GMT+0000 (UTC)

arXiv
