Enhancing Few-shot Image Classification with Cosine Transformer

Quang-Huy Nguyen; Cuong Q. Nguyen; Dung D. Le; Hieu H. Pham; Minh N. Do

This paper addresses the few-shot image classification problem. One notable limitation of few-shot learning is the variation in how the same category is described, which can produce a significant difference between the small labeled support set and the large unlabeled query set. Our approach is to obtain a relation heatmap between the two sets in order to label the latter in a transductive manner. This can be done with cross-attention using the scaled dot-product mechanism. However, the magnitude differences between the two separate sets of embedding vectors can significantly affect the output attention map and, in turn, model performance. We tackle this problem by improving the attention mechanism with cosine similarity. Specifically, we develop FS-CT (Few-shot Cosine Transformer), a few-shot image classification method based on prototypical embedding and a transformer-based framework. Compared with the baseline scaled dot-product attention, the proposed Cosine attention significantly improves FS-CT accuracy, by nearly 5% to over 20%, across various scenarios on three few-shot datasets: mini-ImageNet, CUB-200, and CIFAR-FS. Additionally, we enhance the prototypical embedding for categorical representation with learnable weights before feeding it to the attention module. Our proposed method FS-CT, along with the Cosine attention, is simple to implement and can be applied to a wide range of applications. Our code is available at https://github.com/vinuni-vishc/Few-Shot-Cosine-Transformer
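The two mechanisms in the abstract can be sketched in plain Python to show why magnitude matters. This is a minimal illustration, not the FS-CT implementation from the linked repository. The helper names (`weighted_prototype`, `cosine_attention`, and so on), the softmax-over-logits parameterization of the learnable prototype weights, and the choice to use cosine scores directly as attention weights without a softmax are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(row: Vector) -> Vector:
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prototype(shots: Matrix, logits: Vector) -> Vector:
    """Class prototype as a softmax-weighted mean of its support embeddings.

    `logits` stands in for learnable per-shot weights (hypothetical
    parameterization); all-zero logits reduce to the plain mean.
    """
    if len(shots) != len(logits):
        raise ValueError("need one logit per support shot")
    weights = softmax(logits)
    dim = len(shots[0])
    return [sum(w * s[j] for w, s in zip(weights, shots)) for j in range(dim)]


def scaled_dot_scores(queries: Matrix, keys: Matrix) -> Matrix:
    # Baseline: q . k / sqrt(d); grows with the magnitudes of q and k.
    d = len(queries[0])
    return [[dot(q, k) / math.sqrt(d) for k in keys] for q in queries]


def cosine_scores(queries: Matrix, keys: Matrix, eps: float = 1e-8) -> Matrix:
    # Cosine similarity: q . k / (|q| |k|); bounded in [-1, 1] and
    # unchanged when q or k is rescaled by a positive factor.
    out = []
    for q in queries:
        row = []
        for k in keys:
            denom = max(math.sqrt(dot(q, q)) * math.sqrt(dot(k, k)), eps)
            row.append(dot(q, k) / denom)
        out.append(row)
    return out


def attend(weights: Matrix, values: Matrix) -> Matrix:
    # Each output row is the weighted sum of the value rows.
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]


def scaled_dot_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    return attend([softmax(r) for r in scaled_dot_scores(q, k)], v)


def cosine_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Assumption: cosine scores are used directly as weights (no softmax).
    return attend(cosine_scores(q, k), v)


if __name__ == "__main__":
    proto = weighted_prototype([[0.0, 0.0], [2.0, 4.0]], [0.0, 0.0])
    print(proto)  # plain mean: [1.0, 2.0]
    query = [[1.0, 0.0]]
    keys = [[2.0, 0.0], [0.0, 3.0]]
    big_keys = [[20.0, 0.0], [0.0, 3.0]]  # first key rescaled by 10
    # The first scaled dot-product score grows roughly tenfold.
    print(scaled_dot_scores(query, keys), scaled_dot_scores(query, big_keys))
    print(cosine_scores(query, keys), cosine_scores(query, big_keys))
    # [[1.0, 0.0]] [[1.0, 0.0]]
```

Rescaling one key by 10 multiplies its scaled dot-product score by 10 but leaves its cosine score unchanged. This is the magnitude sensitivity that motivates replacing the dot product with cosine similarity when the support and query embeddings come from two separate sets.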

updated: Sun Nov 13 2022 06:03:28 GMT+0000 (UTC)

published: Sun Nov 13 2022 06:03:28 GMT+0000 (UTC)

arXiv