Rethinking Generalization in Few-Shot Classification

Markus Hiller; Rongkai Ma; Mehrtash Harandi; Tom Drummond

A single image-level annotation correctly describes only a subset, often a small one, of an image's content, particularly when complex real-world scenes are depicted. While this might be acceptable in many classification scenarios, it poses a significant challenge for applications where the set of classes differs significantly between training and test time. In this paper, we take a closer look at the implications in the context of few-shot learning. Splitting the input samples into patches and encoding these with the help of Vision Transformers allows us to establish semantic correspondences between local regions across images, independently of their respective class. The most informative patch embeddings for the task at hand are then determined as a function of the support set via online optimization at inference time, additionally providing visual interpretability of 'what matters most' in the image. We build on recent advances in unsupervised training of networks via masked image modelling to overcome the lack of fine-grained labels and learn the more general statistical structure of the data while avoiding the negative influence of image-level annotations, also known as supervision collapse. Experimental results show the competitiveness of our approach, achieving new state-of-the-art results on four popular few-shot classification benchmarks for 5-shot and 1-shot scenarios.
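The patch-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify patch size, embedding dimension, or similarity measure, so the 16-pixel patches, the 64-dimensional embedding, and the cosine similarity below are all assumptions, and a random linear projection stands in for the Vision Transformer encoder. The helper names `split_into_patches` and `patch_similarity` are hypothetical. The sketch only shows how two images can be compared region by region rather than through a single image-level vector.

```python
import numpy as np


def split_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Expose the patch grid as separate axes, then bring the grid axes forward.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def patch_similarity(emb_a, emb_b):
    """Cosine similarity between every patch embedding in emb_a and emb_b."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T


rng = np.random.default_rng(0)
support_image = rng.random((224, 224, 3))  # synthetic stand-in for a support image
query_image = rng.random((224, 224, 3))    # synthetic stand-in for a query image

# Assumption: a random linear projection replaces the Vision Transformer encoder.
projection = rng.standard_normal((16 * 16 * 3, 64))

support_emb = split_into_patches(support_image, 16) @ projection
query_emb = split_into_patches(query_image, 16) @ projection

# Entry (i, j) compares query patch i with support patch j.
similarity = patch_similarity(query_emb, support_emb)
print(similarity.shape)  # (196, 196)
```

Each image yields 14 x 14 = 196 patches, so the similarity matrix has one row per query patch and one column per support patch. The online optimization described in the abstract, which weights the most informative patch embeddings at inference time, is not reproduced here.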

updated: Wed Jun 15 2022 03:05:21 GMT+0000 (UTC)

published: Wed Jun 15 2022 03:05:21 GMT+0000 (UTC)

arXiv
