Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition

Xu Luo; Longhui Wei; Liangjian Wen; Jinrong Yang; Lingxi Xie; Zenglin Xu; Qi Tian

Few-shot image classification aims to use pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks. Typically, each task involves only a few training examples from brand-new categories. This requires the pretrained model to focus on well-generalizable knowledge and to ignore domain-specific information. In this paper, we observe that image background serves as a source of domain-specific knowledge: it is a shortcut for models to learn on the source dataset, but it is harmful when adapting to brand-new classes. To prevent the model from learning this shortcut knowledge, we propose COSOC, a novel Few-Shot Learning (FSL) framework that automatically identifies foreground objects at both the pretraining and evaluation stages. COSOC is a two-stage algorithm motivated by the observation that foreground objects from different images within the same class share more similar patterns than backgrounds do. At the pretraining stage, for each class, we cluster contrastive-pretrained features of randomly cropped image patches, such that crops containing only foreground objects can be identified by a single cluster. We then force the pretraining model to focus on the found foreground objects through a fusion sampling strategy. At the evaluation stage, among the images in each training class of any few-shot task, we seek shared content and filter out the background. The recognized foreground objects of each class are then used to match the foreground of testing images. Extensive experiments tailored to inductive FSL tasks on two benchmarks demonstrate the state-of-the-art performance of our method.
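The evaluation-stage idea of seeking shared content within a class can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the similarity measure or the selection rule, so cosine similarity, the "best match in every other image, averaged" score, the top-`keep` selection, and the function names `shared_content_scores` and `select_foreground` are all assumptions made for illustration. Crop features are assumed to be precomputed, with at least one crop per image.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def shared_content_scores(class_crops):
    """Score each crop by how well it is matched in the other images of its class.

    class_crops: list of images; each image is a non-empty list of crop feature vectors.
    Returns a list of per-image score lists with the same layout as the input.
    (Hypothetical scoring rule, assumed for this sketch.)
    """
    scores = []
    for i, crops in enumerate(class_crops):
        others = [img for j, img in enumerate(class_crops) if j != i]
        image_scores = []
        for crop in crops:
            # Best match of this crop inside each other image, averaged over those images.
            best = [max(cosine(crop, other) for other in img) for img in others]
            image_scores.append(sum(best) / len(best) if best else 0.0)
        scores.append(image_scores)
    return scores

def select_foreground(class_crops, keep=1):
    """Return, per image, the indices of the `keep` highest-scoring crops."""
    selected = []
    for image_scores in shared_content_scores(class_crops):
        order = sorted(range(len(image_scores)),
                       key=lambda k: image_scores[k], reverse=True)
        selected.append(order[:keep])
    return selected

# Toy features: crop 0 of every image points in a similar "object" direction,
# crop 1 points in an image-specific "background" direction.
toy_class = [
    [[1.0, 0.1, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.9, 0.0, 0.1, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.1], [0.0, 0.0, 0.0, 1.0]],
]
foreground = select_foreground(toy_class, keep=1)
toy_scores = shared_content_scores(toy_class)
```

In this toy class the object-like crops score close to 1 because each finds a near match in every other image, while the background crops score 0 because nothing in the other images resembles them. That gap is the property the abstract relies on when it says foreground objects share more similar patterns than backgrounds.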

updated: Fri Jul 16 2021 07:46:41 GMT+0000 (UTC)

published: Fri Jul 16 2021 07:46:41 GMT+0000 (UTC)

arXiv
