Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations

Wenjie Pei; Shuang Wu; Dianwen Mei; Fanglin Chen; Jiandong Tian; Guangming Lu

While fine-tuning-based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been well addressed is the potential for class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work, we design a novel knowledge distillation framework that guides the learning of the object detector and thereby restrains overfitting in both the pre-training stage on base classes and the fine-tuning stage on novel classes. Specifically, we first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words (BoVW) from an image set of limited size; the learned BoVW is then used to encode general images based on the similarities between the learned visual words and an image. We then perform knowledge distillation based on the fact that an image should have consistent BoVW representations in two different feature spaces. To this end, we pre-learn a feature space independently of object detection and encode images using BoVW in this space. The resulting BoVW representation of an image can be regarded as distilled knowledge that guides the learning of the object detector: the features extracted by the object detector for the same image are expected to yield BoVW representations consistent with the distilled knowledge. Extensive experiments validate the effectiveness of our method and demonstrate its superiority over other state-of-the-art methods.
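The two-step idea described in the abstract (encode an image by its similarity to a set of visual words, then require the detector's encoding to match a fixed encoding computed in a separately learned feature space) can be illustrated with a short NumPy sketch. This is not the authors' Position-Aware BoVW model: it ignores position information entirely, and the cosine similarity, softmax temperature, mean pooling, KL-divergence consistency loss, random codebooks, and the helper names `soft_bovw` and `consistency_loss` are all hypothetical assumptions chosen for illustration. It also assumes that each feature space has its own codebook with the same number K of visual words, indexed in correspondence.

```python
import numpy as np


def soft_bovw(features: np.ndarray, codebook: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """Hypothetical soft BoVW encoding (assumption, not the paper's model).

    features: (N, D) local feature vectors, e.g. a flattened feature map.
    codebook: (K, D) visual words living in the same feature space.
    Returns a (K,) histogram that sums to 1.
    """
    # L2-normalise both sides so the dot product is cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    sim = f @ c.T  # (N, K): similarity of each local feature to each visual word
    # Softmax over words gives a soft assignment for every local feature.
    logits = sim / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Average-pool the per-feature assignments into an image-level histogram.
    return assign.mean(axis=0)


def consistency_loss(teacher_hist: np.ndarray, student_hist: np.ndarray, eps: float = 1e-8) -> float:
    """KL(teacher || student) between two BoVW histograms over the same K words (assumed loss)."""
    p = np.clip(teacher_hist, eps, None)
    q = np.clip(student_hist, eps, None)
    return float(np.sum(p * (np.log(p) - np.log(q))))


# Usage with random stand-in data (all shapes are illustrative assumptions).
rng = np.random.default_rng(0)
K = 8
teacher_words = rng.normal(size=(K, 64))    # codebook in the pre-learned feature space
student_words = rng.normal(size=(K, 256))   # codebook in the detector's feature space
teacher_feats = rng.normal(size=(49, 64))   # e.g. a 7x7 feature map, flattened
student_feats = rng.normal(size=(49, 256))

target = soft_bovw(teacher_feats, teacher_words)  # "distilled knowledge", held fixed
pred = soft_bovw(student_feats, student_words)    # encoding from the detector's features
loss = consistency_loss(target, pred)
print(f"consistency loss: {loss:.4f}")
```

In an actual detector, the student-side computation would be written in an autodiff framework so the loss can be backpropagated into the detector's backbone, while the teacher histogram is treated as a constant target.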

updated: Mon Jul 25 2022 10:40:40 GMT+0000 (UTC)

published: Mon Jul 25 2022 10:40:40 GMT+0000 (UTC)

arXiv
