Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data

Ashraful Islam; Chun-Fu Chen; Rameswar Panda; Leonid Karlinsky; Rogerio Feris; Richard J. Radke

Most existing work in few-shot learning relies on meta-learning a network on a large base dataset, which is typically from the same domain as the target dataset. We tackle the problem of cross-domain few-shot learning, where there is a large shift between the base and target domains. The problem of cross-domain few-shot recognition with unlabeled target data is largely unaddressed in the literature. STARTUP was the first method to tackle this problem using self-training. However, it uses a fixed teacher pretrained on a labeled base dataset to create soft labels for the unlabeled target samples. As the base dataset and the unlabeled dataset come from different domains, projecting the target images onto the class domain of the base dataset with a fixed pretrained model might be sub-optimal. We propose a simple dynamic distillation-based approach to make use of unlabeled images from the novel/base dataset. We impose consistency regularization by computing a teacher network's predictions on weakly augmented versions of the unlabeled images and matching them with a student network's predictions on strongly augmented versions of the same images. The parameters of the teacher network are updated as an exponential moving average of the parameters of the student network. We show that the proposed network learns representations that can be easily adapted to the target domain even though it has not been trained on target-specific classes during the pretraining phase. Our model outperforms the current state-of-the-art method by 4.4% for 1-shot and 3.6% for 5-shot classification on the BSCD-FSL benchmark, and also shows competitive performance on traditional in-domain few-shot learning tasks.
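The two ingredients named in the abstract, a consistency loss between the teacher's prediction on a weakly augmented view and the student's prediction on a strongly augmented view, and an exponential-moving-average (EMA) update of the teacher from the student, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: logits and parameters are flat lists of floats, the consistency term is assumed to be a soft cross-entropy, and the function names and the `momentum` default of 0.99 are hypothetical choices. The backbone network, the augmentation pipelines and any supervised loss on the labeled base data are omitted.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def softmax(logits: Sequence[float]) -> List[float]:
    """Class probabilities obtained by exponentiating the log-softmax."""
    return [math.exp(v) for v in log_softmax(logits)]


def consistency_loss(teacher_logits_weak: Sequence[float],
                     student_logits_strong: Sequence[float]) -> float:
    """Soft cross-entropy H(p_teacher, p_student) for one unlabeled image.

    teacher_logits_weak: teacher output on the weakly augmented view.
    student_logits_strong: student output on the strongly augmented view.
    The teacher distribution is treated as a fixed target; in an autograd
    framework it would be computed without gradients (stop-gradient).
    Assumption: soft cross-entropy is used as the matching criterion.
    """
    if len(teacher_logits_weak) != len(student_logits_strong):
        raise ValueError("teacher and student must predict the same classes")
    target = softmax(teacher_logits_weak)
    log_pred = log_softmax(student_logits_strong)
    return -sum(t * lp for t, lp in zip(target, log_pred))


def ema_update(teacher_params: Sequence[float],
               student_params: Sequence[float],
               momentum: float = 0.99) -> List[float]:
    """Return teacher <- momentum * teacher + (1 - momentum) * student.

    The momentum value is a hypothetical default, not taken from the paper.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    if len(teacher_params) != len(student_params):
        raise ValueError("teacher and student must have the same parameters")
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


# Toy usage with made-up numbers (no real network involved).
loss = consistency_loss([2.0, 0.0, -1.0], [1.5, 0.2, -0.5])
teacher = ema_update([0.0, 0.0], [1.0, -1.0], momentum=0.9)
```

In a training loop under these assumptions, the consistency loss would be averaged over a batch of unlabeled images, a gradient step would be taken on the student only, and `ema_update` would then be applied so that the teacher slowly tracks the student instead of staying fixed as in STARTUP.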

updated: Mon Nov 01 2021 04:28:04 GMT+0000 (UTC)

published: Mon Jun 14 2021 23:44:34 GMT+0000 (UTC)

arXiv