Cross-Domain Adaptive Teacher for Object Detection

Yu-Jhe Li; Xiaoliang Dai; Chih-Yao Ma; Yen-Cheng Liu; Kan Chen; Bichen Wu; Zijian He; Kris Kitani; Peter Vajda

We address the task of domain adaptation in object detection, where there is a domain gap between a domain with annotations (source) and a domain of interest without annotations (target). As an effective semi-supervised learning method, the teacher-student framework (a student model is supervised by the pseudo labels from a teacher model) has also yielded a large accuracy gain in cross-domain object detection. However, it suffers from the domain shift and generates many low-quality pseudo labels (e.g., false positives), which leads to sub-optimal performance. To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap. Specifically, we employ feature-level adversarial training in the student model, allowing features derived from the source and target domains to share similar distributions. This process ensures the student model produces domain-invariant features. Furthermore, we apply weak-strong augmentation and mutual learning between the teacher model (taking data from the target domain) and the student model (taking data from both domains). This enables the teacher model to learn the knowledge from the student model without being biased to the source domain. We show that AT demonstrates superiority over existing approaches and even Oracle (fully-supervised) models by a large margin. For example, we achieve 50.9% (49.3%) mAP on Foggy Cityscape (Clipart1K), which is 9.2% (5.2%) and 8.2% (11.0%) higher than previous state-of-the-art and Oracle, respectively.
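The teacher-student interaction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the teacher learns from the student or how low-quality pseudo labels are suppressed, so this sketch assumes two mechanisms common in mean-teacher style frameworks: an exponential moving average (EMA) of the student weights, and a confidence threshold on the teacher's detections. The names `ema_update`, `filter_pseudo_labels`, `alpha`, and `threshold` are hypothetical and chosen only for illustration. Weights are plain floats here rather than tensors.

```python
from typing import Dict, List, Tuple

# A detection is (class label, confidence score). Boxes are omitted for brevity.
Detection = Tuple[str, float]


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    alpha: float = 0.99,
) -> Dict[str, float]:
    """Move teacher weights toward student weights (assumed EMA rule)."""
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def filter_pseudo_labels(
    detections: List[Detection],
    threshold: float = 0.8,
) -> List[Detection]:
    """Keep only teacher detections at or above the confidence threshold.

    Assumption: thresholding is one simple way to discard likely
    false positives before they supervise the student.
    """
    return [det for det in detections if det[1] >= threshold]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    print(ema_update(teacher, student, alpha=0.9))

    teacher_outputs = [("car", 0.95), ("person", 0.40)]
    print(filter_pseudo_labels(teacher_outputs, threshold=0.8))
```

In an actual detector, the teacher would see weakly augmented target images, the student would see strongly augmented versions of the same images alongside labeled source images, and the feature-level adversarial loss would be applied to the student. None of these components is shown here.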

updated: Wed May 11 2022 23:12:59 GMT+0000 (UTC)

published: Thu Nov 25 2021 18:50:15 GMT+0000 (UTC)

arXiv
