LGD: Label-guided Self-distillation for Object Detection

Peizhen Zhang; Zijian Kang; Tong Yang; Xiangyu Zhang; Nanning Zheng; Jian Sun

In this paper, we propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation). Previous studies rely on a strong pretrained teacher to provide instructive knowledge for distillation. However, such a teacher may be unavailable in real-world scenarios. Instead, we generate instructive knowledge by modeling inter- and intra-object relations, requiring only student representations and regular labels. In detail, our framework involves sparse label-appearance encoding, inter-object relation adaptation, and intra-object knowledge mapping to obtain the instructive knowledge. The modules in LGD are trained end-to-end with the student detector and are discarded at inference. Empirically, LGD obtains decent results on various detectors, datasets, and extended tasks such as instance segmentation. For example, on the MS-COCO dataset, LGD improves RetinaNet with ResNet-50 under 2x single-scale training from 36.2% to 39.0% mAP (+2.8%). For much stronger detectors such as FCOS with ResNeXt-101 DCN v2 under 2x multi-scale training (46.1%), LGD achieves 47.9% (+1.8%). For pedestrian detection on the CrowdHuman dataset, LGD improves mMR by 2.3% for Faster R-CNN with ResNet-50. Compared with FGFI, a classical teacher-based method, LGD not only performs better without requiring a pretrained teacher but also incurs 51% lower training cost beyond that of inherent student learning.
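The idea of deriving a distillation target from labels and student features, rather than from a pretrained teacher, can be illustrated with a toy sketch. This is not the paper's architecture: the use of scaled dot-product cross-attention, the mean-squared-error imitation loss, the function names, and the toy embeddings are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def imitation_loss(student, target):
    """Mean squared error between student features and a target."""
    n = sum(len(row) for row in student)
    return sum(
        (s - t) ** 2
        for srow, trow in zip(student, target)
        for s, t in zip(srow, trow)
    ) / n


# Hypothetical toy data: two objects with 3-dimensional embeddings.
label_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # assumed label encodings
student_emb = [[0.5, 0.2, 0.1], [0.1, 0.7, 0.3]]  # assumed student object features

# Assumption: label embeddings act as queries over the student embeddings,
# so the target mixes information across objects without any teacher network.
instructive = cross_attention(label_emb, student_emb, student_emb)

# The student is trained to imitate the target; in a real framework the target
# would be treated as fixed for this loss, and the attention module would be
# dropped at inference time.
loss = imitation_loss(student_emb, instructive)
print(loss)
```

In this sketch only `student_emb` would be needed at inference; the attention step exists solely to build the training target, which mirrors the abstract's statement that the LGD modules are discarded at inference.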

updated: Thu Sep 23 2021 16:55:01 GMT+0000 (UTC)

published: Thu Sep 23 2021 16:55:01 GMT+0000 (UTC)

arXiv
