DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

Jiahao Chang; Shuo Wang; Guangkai Xu; Zehui Chen; Chenhongyi Yang; Feng Zhao

Transformer-based detectors (DETRs) have attracted great attention due to their sparse training paradigm and the removal of post-processing operations, but large models are computationally expensive and difficult to deploy in real-world applications. To tackle this problem, knowledge distillation (KD) can be employed to compress a large model by constructing a universal teacher-student learning framework. Unlike traditional CNN detectors, where the distillation targets can be naturally aligned through the feature map, DETR treats object detection as a set prediction problem, which leaves the correspondence between teacher and student unclear during distillation. In this paper, we propose DETRDistill, a novel knowledge distillation framework dedicated to DETR-families. We first explore a sparse matching paradigm with progressive stage-by-stage instance distillation. Considering the diverse attention mechanisms adopted in different DETRs, we then propose an attention-agnostic feature distillation module to overcome the ineffectiveness of conventional feature imitation. Finally, to fully leverage the intermediate products from the teacher, we introduce teacher-assisted assignment distillation, which uses the teacher's object queries and assignment results as an extra group that provides additional guidance. Extensive experiments demonstrate that our distillation method achieves significant improvements on various competitive DETR approaches without introducing extra cost in the inference phase. To the best of our knowledge, this is the first systematic study to explore a general distillation method for DETR-style detectors.
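To illustrate why set prediction leaves the teacher-student correspondence unclear, here is a minimal Python sketch. It assumes that teacher and student each emit an unordered set of box predictions, and it pairs them by minimum-cost one-to-one matching under an L1 box cost. The function names (`l1_cost`, `match_queries`), the cost function, the toy boxes, and the brute-force search are all illustrative assumptions, not the procedure used in DETRDistill.

```python
from itertools import permutations


def l1_cost(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_queries(teacher_boxes, student_boxes):
    """Pair each teacher box with exactly one student box at minimum total cost.

    Returns (assignment, cost), where assignment[i] is the index of the
    student box paired with teacher box i. This is a brute-force search over
    all permutations, so it is only practical for very small sets.
    """
    n = len(teacher_boxes)
    if n != len(student_boxes):
        raise ValueError("this sketch assumes equally sized prediction sets")

    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            l1_cost(teacher_boxes[i], student_boxes[j])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm), best_cost


# Hypothetical toy predictions: the student finds the same three objects,
# but its queries emit them in a different order than the teacher's.
teacher = [(0.2, 0.2, 0.1, 0.1), (0.7, 0.7, 0.3, 0.3), (0.5, 0.1, 0.2, 0.2)]
student = [(0.68, 0.72, 0.3, 0.28), (0.5, 0.12, 0.2, 0.2), (0.21, 0.2, 0.1, 0.12)]

assignment, total_cost = match_queries(teacher, student)
print(assignment)  # [2, 0, 1]
```

Comparing predictions index by index would pair unrelated boxes here, whereas the matching recovers the intended pairs. Real DETR-style detectors use a hundred or more queries, so a polynomial-time solver such as the Hungarian algorithm would replace the brute-force loop.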

updated: Mon Nov 21 2022 07:40:11 GMT+0000 (UTC)

published: Thu Nov 17 2022 13:35:11 GMT+0000 (UTC)

arXiv
