A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

Ziwei Liu; Yongtao Wang; Xiaojie Chu

Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by having the student mimic the teacher. However, distillation that directly aligns the feature maps of the teacher and the student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student with a pixel-wise transformation. In this paper, we find that aligning the feature maps of the teacher and the student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student with those of the teacher. Based on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task-specific loss. Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks, including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on Cityscapes), which demonstrates the effectiveness and versatility of the proposed method. The code will be made publicly available.
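As a rough illustration of the idea in the abstract, the following forward-only sketch applies a small nonlinear map across the channel axis of the student features and compares the result with the teacher features. The abstract does not specify the architecture of the transformation or the form of the distillation loss, so the two-layer MLP with ReLU, the mean squared error, and all names here (`channel_transform`, `distillation_loss`, `total_loss`, `alpha`) are assumptions made for illustration, not the authors' implementation. Plain Python lists stand in for tensors; in practice the weights would be learned jointly with the student in a deep learning framework.

```python
# Hypothetical sketch: a feature map is a list of spatial positions,
# each position holding a list of channel values.

def channel_transform(student_feat, w1, w2):
    """Apply a two-layer MLP across channels at every spatial position.

    w1 has shape (hidden, C_student); w2 has shape (C_teacher, hidden).
    The same weights are shared by all positions, so only the channel
    dimension is mixed (assumed design, not taken from the paper).
    """
    transformed = []
    for pixel in student_feat:
        # First linear layer followed by ReLU (the nonlinearity).
        hidden = [max(0.0, sum(w * x for w, x in zip(row, pixel))) for row in w1]
        # Second linear layer maps to the teacher's channel count.
        transformed.append([sum(w * h for w, h in zip(row, hidden)) for row in w2])
    return transformed


def distillation_loss(transformed, teacher_feat):
    """Mean squared error between transformed student and teacher features."""
    squared = [
        (s - t) ** 2
        for s_pixel, t_pixel in zip(transformed, teacher_feat)
        for s, t in zip(s_pixel, t_pixel)
    ]
    return sum(squared) / len(squared)


def total_loss(task_loss, distill_loss, alpha):
    """Combine both terms with the single balancing hyper-parameter alpha."""
    return task_loss + alpha * distill_loss


if __name__ == "__main__":
    # Two spatial positions; student has 2 channels, teacher has 3.
    student = [[0.5, -1.0], [2.0, 1.0]]
    teacher = [[0.1, 0.2, 0.3], [1.0, 0.0, -1.0]]
    w1 = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2], [0.2, 0.2]]  # hidden = 4
    w2 = [[0.1, 0.0, 0.2, -0.1], [0.3, 0.1, 0.0, 0.2], [-0.2, 0.4, 0.1, 0.0]]
    aligned = channel_transform(student, w1, w2)
    d_loss = distillation_loss(aligned, teacher)
    print(total_loss(task_loss=1.0, distill_loss=d_loss, alpha=0.5))
```

Because the weights are shared across positions, this corresponds to stacking 1x1 convolutions, which is one common way to realize a channel-wise map; whether the paper uses exactly this form is not stated in the abstract.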

updated: Thu Mar 23 2023 12:13:29 GMT+0000 (UTC)

published: Thu Mar 23 2023 12:13:29 GMT+0000 (UTC)

arXiv
