MUSE: Feature Self-Distillation with Mutual Information and Self-Information

Yu Gong; Ye Yu; Gaurav Mittal; Greg Mori; Mei Chen

We present a novel information-theoretic approach to introducing dependency among the features of a deep convolutional neural network (CNN). The core idea of our proposed method, called MUSE, is to combine MUtual information and SElf-information to jointly improve the expressivity of all features extracted from different layers of a CNN. We present two realizations of MUSE: Additive Information and Multiplicative Information. Importantly, we argue and empirically demonstrate that, compared to other feature discrepancy functions, MUSE is a more functional proxy for introducing dependency and effectively improving the expressivity of all features in the knowledge distillation framework. MUSE outperforms a variety of feature discrepancy functions across popular architectures for self-distillation and online distillation, and performs competitively with state-of-the-art methods for offline distillation. MUSE is also demonstrably versatile, which allows it to be easily extended to CNN-based models for tasks other than image classification, such as object detection.
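
The abstract does not specify MUSE's estimators or exactly how the two variants combine their terms, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes that "self-information" refers to the entropy H(X) of a feature (recall that I(X;X) = H(X)), and that the additive and multiplicative variants combine H(X) and the mutual information I(X;Y) between two layers' features by a sum and a product, respectively. It uses simple histogram (plug-in) estimates on discretized features. All function names are hypothetical. These plug-in estimates are not differentiable, so they illustrate the quantities involved but could not be used directly as a training loss.

```python
import numpy as np

def discretize(features, n_bins=8):
    """Map continuous activations (e.g. pooled CNN features) to integer bins 0..n_bins-1."""
    edges = np.histogram_bin_edges(features, bins=n_bins)
    # Use only the interior edges so labels run from 0 to n_bins-1.
    return np.digitize(features, edges[1:-1])

def entropy(labels):
    """Plug-in Shannon entropy H(X), in nats, of a 1-D array of discrete labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete labels."""
    joint = np.stack([x, y], axis=1)
    # Count each distinct (x, y) pair to estimate the joint distribution.
    _, joint_counts = np.unique(joint, axis=0, return_counts=True)
    p_xy = joint_counts / joint_counts.sum()
    h_xy = float(-np.sum(p_xy * np.log(p_xy)))
    return entropy(x) + entropy(y) - h_xy

def additive_information(x, y):
    # Hypothetical "additive" combination: self-information plus mutual information.
    return entropy(x) + mutual_information(x, y)

def multiplicative_information(x, y):
    # Hypothetical "multiplicative" combination: self-information times mutual information.
    return entropy(x) * mutual_information(x, y)
```

For example, with x = y = [0, 0, 1, 1], H(x) = ln 2 and I(x;y) = ln 2, so the additive score is 2 ln 2 and the multiplicative score is (ln 2)^2. With x = [0, 0, 1, 1] and an independent y = [0, 1, 0, 1], I(x;y) = 0, and both combinations collapse: the additive score reduces to H(x) and the multiplicative score to zero. A trainable version would need a differentiable estimator in place of these histogram counts.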

updated: Mon Oct 25 2021 02:36:25 GMT+0000 (UTC)

published: Mon Oct 25 2021 02:36:25 GMT+0000 (UTC)

arXiv
