Information Theoretic Representation Distillation

Roy Miles; Adrián López Rodríguez; Krystian Mikolajczyk

Despite the empirical success of knowledge distillation, it still lacks a theoretical foundation that can naturally lead to computationally inexpensive implementations. To address this concern, we forge an alternative connection between information theory and knowledge distillation using a recently proposed entropy-like functional. In doing so, we introduce two distinct, complementary losses that aim to maximise the correlation and the mutual information between the student and teacher representations. Our method achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks, while incurring significantly less training overhead than closely related and similarly performing approaches. We further demonstrate the effectiveness of our method on a binary distillation task, where we shed light on a new state of the art for binary quantisation. The code, evaluation protocols, and trained models will be publicly available.
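To make the idea of a correlation-maximising loss between student and teacher representations concrete, the following is a minimal NumPy sketch. It is not the paper's formulation: the abstract does not give the loss definitions or the entropy-like functional, so the function name `correlation_loss`, the squared-gap penalty, and the assumption that student and teacher features share the same dimensionality are all illustrative choices made here.

```python
import numpy as np


def correlation_loss(student, teacher, eps=1e-8):
    """Illustrative loss (an assumption, not the paper's definition).

    Both inputs have shape (batch, dim). The loss is the mean squared gap
    between 1 and the Pearson correlation of each matching feature dimension.
    """
    # Standardise every feature dimension over the batch (rows are samples).
    zs = (student - student.mean(axis=0)) / (student.std(axis=0) + eps)
    zt = (teacher - teacher.mean(axis=0)) / (teacher.std(axis=0) + eps)
    # Per-dimension correlation: the diagonal of the cross-correlation matrix.
    corr = (zs * zt).mean(axis=0)
    # Zero when every student dimension is perfectly correlated with the teacher.
    return float(np.mean((1.0 - corr) ** 2))


rng = np.random.default_rng(0)
teacher = rng.normal(size=(256, 16))
aligned = 2.0 * teacher + 1.0            # affine copy: correlation is 1
unrelated = rng.normal(size=(256, 16))   # independent features: correlation near 0

loss_aligned = correlation_loss(aligned, teacher)
loss_unrelated = correlation_loss(unrelated, teacher)
```

In this sketch a student whose features are an affine function of the teacher's receives a loss close to zero, while an unrelated student receives a loss close to one. A mutual-information term, which the abstract names as the second loss, is not sketched here because the abstract gives no detail about how it is computed.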

updated: Wed Dec 01 2021 12:39:50 GMT+0000 (UTC)

published: Wed Dec 01 2021 12:39:50 GMT+0000 (UTC)

arXiv
