Information Theoretic Representation Distillation

Roy Miles; Adrian Lopez Rodriguez; Krystian Mikolajczyk

Despite the empirical success of knowledge distillation, current state-of-the-art methods are computationally expensive to train, which makes them difficult to adopt in practice. To address this problem, we introduce two distinct, complementary losses inspired by a cheap entropy-like estimator. These losses aim to maximise the correlation and mutual information between the student and teacher representations. Our method incurs significantly lower training overhead than other approaches and achieves performance competitive with the state of the art on knowledge distillation and cross-model transfer tasks. We further demonstrate the effectiveness of our method on a binary distillation task, where it sets a new state of the art for binary quantisation and approaches the performance of a full-precision model. Code: www.github.com/roymiles/ITRD
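
As a rough illustration of what maximising correlation between student and teacher representations can mean, the sketch below computes a simple per-dimension Pearson correlation penalty over a batch of features. This is an assumption made for illustration only: it is not the loss defined in the paper or in the linked repository, the names `pearson` and `correlation_loss` are hypothetical, and the mutual information term is not sketched here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    # Illustrative convention (an assumption): a constant feature
    # is treated as uncorrelated rather than raising an error.
    return cov / denom if denom > 0 else 0.0


def correlation_loss(student, teacher):
    """Mean of (1 - correlation) over feature dimensions.

    student, teacher: lists of rows with shape (batch, dim).
    The value is 0 when every student dimension is perfectly
    correlated with the matching teacher dimension, and 2 when
    every dimension is perfectly anti-correlated.
    """
    dims = len(student[0])
    total = 0.0
    for d in range(dims):
        s_col = [row[d] for row in student]
        t_col = [row[d] for row in teacher]
        total += 1.0 - pearson(s_col, t_col)
    return total / dims
```

Minimising a quantity like this pushes each student feature dimension to vary with the corresponding teacher dimension across the batch. It assumes the two representations have the same width; in practice a learned projection would likely be needed when they differ.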

updated: Fri Oct 07 2022 16:05:00 GMT+0000 (UTC)

published: Wed Dec 01 2021 12:39:50 GMT+0000 (UTC)

arXiv
