Visualizing the embedding space to explain the effect of knowledge distillation

Hyun Seung Lee; Christian Wallraven

Recent research has found that knowledge distillation can be effective in reducing the size of a network and in increasing generalization. A pre-trained, large teacher network, for example, was shown to be able to bootstrap a student model that eventually outperforms the teacher in a limited-label environment. Despite these advances, it is still relatively unclear why this method works, that is, what the resulting student model does 'better'. To address this issue, we use two non-linear, low-dimensional embedding methods (t-SNE and IVIS) to visualize the representation spaces of different layers in a network. We perform a set of extensive experiments with different architecture parameters and distillation methods. The resulting visualizations and metrics clearly show that, compared to its non-distilled version, a distilled network finds a more compact representation space that already achieves higher accuracy in earlier layers.
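The abstract does not say which distillation losses or compactness metrics were used. With that caveat, here is a minimal sketch of two pieces. The first is one common distillation objective: a temperature-scaled KL term between teacher and student outputs plus ordinary cross-entropy, in the style of Hinton et al. (2015). The second is a hypothetical `compactness_ratio` for points taken from a 2-D embedding such as t-SNE or IVIS output. The function names, the defaults `temperature=4.0` and `alpha=0.5`, and the ratio itself are illustrative assumptions, not the authors' code or metrics.

```python
import math
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """One common distillation objective (assumed, Hinton-style):
    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL divergence of the softened student distribution from the teacher's.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Hard-label cross-entropy at temperature 1.
    ce = -log_softmax(student_logits)[label]
    # The T^2 factor keeps soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def compactness_ratio(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Hypothetical score for an embedding: mean distance of each point to its
    class centroid divided by the mean distance between class centroids.
    Lower values mean tighter, better-separated class clusters."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    centroids = {y: [sum(c) / len(ps) for c in zip(*ps)] for y, ps in groups.items()}
    intra = sum(math.dist(p, centroids[y]) for p, y in zip(points, labels)) / len(points)
    keys = list(centroids)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    inter = sum(math.dist(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
    return intra / inter


if __name__ == "__main__":
    print(kd_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], label=0))
    tight = [(0, 0), (0, 2), (10, 0), (10, 2)]
    loose = [(0, 0), (0, 8), (10, 0), (10, 8)]
    print(compactness_ratio(tight, [0, 0, 1, 1]), compactness_ratio(loose, [0, 0, 1, 1]))
```

In this framing, a layer whose embedded activations give a lower `compactness_ratio` has classes that cluster more tightly relative to their separation. That is one possible way to quantify the "more compact representation space" described above, under the stated assumptions.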

updated: Sat Oct 09 2021 07:04:26 GMT+0000 (UTC)

published: Sat Oct 09 2021 07:04:26 GMT+0000 (UTC)

arXiv
