PURSUhInT: In Search of Informative Hint Points Based on Layer Clustering for Knowledge Distillation

Reyhan Kevser Keser; Aydin Ayanzadeh; Omid Abdollahi Aghdam; Caglar Kilcioglu; Behcet Ugur Toreyin; Nazim Kemal Ure

We propose a novel knowledge distillation methodology for compressing deep neural networks. One of the most efficient methods for knowledge distillation is hint distillation, where the student model is injected with information (hints) from several different layers of the teacher model. Although the selection of hint points can drastically alter the compression performance, there is no systematic approach for selecting them other than brute-force hyper-parameter search. We propose a clustering-based hint selection methodology, in which the layers of the teacher model are clustered with respect to several metrics and the cluster centers are used as the hint points. The proposed approach is validated on the CIFAR-100 dataset, with a ResNet-110 network as the teacher model. Our results show that the hint points selected by our algorithm yield superior compression performance compared with state-of-the-art knowledge distillation algorithms on the same student models and datasets.
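To make the clustering idea concrete, here is a minimal sketch of selecting hint points as cluster representatives. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which metrics or which clustering algorithm are used, so the sketch assumes each teacher layer is already summarized by a hypothetical descriptor vector, uses plain k-means with Euclidean distance, and takes the layer nearest to each centroid as the hint point. The names `select_hint_layers` and `demo_descriptors` are invented for this example.

```python
import numpy as np


def select_hint_layers(layer_descriptors, num_hints, num_iters=50):
    """Cluster teacher layers and return one representative layer index per cluster.

    layer_descriptors: array of shape (num_layers, dim); row i is an assumed
    (hypothetical) descriptor of teacher layer i.
    """
    x = np.asarray(layer_descriptors, dtype=float)
    num_layers = x.shape[0]
    if not 1 <= num_hints <= num_layers:
        raise ValueError("num_hints must be between 1 and the number of layers")

    # Deterministic initialization: evenly spaced layers serve as starting centroids.
    init_idx = np.linspace(0, num_layers - 1, num_hints).round().astype(int)
    centroids = x[init_idx].copy()

    for _ in range(num_iters):
        # Distance from every layer to every centroid, shape (num_layers, num_hints).
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(num_hints):
            members = x[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment, then pick the layer closest to each cluster center.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hints = []
    for k in range(num_hints):
        member_idx = np.flatnonzero(labels == k)
        if member_idx.size > 0:
            hints.append(int(member_idx[dists[member_idx, k].argmin()]))
    return sorted(hints)


# Toy data (made up): nine layers whose descriptors fall into three groups.
demo_descriptors = [
    [0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
    [5.0, 5.0], [5.1, 5.0], [5.2, 5.0],
    [10.0, 10.0], [10.1, 10.0], [10.2, 10.0],
]
print(select_hint_layers(demo_descriptors, num_hints=3))  # [1, 4, 7]
```

In this toy case the middle layer of each group is returned, since it lies closest to its cluster mean. Choosing an actual layer rather than the raw centroid matters because a hint must be attached to a real teacher layer.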

updated: Fri Feb 26 2021 21:18:34 GMT+0000 (UTC)

published: Fri Feb 26 2021 21:18:34 GMT+0000 (UTC)

arXiv
