Dual Discriminator Adversarial Distillation for Data-free Model Compression

Haoran Zhao; Xin Sun; Junyu Dong; Hui Yu; Huiyu Zhou

Knowledge distillation is widely used to produce portable, efficient neural networks that can be deployed on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods need access to the original training data, which is usually large and often unavailable. To tackle this problem, we propose a novel data-free approach, Dual Discriminator Adversarial Distillation (DDAD), which distills a neural network without any training data or metadata. Specifically, we use a generator trained through dual discriminator adversarial distillation to create samples that mimic the original training data. The generator not only uses the pre-trained teacher's intrinsic statistics stored in its existing batch normalization layers but also maximizes the discrepancy between the teacher and the student model. The generated samples are then used to train the compact student network under the supervision of the teacher. The proposed method yields an efficient student network that closely approximates its teacher network, despite using no original training data. Extensive experiments on the CIFAR-10, CIFAR-100 and Caltech101 datasets demonstrate the effectiveness of the proposed approach on classification tasks. Moreover, we extend our method to semantic segmentation tasks on several public datasets such as CamVid and NYUv2. All experiments show that our method outperforms all baselines for data-free knowledge distillation.
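The abstract names two signals that drive the generator (batch normalization statistics of the teacher and the discrepancy with the student) but gives no formulas. The following is a minimal sketch of how such objectives could be combined, not the authors' implementation. The squared-error form of the statistics term, the L1 form of the discrepancy, the weight `bn_weight`, and all function names are assumptions made for illustration; plain Python lists stand in for tensors.

```python
from statistics import fmean, pvariance


def bn_statistics_loss(features, running_mean, running_var):
    """Squared gap between the batch statistics of one feature channel and
    the running mean/variance stored in a teacher batch normalization layer.
    (Assumed form; the abstract does not specify it.)"""
    batch_mean = fmean(features)
    batch_var = pvariance(features, mu=batch_mean)
    return (batch_mean - running_mean) ** 2 + (batch_var - running_var) ** 2


def discrepancy(teacher_outputs, student_outputs):
    """Mean absolute difference between teacher and student outputs
    (an L1 discrepancy, chosen here as an assumption)."""
    diffs = [
        abs(t - s)
        for t_row, s_row in zip(teacher_outputs, student_outputs)
        for t, s in zip(t_row, s_row)
    ]
    return fmean(diffs)


def generator_loss(features, running_mean, running_var,
                   teacher_outputs, student_outputs, bn_weight=1.0):
    """Generator objective: match the teacher's stored statistics while
    maximizing the teacher-student discrepancy (hence the minus sign)."""
    stats_term = bn_statistics_loss(features, running_mean, running_var)
    return bn_weight * stats_term - discrepancy(teacher_outputs, student_outputs)


def student_loss(teacher_outputs, student_outputs):
    """Student objective: minimize the same discrepancy on generated samples."""
    return discrepancy(teacher_outputs, student_outputs)


if __name__ == "__main__":
    # Toy numbers standing in for one feature channel and one output batch.
    features = [1.0, 3.0]
    teacher = [[1.0, 0.0]]
    student = [[0.5, 0.5]]
    print(generator_loss(features, 2.0, 1.0, teacher, student))
    print(student_loss(teacher, student))
```

In this sketch the generator and the student optimize opposite signs of the same discrepancy, which is what makes the training adversarial; the statistics term keeps the generated samples close to what the teacher saw during training.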

updated: Mon Apr 12 2021 12:01:45 GMT+0000 (UTC)

published: Mon Apr 12 2021 12:01:45 GMT+0000 (UTC)

arXiv
