Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge Distillation

Gaurav Kumar Nayak; Konda Reddy Mopuri; Anirban Chakraborty



Knowledge Distillation is an effective method to transfer learning across deep neural networks. Typically, the dataset originally used to train the Teacher model is chosen as the "Transfer Set" for transferring knowledge to the Student. However, this original training data may not always be freely available due to privacy or sensitivity concerns. In such scenarios, existing approaches either iteratively compose a synthetic set representative of the original training dataset, one sample at a time, or learn a generative model to compose such a transfer set. However, both of these approaches involve complex optimization (GAN training, or several backpropagation steps to synthesize a single sample) and are often computationally expensive. In this paper, as a simple alternative, we investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic datasets, and natural datasets, all of which are completely unrelated to the original training dataset in terms of their visual or semantic contents. Through extensive experiments on multiple benchmark datasets such as MNIST, FMNIST, CIFAR-10 and CIFAR-100, we discover and validate the surprising effectiveness of using arbitrary data to conduct knowledge distillation when this dataset is "target-class balanced". We believe that this observation could lead to the design of baselines for the data-free knowledge distillation task.
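The following is a minimal NumPy sketch of the two ingredients the abstract names. It is not the authors' code. It makes two assumptions. First, it uses the widely used temperature-scaled distillation loss of Hinton et al. (2015). Second, it treats "target-class balanced" as subsampling an arbitrary set so that each of the Teacher's predicted (argmax) classes is equally represented. The abstract specifies neither detail. The function names `kd_loss` and `class_balanced_indices`, the temperature of 4.0, and the random logits that stand in for a real Teacher and Student are all hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Assumed Hinton-style loss: mean KL(teacher || student) on softened outputs, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    log_p_t = np.log(p_t + 1e-12)
    log_p_s = np.log(softmax(student_logits, temperature) + 1e-12)
    kl = (p_t * (log_p_t - log_p_s)).sum(axis=1)
    return float(kl.mean() * temperature ** 2)


def class_balanced_indices(teacher_logits, per_class, rng=None):
    """Hypothetical helper: keep up to `per_class` samples for each class the Teacher predicts."""
    rng = np.random.default_rng() if rng is None else rng
    preds = teacher_logits.argmax(axis=1)  # Teacher's pseudo-labels on the arbitrary data
    chosen = []
    for c in range(teacher_logits.shape[1]):
        idx = np.flatnonzero(preds == c)
        if idx.size == 0:
            continue  # the Teacher never predicts this class on the given data
        take = min(per_class, idx.size)
        chosen.extend(rng.choice(idx, size=take, replace=False).tolist())
    return np.array(sorted(chosen), dtype=int)


# Usage: random noise as an arbitrary transfer set (stand-in logits, no real models).
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=(1000, 28 * 28))  # uniform-noise "images"
teacher_logits = rng.normal(size=(1000, 10))          # placeholder for Teacher(noise)
keep = class_balanced_indices(teacher_logits, per_class=50, rng=rng)
balanced_set = noise[keep]
student_logits = rng.normal(size=(keep.size, 10))     # placeholder for Student(balanced_set)
loss = kd_loss(student_logits, teacher_logits[keep])
```

Capping every class at the same count is the simplest way to even out the Teacher's predicted-class histogram. A class the Teacher rarely predicts on the arbitrary data will still end up under-represented, so a practical pipeline would likely draw more samples until each class reaches the cap.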

updated: Wed Nov 18 2020 06:33:20 GMT+0000 (UTC)

published: Wed Nov 18 2020 06:33:20 GMT+0000 (UTC)

arXiv

