Training convolutional neural networks with cheap convolutions and online distillation

Jiao Xie; Shaohui Lin; Yichen Zhang; Linkai Luo

The large memory and computation consumption of convolutional neural networks (CNNs) has been one of the main barriers to deploying them on resource-limited systems. To this end, cheap convolutions (e.g., group convolution, depth-wise convolution, and shift convolution) have recently been used to reduce memory and computation, but they require specific architecture designs. Furthermore, directly replacing the standard convolution with these cheap ones results in low discriminability of the compressed networks. In this paper, we propose to use knowledge distillation to improve the performance of compact student networks with cheap convolutions. In our case, the teacher is a network with the standard convolution, while the student is a simple transformation of the teacher architecture without complicated redesign. In particular, we propose a novel online distillation method, which constructs the teacher network online without pre-training and conducts mutual learning between the teacher and student networks, to improve the performance of the student model. Extensive experiments demonstrate that the proposed approach achieves superior performance in simultaneously reducing the memory and computation overhead of cutting-edge CNNs on different datasets, including CIFAR-10/100 and ImageNet ILSVRC 2012, compared to state-of-the-art CNN compression and acceleration methods. The code is publicly available at https://github.com/EthanZhangYC/OD-cheap-convolution.
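To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It first counts the weights of a standard, a group, and a depth-wise convolution, then computes a generic mutual-learning loss pair in which each network adds a KL term toward the other's softened prediction. The function names, the `temperature` and `alpha` parameters, and the exact form of the loss are assumptions for illustration; the abstract does not specify the loss the paper uses.

```python
import math


def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution, bias ignored.

    groups=1 is the standard convolution; groups=c_in (with c_out == c_in)
    is a depth-wise convolution.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_losses(teacher_logits, student_logits, label,
                  temperature=1.0, alpha=1.0):
    """Return (teacher_loss, student_loss) for one sample.

    Assumed form: cross-entropy on the true label plus alpha times the KL
    divergence from the other network's prediction to this network's own.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    ce_t = -math.log(softmax(teacher_logits)[label])
    ce_s = -math.log(softmax(student_logits)[label])
    loss_t = ce_t + alpha * kl_divergence(p_s, p_t)  # teacher learns from student
    loss_s = ce_s + alpha * kl_divergence(p_t, p_s)  # student learns from teacher
    return loss_t, loss_s


# Weight counts for a 3x3 layer with 64 input and 64 output channels.
standard = conv_params(64, 64, 3)              # 36864
grouped = conv_params(64, 64, 3, groups=4)     # 9216
depthwise = conv_params(64, 64, 3, groups=64)  # 576

# One training sample with three classes; the true class is index 0.
teacher_loss, student_loss = mutual_losses([2.0, 0.5, -1.0],
                                           [1.0, 0.8, -0.5], label=0)
```

In this sketch both losses would be minimized in the same training step, so neither network needs pre-training. Because the weight count scales as 1/groups, the grouped layer above is 4 times smaller than the standard one and the depth-wise layer 64 times smaller.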

updated: Thu Oct 10 2019 07:47:43 GMT+0000 (UTC)

published: Sat Sep 28 2019 10:16:17 GMT+0000 (UTC)

arXiv
