Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models

Sabeesh Ethiraj; Bharath Kumar Bolla

Deep learning has revolutionized computer vision, natural language understanding, speech recognition, information retrieval, and other fields. Many techniques have evolved over the past decade that make models lighter, faster, and more robust, with better generalization. However, many deep learning practitioners persist with pre-trained models and architectures trained mostly on standard datasets such as ImageNet, MS-COCO, IMDB-WIKI, and Kinetics-700, and are either hesitant to redesign the architecture from scratch or unaware that doing so can lead to better performance. This leads to inefficient models that are not suitable for deployment on devices such as mobile, edge, and fog. In addition, these conventional training methods are a concern because they consume a lot of computing power. In this paper, we revisit various state-of-the-art (SOTA) techniques that deal with architecture efficiency (Global Average Pooling, depth-wise convolutions, squeeze-and-excitation, BlurPool), learning rate (Cyclical Learning Rate), data augmentation (Mixup, Cutout), label manipulation (label smoothing), weight space manipulation (stochastic weight averaging), and the optimizer (sharpness-aware minimization). We demonstrate how an efficient deep convolutional network can be built in a phased manner by sequentially reducing the number of trainable parameters and applying the techniques mentioned above. We achieved a SOTA accuracy of 99.2% on MNIST with just 1500 parameters and an accuracy of 86.01% on CIFAR-10 with just over 140K parameters.
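
To illustrate why depth-wise convolutions and Global Average Pooling reduce the parameter count, the following is a minimal sketch in plain Python that counts trainable parameters for a single layer. The layer sizes (32 input channels, 64 output channels, 3x3 kernels, a 4x4 final feature map, 10 classes) and all function names are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch also assumes the depth-wise convolution is followed by a 1x1 point-wise convolution and that Global Average Pooling is followed by a linear classifier, which may differ from the authors' architecture.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1x1 point-wise conv (assumed pairing)."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def dense_head_params(c, h, w, n_classes):
    """Flatten a c x h x w feature map, then one fully connected layer."""
    return c * h * w * n_classes + n_classes


def gap_head_params(c, n_classes):
    """Global Average Pooling reduces each channel to one value,
    so the linear classifier (assumed) only sees c inputs."""
    return c * n_classes + n_classes


# Hypothetical layer sizes, for illustration only.
standard = conv2d_params(32, 64, 3)
separable = depthwise_separable_params(32, 64, 3)
dense_head = dense_head_params(64, 4, 4, 10)
gap_head = gap_head_params(64, 10)

print(f"standard conv:        {standard}")
print(f"depth-wise separable: {separable}")
print(f"flatten + dense head: {dense_head}")
print(f"GAP + linear head:    {gap_head}")
```

Under these assumed sizes, the separable layer and the pooled head each need a small fraction of the parameters of their conventional counterparts, which is the kind of reduction the phased approach relies on.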

updated: Mon May 23 2022 13:51:06 GMT+0000 (UTC)

published: Mon May 23 2022 13:51:06 GMT+0000 (UTC)

arXiv
