Enhanced Convolutional Neural Tangent Kernels

Zhiyuan Li; Ruosong Wang; Dingli Yu; Simon S. Du; Wei Hu; Ruslan Salakhutdinov; Sanjeev Arora

Recent research shows that, for training with ℓ_2 loss, convolutional neural networks (CNNs) whose width (number of channels in convolutional layers) goes to infinity correspond to regression with respect to the CNN Gaussian Process kernel (CNN-GP) if only the last layer is trained, and to regression with respect to the Convolutional Neural Tangent Kernel (CNTK) if all layers are trained. An exact algorithm to compute CNTK (Arora et al., 2019) yielded the finding that the classification accuracy of CNTK on CIFAR-10 is within 6-7% of that of the corresponding CNN architecture (best figure being around 78%), which is interesting performance for a fixed kernel. Here we show how to significantly enhance the performance of these kernels using two ideas. (1) Modifying the kernel using a new operation called Local Average Pooling (LAP), which preserves efficient computability of the kernel and inherits the spirit of standard data augmentation using pixel shifts. Earlier papers were unable to incorporate naive data augmentation because of the quadratic training cost of kernel regression. This idea is inspired by Global Average Pooling (GAP), which we show for CNN-GP and CNTK is equivalent to full translation data augmentation. (2) Representing the input image using a pre-processing technique proposed by Coates et al. (2011), which uses a single convolutional layer composed of random image patches. On CIFAR-10, the resulting kernel, CNN-GP with LAP and horizontal flip data augmentation, achieves 89% accuracy, matching the performance of AlexNet (Krizhevsky et al., 2012). Note that this is the best such result we know of for a classifier that is not a trained neural network. Similar improvements are obtained for Fashion-MNIST.
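The link between average pooling and pixel-shift augmentation can be made concrete with a small, naive sketch. It is not the paper's LAP algorithm and it does not compute CNN-GP or CNTK. It assumes a stand-in base kernel (a Gaussian RBF, chosen only for illustration) and explicitly averages that kernel over circular shifts of both inputs. The names `rbf_kernel`, `shift_averaged_kernel`, `radius`, and `gamma` are hypothetical. A small `radius` averages over local shifts only; a `radius` that covers every circular shift gives a kernel that is invariant to translation, which mirrors the role the abstract attributes to GAP.

```python
import itertools

import numpy as np


def rbf_kernel(x, y, gamma=0.1):
    """Stand-in base kernel (NOT CNN-GP or CNTK): Gaussian RBF on flattened pixels."""
    diff = x.ravel() - y.ravel()
    return float(np.exp(-gamma * np.dot(diff, diff)))


def shift_averaged_kernel(base_kernel, x, y, radius):
    """Average base_kernel over circular shifts of x and y.

    Each image is shifted by every offset in [-radius, radius] along both
    axes, and the kernel values of all shifted pairs are averaged.
    This is a brute-force illustration, assumed for exposition only.
    """
    offsets = range(-radius, radius + 1)
    shifts = list(itertools.product(offsets, offsets))
    total = 0.0
    for shift_x in shifts:
        x_shifted = np.roll(x, shift_x, axis=(0, 1))
        for shift_y in shifts:
            y_shifted = np.roll(y, shift_y, axis=(0, 1))
            total += base_kernel(x_shifted, y_shifted)
    return total / (len(shifts) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 5))
    y = rng.normal(size=(5, 5))

    # radius=1: local averaging over a 3x3 window of shifts per image.
    print("local :", shift_averaged_kernel(rbf_kernel, x, y, radius=1))

    # radius=2 on a 5x5 image covers all 5 circular shifts per axis,
    # so the result no longer changes when x is translated.
    x_moved = np.roll(x, (1, 3), axis=(0, 1))
    print("global:", shift_averaged_kernel(rbf_kernel, x, y, radius=2))
    print("global, x translated:", shift_averaged_kernel(rbf_kernel, x_moved, y, radius=2))
```

This brute-force version needs (2·radius+1)^4 base-kernel evaluations per pair of images, so it is only practical for toy inputs. The abstract's claim is that LAP keeps the kernel efficiently computable, which this sketch does not attempt to reproduce.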

updated: Sun Nov 03 2019 02:24:39 GMT+0000 (UTC)

published: Sun Nov 03 2019 02:24:39 GMT+0000 (UTC)

arXiv

