Efficient shallow learning as an alternative to deep learning

Yuval Meir; Ofek Tevet; Yarden Tzach; Shiri Hodassman; Ronit D. Gross; Ido Kanter

The realization of complex classification tasks requires training deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input, and the following layers reveal larger-scale patterns, until a class of inputs is reliably characterized. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. The extrapolation of this power law indicates that the generalized LeNet can achieve small error rates that were previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture. However, VGG-16 requires significantly more operations than LeNet to achieve a given error rate. This power law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at universal behavior and suggesting a quantitative hierarchical time-space complexity among machine learning architectures. Additionally, a conservation law along the convolutional layers, under which the square root of their size times their depth is conserved, is found to asymptotically minimize error rates. The efficient shallow learning demonstrated in this study calls for further quantitative examination using various databases and architectures, and for accelerated implementation using future dedicated hardware developments.
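The power-law claim above can be illustrated with a minimal sketch. It assumes the error rate follows err = A * d1**(-rho), where d1 is the number of filters in the first convolutional layer; the symbols A, rho, and d1 and both function names are introduced here for illustration and do not come from the abstract. The data are synthetic and noise-free, generated from hypothetical parameters, and are not results from the paper.

```python
import numpy as np


def fit_power_law(d1, err):
    """Fit err ~ A * d1**(-rho) by linear least squares in log-log space."""
    # log(err) = log(A) - rho * log(d1), so the slope is -rho.
    slope, intercept = np.polyfit(np.log(d1), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)


def filters_for_error(A, rho, target_err):
    """Invert err = A * d1**(-rho) to estimate d1 for a target error rate."""
    return (A / target_err) ** (1.0 / rho)


# Synthetic example: A_true and rho_true are hypothetical, NOT from the paper.
A_true, rho_true = 0.5, 0.4
d1 = np.array([6.0, 12.0, 24.0, 48.0, 96.0])
err = A_true * d1 ** (-rho_true)

A_fit, rho_fit = fit_power_law(d1, err)
print(f"A = {A_fit:.3f}, rho = {rho_fit:.3f}")
```

Fitting in log-log space turns the power law into a straight line, so an ordinary linear fit suffices; with measured error rates the same fit would be only approximate, and any extrapolation to larger d1 would inherit that uncertainty.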

updated: Wed Nov 23 2022 11:38:21 GMT+0000 (UTC)

published: Tue Nov 15 2022 10:10:27 GMT+0000 (UTC)

arXiv
