Redundant representations help generalization in wide neural networks

Diego Doimo; Aldo Glielmo; Sebastian Goldt; Alessandro Laio

Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information and differ from each other only by statistically independent noise. The number of such groups increases linearly with the width of the layer, but only if the width is above a critical value. We show that redundant neurons appear only when the training process reaches interpolation and the training error is zero.
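
As a toy illustration of why groups of neurons that carry the same information plus independent noise could be useful, the sketch below builds a synthetic "layer" in which each latent feature is copied several times with independent Gaussian noise, then measures how well the group average recovers the feature. This is not the authors' analysis or code: the function names (`redundant_layer`, `group_mean_error`), the Gaussian noise model, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np


def redundant_layer(latent, copies, noise_std, rng):
    """Copy each latent feature `copies` times and add independent noise.

    Hypothetical stand-in for a wide last hidden layer; not taken from the paper.
    """
    # latent: (n_samples, n_features) -> clones: (n_samples, n_features, copies)
    clones = np.repeat(latent[:, :, None], copies, axis=2)
    return clones + noise_std * rng.standard_normal(clones.shape)


def group_mean_error(copies, n_samples=2000, n_features=8, noise_std=1.0, seed=0):
    """Mean squared error between each group's average and its latent feature."""
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n_samples, n_features))
    activations = redundant_layer(latent, copies, noise_std, rng)
    # Averaging over the redundant copies cancels part of the independent noise.
    return float(np.mean((activations.mean(axis=2) - latent) ** 2))


errors = {m: group_mean_error(m) for m in (1, 4, 16, 64)}
for m, err in errors.items():
    print(f"copies per group = {m:3d}  MSE of group mean = {err:.4f}")
```

Under these assumptions the error of the group average shrinks roughly as `noise_std**2 / copies`, which is the standard behaviour of averaging independent noise. Whether the trained networks studied here benefit through this same mechanism is a question for the paper itself.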

updated: Sat Apr 29 2023 09:39:14 GMT+0000 (UTC)

published: Mon Jun 07 2021 10:18:54 GMT+0000 (UTC)

arXiv
