Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks

Cong Fang; Hangfeng He; Qi Long; Weijie J Su

In this paper, we introduce the Layer-Peeled Model, a nonconvex yet analytically tractable optimization program, in a quest to better understand deep neural networks that are trained for a sufficiently long time. As the name suggests, this new model is derived by isolating the topmost layer from the remainder of the neural network and then imposing certain constraints separately on the two parts. We demonstrate that the Layer-Peeled Model, albeit simple, inherits many characteristics of well-trained neural networks, thereby offering an effective tool for explaining and predicting common empirical patterns in deep learning training. First, when working on class-balanced datasets, we prove that any solution to this model forms a simplex equiangular tight frame, which in part explains the recently discovered phenomenon of neural collapse in deep learning training [PHD20]. Moreover, when moving to the imbalanced case, our analysis of the Layer-Peeled Model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes. In addition, we use the Layer-Peeled Model to gain insights into how to mitigate Minority Collapse. Interestingly, this phenomenon was first predicted by the Layer-Peeled Model and only afterwards confirmed by our computational experiments.
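To make the phrase "simplex equiangular tight frame" concrete, here is a minimal sketch (not code from the paper) that builds one for K classes using a common closed-form construction, M = sqrt(K/(K-1)) · P (I_K - (1/K) 1 1ᵀ), where P is an assumed dim × K matrix with orthonormal columns. The resulting K column vectors all have unit norm, sum to zero, and have pairwise inner product -1/(K-1), which is the geometry the abstract says solutions take in the class-balanced case. The function name `simplex_etf` and the random choice of P are illustrative assumptions.

```python
import numpy as np

def simplex_etf(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return a dim x num_classes matrix whose columns form a simplex ETF.

    Hypothetical helper: uses M = sqrt(K/(K-1)) * P @ (I_K - (1/K) 1 1^T),
    where P is a dim x K matrix with orthonormal columns (needs dim >= K).
    """
    K = num_classes
    if K < 2:
        raise ValueError("a simplex ETF needs at least 2 classes")
    if dim < K:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns from the reduced QR factorization of a Gaussian matrix.
    P, _ = np.linalg.qr(rng.standard_normal((dim, K)))
    # Centering matrix: projects onto the subspace orthogonal to the all-ones vector.
    centering = np.eye(K) - np.ones((K, K)) / K
    # Rescale so every column has unit Euclidean norm.
    return np.sqrt(K / (K - 1)) * P @ centering

K = 4
M = simplex_etf(num_classes=K, dim=10)
G = M.T @ M  # Gram matrix of the K class vectors
print(np.round(np.diag(G), 6))  # every diagonal entry should be 1
print(np.round(G[0, 1], 6))     # every off-diagonal entry should be -1/(K-1)
```

Any rotation P gives an equally valid simplex ETF; the random one is used only to show that the equiangular property does not depend on a particular coordinate system.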

updated: Fri Jan 29 2021 17:37:17 UTC

published: Fri Jan 29 2021 17:37:17 UTC

arXiv