Inducing Neural Collapse in Imbalanced Learning: Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Yibo Yang; Shixiang Chen; Xiangtai Li; Liang Xie; Zhouchen Lin; Dacheng Tao

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier that outputs the logit of each class. A recent study has identified a phenomenon called neural collapse: at the terminal phase of training on a balanced dataset, the within-class means of the features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF). Since the ETF geometric structure maximally separates the pairwise angles of all classes in the classifier, it is natural to ask: why spend effort learning a classifier when we already know its optimal geometric structure? In this paper, we study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our analysis, based on the layer-peeled model, indicates that feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes. We further show that in this case the cross-entropy (CE) loss is not necessary and can be replaced by a simple squared loss that shares the same global optimality but enjoys a better convergence property. Our experimental results show that our method brings significant improvements with faster convergence on multiple imbalanced datasets.
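To make the geometry in the abstract concrete, the following is a minimal sketch of how a simplex ETF with K class vectors in d dimensions could be built, using the standard construction M = sqrt(K/(K-1)) U (I_K - (1/K) 1 1^T), where U has orthonormal columns. The function names `random_orthonormal_columns` and `simplex_etf`, the seed handling, and the use of Gram-Schmidt for the random U are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def random_orthonormal_columns(d, k, rng):
    """Return k orthonormal vectors in R^d via Gram-Schmidt on Gaussian draws.

    Hypothetical helper: any method that yields orthonormal columns would do.
    """
    if k > d:
        raise ValueError("need d >= k to fit k orthonormal vectors in R^d")
    basis = []
    while len(basis) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the vectors already accepted.
        for u in basis:
            proj = sum(a * b for a, b in zip(v, u))
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # discard a numerically degenerate draw
            basis.append([a / norm for a in v])
    return basis


def simplex_etf(d, k, seed=0):
    """Return k unit vectors in R^d forming a simplex ETF (requires k >= 2)."""
    rng = random.Random(seed)
    u = random_orthonormal_columns(d, k, rng)  # u[j] plays the role of column j of U
    # Column j of U (I - (1/K) 1 1^T) is u_j minus the mean of all columns.
    mean = [sum(u[j][i] for j in range(k)) / k for i in range(d)]
    scale = math.sqrt(k / (k - 1))
    return [[scale * (u[j][i] - mean[i]) for i in range(d)] for j in range(k)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Usage: 4 classes in an 8-dimensional feature space.
W = simplex_etf(d=8, k=4)
norms = [math.sqrt(dot(w, w)) for w in W]
cosines = [dot(W[i], W[j]) for i in range(4) for j in range(i + 1, 4)]
```

Under this construction every vector has unit norm and every pair has cosine -1/(K-1), the largest equal pairwise separation K vectors can have. In a training setup like the one the abstract describes, such a matrix would be generated once and kept fixed while only the backbone is trained.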

updated: Wed Oct 12 2022 06:46:53 GMT+0000 (UTC)

published: Thu Mar 17 2022 04:34:28 GMT+0000 (UTC)

arXiv
