Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

Yibo Yang; Liang Xie; Shixiang Chen; Xiangtai Li; Zhouchen Lin; Dacheng Tao

Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier that outputs the logit of each class. A recent study identified a phenomenon called neural collapse: at the terminal phase of training on a balanced dataset, the within-class means of features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF). Since the ETF geometric structure maximally separates the pair-wise angles of all classes in the classifier, it is natural to ask: why spend effort learning a classifier when we know its optimal geometric structure? In this paper, we study the potential of learning a neural network for classification with the classifier randomly initialized as an ETF and fixed during training. Our analysis based on the layer-peeled model indicates that feature learning with a fixed ETF classifier naturally leads to the neural collapse state even when the dataset is imbalanced among classes. We further show that in this case the cross-entropy (CE) loss is not necessary and can be replaced by a simple squared loss that shares the same global optimality but enjoys a more accurate gradient and a better convergence property. Our experimental results show that our method achieves similar performance on image classification for balanced datasets and brings significant improvements on long-tailed and fine-grained classification tasks.
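To make the simplex ETF geometry concrete, the following is a minimal NumPy sketch of one standard way to build such a frame for K classes: take a random matrix U with orthonormal columns and form sqrt(K/(K-1)) · U · (I − (1/K)·11ᵀ). The function name `simplex_etf`, its parameters, and the choice of a QR factorization to obtain U are illustrative assumptions, not code from the paper. The sketch also assumes the feature dimension is at least K.

```python
import numpy as np


def simplex_etf(num_classes, feat_dim, seed=0):
    """Return a (feat_dim, num_classes) matrix whose columns form a simplex ETF.

    Hypothetical helper for illustration; assumes feat_dim >= num_classes.
    """
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if feat_dim < num_classes:
        raise ValueError("this construction assumes feat_dim >= num_classes")

    rng = np.random.default_rng(seed)
    # Random matrix with orthonormal columns (U^T U = I) via reduced QR.
    U, _ = np.linalg.qr(rng.standard_normal((feat_dim, num_classes)))

    K = num_classes
    # Centering matrix I - (1/K) 11^T removes the common mean direction.
    centering = np.eye(K) - np.ones((K, K)) / K
    return np.sqrt(K / (K - 1)) * (U @ centering)


# Example: a fixed, non-learnable classifier for 10 classes and 64-dim features.
W = simplex_etf(num_classes=10, feat_dim=64)
gram = W.T @ W  # pairwise inner products of the class vectors
```

Under this construction every class vector has unit norm and every pair of distinct class vectors has the same inner product, -1/(K-1), which is the maximally separated, equiangular configuration the abstract refers to. In a training setup such a matrix would be kept constant while only the backbone is updated.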

updated: Thu Mar 17 2022 04:34:28 GMT+0000 (UTC)

published: Thu Mar 17 2022 04:34:28 GMT+0000 (UTC)

arXiv
