Get the Best of Both Worlds: Improving Accuracy and Transferability by Grassmann Class Representation

Haoqi Wang; Zhizhong Li; Wayne Zhang

We generalize the class vectors found in neural networks to linear subspaces (i.e., points in the Grassmann manifold) and show that the Grassmann Class Representation (GCR) improves accuracy and feature transferability simultaneously. In GCR, each class is a subspace and the logit is defined as the norm of the projection of a feature onto the class subspace. We integrate Riemannian SGD into deep learning frameworks so that class subspaces in a Grassmannian are jointly optimized with the rest of the model parameters. Compared to the vector form, subspaces have greater representational capacity. We show that on ImageNet-1K, the top-1 errors of ResNet50-D, ResNeXt50, Swin-T and Deit3-S are reduced by 5.6%, 4.5%, 3.0% and 3.5%, respectively. Subspaces also give features freedom to vary, and we observe that the intra-class feature variability grows as the subspace dimension increases. Consequently, we find that GCR features are of higher quality for downstream tasks. For ResNet50-D, the average linear transfer accuracy across 6 datasets improves from 77.98% to 79.70% compared to the strong baseline of vanilla softmax. For Swin-T, it improves from 81.5% to 83.4%, and for Deit3, it improves from 73.8% to 81.4%. With these encouraging results, we believe that more applications could benefit from the Grassmann class representation. Code is released at https://github.com/innerlee/GCR.
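To make the two mechanisms in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the linked repository for that). It assumes each class subspace is stored as a matrix S with orthonormal columns, so the logit for feature x is ||S^T x||. The update step assumes a QR-based retraction after projecting the gradient onto the tangent space, which is one common way to stay on the Grassmannian and may differ from the update used in the paper. The names `random_subspaces`, `gcr_logits` and `riemannian_sgd_step` are hypothetical.

```python
import numpy as np


def random_subspaces(num_classes, feat_dim, sub_dim, rng):
    """Return orthonormal bases of shape (num_classes, feat_dim, sub_dim)."""
    bases = np.empty((num_classes, feat_dim, sub_dim))
    for c in range(num_classes):
        # QR of a random Gaussian matrix gives an orthonormal basis.
        q, _ = np.linalg.qr(rng.standard_normal((feat_dim, sub_dim)))
        bases[c] = q
    return bases


def gcr_logits(x, bases):
    """Logit of class c is the norm of the projection of x onto span(bases[c]).

    With orthonormal columns S, the projection is S S^T x and its norm
    equals ||S^T x||.
    """
    coords = np.einsum("cdk,d->ck", bases, x)  # S_c^T x for every class c
    return np.linalg.norm(coords, axis=1)


def riemannian_sgd_step(S, euclid_grad, lr):
    """One descent step for a single subspace basis S (feat_dim, sub_dim).

    Assumption: tangent-space projection followed by a QR retraction.
    """
    # Remove the gradient component that lies inside the current subspace.
    tangent = euclid_grad - S @ (S.T @ euclid_grad)
    # Move against the gradient, then re-orthonormalize the columns.
    q, _ = np.linalg.qr(S - lr * tangent)
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bases = random_subspaces(num_classes=5, feat_dim=16, sub_dim=3, rng=rng)
    x = rng.standard_normal(16)
    print("logits:", gcr_logits(x, bases))

    # Gradient of the loss -||S^T x|| with respect to S, for class 0.
    S = bases[0]
    grad = -np.outer(x, x @ S) / np.linalg.norm(S.T @ x)
    S_new = riemannian_sgd_step(S, grad, lr=0.01)
    print("class-0 logit after one step:", np.linalg.norm(S_new.T @ x))
```

With a subspace dimension of 1 the logit reduces to the absolute value of the inner product between the feature and a unit class vector, which is how the subspace form generalizes the usual class-vector form. In a real training loop the Euclidean gradient would come from the framework's autograd rather than the hand-written expression used here.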

updated: Thu Aug 03 2023 06:02:02 GMT+0000 (UTC)

published: Thu Aug 03 2023 06:02:02 GMT+0000 (UTC)

arXiv
