The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration

Bingyuan Liu; Ismail Ben Ayed; Adrian Galdran; Jose Dolz

Despite the dominant performance of deep neural networks, recent works have shown that they are poorly calibrated, resulting in over-confident predictions. Miscalibration can be exacerbated by overfitting due to the minimization of the cross-entropy during training, as it pushes the predicted softmax probabilities to match the one-hot label assignments. This yields a pre-softmax activation of the correct class that is significantly larger than the remaining activations. Recent evidence from the literature suggests that loss functions that embed implicit or explicit maximization of the entropy of predictions yield state-of-the-art calibration performance. We provide a unifying constrained-optimization perspective of current state-of-the-art calibration losses. Specifically, these losses can be viewed as approximations of a linear penalty (or a Lagrangian) imposing equality constraints on logit distances. This points to an important limitation of such underlying equality constraints: their gradients constantly push towards a non-informative solution, which might prevent the model from reaching the best compromise between discriminative performance and calibration during gradient-based optimization. Following our observations, we propose a simple and flexible generalization based on inequality constraints, which imposes a controllable margin on logit distances. Comprehensive experiments on a variety of image classification, semantic segmentation and NLP benchmarks demonstrate that our method sets new state-of-the-art results on these tasks in terms of network calibration, without affecting the discriminative performance. The code is available at https://github.com/by-liu/MbLS.
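
The abstract does not spell out the loss itself, so the following is only a minimal sketch of one way to read "inequality constraints that impose a controllable margin on logit distances". It assumes a hinge-style penalty that is zero whenever the distance between the largest logit and every other logit stays within the margin, added to the usual cross-entropy. The function names, the `margin` and `weight` parameters, and their default values are hypothetical placeholders chosen for illustration; the authors' actual implementation is in the repository linked above.

```python
import math


def margin_penalty(logits, margin):
    """Hinge penalty on logit distances (an assumed form, not taken from the text).

    Each distance d_k = max_j(logits) - logits[k] is penalised only by the
    amount it exceeds `margin`, so distances inside the margin get no gradient.
    """
    top = max(logits)
    return sum(max(0.0, top - l - margin) for l in logits)


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[target]


def margin_smoothed_loss(logits, target, margin=10.0, weight=0.1):
    """Cross-entropy plus the weighted margin penalty.

    `margin` and `weight` defaults are illustrative placeholders.
    """
    return cross_entropy(logits, target) + weight * margin_penalty(logits, margin)


if __name__ == "__main__":
    confident = [12.0, 1.0, 0.0]   # logit distances 0, 11, 12
    moderate = [5.0, 1.0, 0.0]     # logit distances 0, 4, 5

    # Only the parts of the distances beyond the margin are penalised.
    print(margin_penalty(confident, margin=10.0))
    # All distances are inside the margin, so the penalty vanishes.
    print(margin_penalty(moderate, margin=10.0))
    # With margin=0 every nonzero distance is penalised, which mimics the
    # equality-constraint behaviour that the abstract criticises.
    print(margin_penalty(moderate, margin=0.0))
```

Under this reading, setting the margin to zero recovers a penalty that always pushes the logits towards equality (the non-informative solution), while a positive margin leaves moderately separated logits untouched and only acts on over-confident ones.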

updated: Tue Nov 30 2021 14:21:47 GMT+0000 (UTC)

published: Tue Nov 30 2021 14:21:47 GMT+0000 (UTC)

arXiv
