Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks

Erdong Guo; David Draper; Maria De Iorio

Model calibration, which concerns how well a model's predicted confidence matches how frequently it predicts correctly, not only plays a vital part in statistical model design but also has substantial practical applications, such as optimal decision-making in the real world. However, modern deep neural networks (DNNs) have been found to be generally poorly calibrated because they overestimate (or underestimate) their predictive confidence, a problem closely related to overfitting. In this paper, we propose Annealing Double-Head, a simple-to-implement but highly effective architecture for calibrating a DNN during training. To be precise, we construct an additional calibration head (a shallow neural network that typically has one latent layer) on top of the last latent layer of the normal model to map the logits to the aligned confidence. Furthermore, we develop a simple annealing technique that dynamically scales the logits via the calibration head during training to improve its performance. We exhaustively evaluate the Annealing Double-Head architecture on multiple pairs of contemporary DNN architectures and vision and speech datasets, under both in-distribution and distribution-shift conditions. We demonstrate that our method achieves state-of-the-art calibration performance without post-processing while providing predictive accuracy comparable to other recently proposed calibration methods on a range of learning tasks.
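
The abstract does not spell out how the two heads are wired, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the main head maps the last latent features to logits, that the calibration head is a one-hidden-layer network that maps those logits to a positive scalar, and that annealing is a linear ramp controlling how strongly this scalar rescales the logits. The names `double_head_forward`, `anneal_coeff`, and the `params` layout are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    t = sum(e)
    return [v / t for v in e]

def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the scale positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def linear(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def init(n_out, n_in, rng):
    w = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def anneal_coeff(step, total_steps):
    # ASSUMPTION: linear ramp from 0 to 1; the abstract gives no schedule.
    return min(1.0, step / total_steps)

def double_head_forward(h, params, step, total_steps):
    # Main head: last latent features -> class logits.
    logits = linear(h, *params["main"])
    # Calibration head (ASSUMED form): logits -> one hidden layer -> scalar.
    hidden = [max(0.0, v) for v in linear(logits, *params["cal_hidden"])]
    scale = softplus(linear(hidden, *params["cal_out"])[0])
    # Annealing (ASSUMED form): blend from no rescaling to the learned scale.
    a = anneal_coeff(step, total_steps)
    eff = (1.0 - a) + a * scale
    return softmax(logits), softmax([eff * v for v in logits])

rng = random.Random(0)
params = {
    "main": init(3, 4, rng),        # 4 latent features -> 3 classes
    "cal_hidden": init(8, 3, rng),  # 3 logits -> 8 hidden units
    "cal_out": init(1, 8, rng),     # 8 hidden units -> 1 scale
}
h = [0.2, -1.0, 0.5, 1.5]  # stand-in for the last latent layer's output
raw, cal = double_head_forward(h, params, step=50, total_steps=100)
print("uncalibrated:", raw)
print("rescaled:    ", cal)
```

Because the assumed scale is strictly positive, rescaling changes the confidence assigned to each class but not which class is predicted, which is consistent with the abstract's claim of comparable predictive accuracy. With untrained random weights the rescaled probabilities are of course not calibrated; the sketch only shows the data flow.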

updated: Mon Jan 16 2023 04:36:27 GMT+0000 (UTC)

published: Tue Dec 27 2022 21:21:58 GMT+0000 (UTC)

arXiv
