ESD: Expected Squared Difference as a Tuning-Free Trainable Calibration Measure

Hee Suk Yoon; Joshua Tian Jin Tee; Eunseop Yoon; Sunjae Yoon; Gwangsu Kim; Yingzhen Li; Chang D. Yoo

Studies have shown that modern neural networks tend to be poorly calibrated due to over-confident predictions. Traditionally, post-processing methods have been used to calibrate a model after training. In recent years, various trainable calibration measures have been proposed to incorporate calibration directly into the training process. However, these methods all include internal hyperparameters, and the performance of these calibration objectives depends on tuning those hyperparameters, which incurs greater computational cost as neural networks and datasets grow larger. We therefore present Expected Squared Difference (ESD), a tuning-free (i.e., hyperparameter-free) trainable calibration objective that views the calibration error from the perspective of the squared difference between two expectations. With extensive experiments on several architectures (CNNs, Transformers) and datasets, we demonstrate that (1) incorporating ESD into training improves model calibration across various batch-size settings without the need for internal hyperparameter tuning, (2) ESD yields the best-calibrated results compared with previous approaches, and (3) ESD drastically reduces the computational cost required for calibration during training due to the absence of internal hyperparameters. The code is publicly available at https://github.com/hee-suk-yoon/ESD.
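
To make "the squared difference between two expectations" concrete, here is a minimal sketch under an assumed form of the measure. It is not the authors' released implementation (see the repository above for that). The sketch assumes that, for a threshold c', the two expectations are E[1(Y = Ŷ) · 1(C ≤ c')] (accuracy mass below the threshold) and E[C · 1(C ≤ c')] (confidence mass below the threshold). It also assumes the squared gap is averaged over thresholds drawn from the batch's own confidences. The function name `esd_plugin` is hypothetical, and the estimator is a naive plug-in average rather than whatever unbiased estimator the paper may use.

```python
from typing import Sequence


def esd_plugin(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Naive plug-in estimate of an ESD-style calibration error (hypothetical sketch).

    Assumed form: average over thresholds c' (taken from the batch itself) of
        (E[1(correct) * 1(C <= c')] - E[C * 1(C <= c')]) ** 2.
    Note that no bin count or other tuning knob appears anywhere.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need non-empty inputs of equal length")

    total = 0.0
    for c_prime in confidences:
        # Signed gap between accuracy mass and confidence mass at or below c'.
        gap = sum(float(y) - c for c, y in zip(confidences, correct) if c <= c_prime) / n
        total += gap * gap  # squared difference of the two expectations
    return total / n


# A perfectly calibrated batch gives zero; an over-confident one does not.
print(esd_plugin([1.0, 1.0], [1, 1]))
print(esd_plugin([0.9, 0.9], [0, 0]))
```

The threshold sweep takes the place of the bins that a binned calibration error would need, which is one way to see why such a measure can be hyperparameter-free. This sketch uses hard indicators in plain Python, so it only evaluates the measure. A trainable loss would need a tensor implementation inside an autodiff framework.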

updated: Sat Mar 04 2023 18:06:36 GMT+0000 (UTC)

published: Sat Mar 04 2023 18:06:36 GMT+0000 (UTC)

arXiv
