Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

Haotian Ju; Dongyue Li; Hongyang R. Zhang

We consider transfer learning approaches that fine-tune a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which commonly occurs in practice. Previous work has shown that constraining the distance from the initialization of fine-tuning improves generalization. Using a PAC-Bayesian analysis, we observe that besides the distance from initialization, Hessians affect generalization through the noise stability of deep neural networks against noise injections. Motivated by this observation, we develop Hessian distance-based generalization bounds for a wide range of fine-tuning methods. Additionally, we study the robustness of fine-tuning in the presence of noisy labels. Building on our theory, we design an algorithm that incorporates consistent losses and distance-based regularization for fine-tuning, along with a generalization error guarantee under class-conditional independent noise in the training set labels. We perform a detailed empirical study of our algorithm in various noisy environments and across architectures. On six image classification tasks whose training labels are generated with programmatic labeling, we find a 3.26% accuracy gain over prior fine-tuning methods. Meanwhile, the Hessian distance measure of the fine-tuned model decreases six times more than with existing approaches.
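The abstract combines two ingredients in the training objective (a loss that stays consistent under class-conditional label noise, and regularization of the distance from the pretrained initialization) with a Hessian-based distance measure. The pure-Python sketch below illustrates one plausible instantiation under loudly stated assumptions: a toy linear softmax model stands in for the network; the consistent loss is taken to be forward loss correction with a known noise transition matrix `T`; the distance regularizer is an L2 penalty on `W - W0` (the paper may use a constraint or a different norm); and the "Hessian distance" is simplified to the quadratic form v^T H v with v = W - W0, estimated by a central finite difference. None of this is the authors' algorithm or their exact measure, and every function name is hypothetical.

```python
import math

# Hypothetical sketch: NOT the authors' implementation or exact definitions.


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, x):
    """Toy linear model (one row of W per class), standing in for a network."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def forward_corrected_nll(probs, noisy_label, T):
    """Negative log-likelihood of the observed (noisy) label.

    Assumption: T[i][j] = P(noisy = j | clean = i) is known and class-conditional.
    The model's clean-class probabilities are pushed through T before the log
    (forward loss correction, one example of a noise-consistent loss).
    """
    p_noisy = sum(probs[i] * T[i][noisy_label] for i in range(len(probs)))
    return -math.log(p_noisy)


def data_loss(W, data, T):
    """Average corrected loss over (x, noisy_label) pairs."""
    return sum(forward_corrected_nll(softmax(logits(W, x)), y, T)
               for x, y in data) / len(data)


def sq_distance(W, W0):
    """Squared Frobenius distance from the (assumed) pretrained weights W0."""
    return sum((a - b) ** 2 for r, r0 in zip(W, W0) for a, b in zip(r, r0))


def objective(W, W0, data, T, lam):
    """Corrected loss plus an assumed L2 penalty on the distance from W0."""
    return data_loss(W, data, T) + lam * sq_distance(W, W0)


def hessian_quadratic_form(W, W0, data, T, h=1e-3):
    """Estimate v^T H v, with v = W - W0 and H the Hessian of data_loss at W.

    Central second difference: (f(W + h v) - 2 f(W) + f(W - h v)) / h^2.
    This is a simplified stand-in for the paper's Hessian distance measure.
    """
    def shifted(s):
        # Returns W + s * v, elementwise, where v = W - W0.
        return [[a + s * (a - b) for a, b in zip(r, r0)]
                for r, r0 in zip(W, W0)]

    f0 = data_loss(W, data, T)
    return (data_loss(shifted(h), data, T) - 2 * f0
            + data_loss(shifted(-h), data, T)) / h ** 2


if __name__ == "__main__":
    W0 = [[0.0, 0.0], [0.0, 0.0]]    # hypothetical "pretrained" weights
    W = [[0.5, -0.2], [-0.3, 0.4]]   # hypothetical fine-tuned weights
    data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 0)]
    T = [[0.8, 0.2], [0.3, 0.7]]     # assumed noise transition matrix
    print("objective:", objective(W, W0, data, T, lam=0.1))
    print("v^T H v:", hessian_quadratic_form(W, W0, data, T))
```

With `T` set to the identity, the corrected loss reduces to ordinary cross-entropy, and v^T H v measures how sharply the loss curves along the direction the weights moved during fine-tuning; a larger `lam` pulls `W` back toward `W0` and shrinks v.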

updated: Mon Aug 29 2022 00:20:04 GMT+0000 (UTC)

published: Mon Jun 06 2022 14:52:46 GMT+0000 (UTC)

arXiv