Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

Devansh Arpit; Huan Wang; Yingbo Zhou; Caiming Xiong

In Domain Generalization (DG) settings, models trained independently on a given set of training domains perform notoriously chaotically on distribution-shifted test domains, and stochasticity in optimization (e.g. the random seed) plays a large role. This makes deep learning models unreliable in real-world settings. We first show that this chaotic behavior exists even along the training optimization trajectory of a single model. We then propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity. It does so by improving the rank correlation between in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable early stopping. Building on this observation, we show that instead of ensembling unaveraged models (as is typical in practice), ensembling moving average models from independent runs, an ensemble of averages (EoA), further boosts performance. We theoretically explain the performance gains from ensembling and model averaging by adapting the well-known bias-variance trade-off to the domain generalization setting. On the DomainBed benchmark, this ensemble of averages achieves an average accuracy of 68.0% with a pre-trained ResNet-50, beating vanilla ERM (without averaging or ensembling) by about 4%, and 76.6% with a pre-trained RegNetY-16GF, beating vanilla ERM by 6%. Our code is available at https://github.com/salesforce/ensemble-of-averages.
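The two ideas in the abstract, averaging the weights of a single run along its trajectory and then ensembling the averaged models from independent runs, can be illustrated with a small NumPy sketch. This is not the authors' implementation (that lives in the repository linked above). The toy linear softmax classifier, the function names, the uniform running mean over checkpoints, and the choice to ensemble by averaging class probabilities are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def update_running_average(avg_params, new_params, num_averaged):
    """Uniform running mean of parameters: avg <- (avg * n + new) / (n + 1).

    `num_averaged` is the number of checkpoints already folded into `avg_params`.
    (Hypothetical helper; a uniform mean is an assumption of this sketch.)
    """
    return {
        name: (avg_params[name] * num_averaged + new_params[name]) / (num_averaged + 1)
        for name in avg_params
    }


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_proba(params, x):
    """Class probabilities of a toy linear classifier with weights W and bias b."""
    return softmax(x @ params["W"] + params["b"])


def ensemble_predict(list_of_params, x):
    """Average the class probabilities of several models, then take the argmax.

    Averaging probabilities (rather than logits) is an assumption of this sketch.
    """
    mean_proba = np.mean([predict_proba(p, x) for p in list_of_params], axis=0)
    return mean_proba.argmax(axis=-1)


# Toy demo: three "independent runs", each producing noisy checkpoints.
rng = np.random.default_rng(0)
num_features, num_classes = 4, 3
averaged_models = []
for run in range(3):
    avg = None
    for step in range(10):
        # Stand-in for a real training step: a random checkpoint.
        checkpoint = {
            "W": rng.normal(size=(num_features, num_classes)),
            "b": rng.normal(size=num_classes),
        }
        avg = checkpoint if avg is None else update_running_average(avg, checkpoint, step)
    averaged_models.append(avg)

x_test = rng.normal(size=(5, num_features))
predictions = ensemble_predict(averaged_models, x_test)  # one class index per test row
```

In a real training loop the random checkpoints would be replaced by the model's parameters after each optimizer step, and the averaged model (not the latest iterate) would be the one evaluated for early stopping and passed to the ensemble.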

updated: Tue May 24 2022 16:02:46 GMT+0000 (UTC)

published: Thu Oct 21 2021 00:08:17 GMT+0000 (UTC)

arXiv

