Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

Devansh Arpit; Huan Wang; Yingbo Zhou; Caiming Xiong

In Domain Generalization (DG) settings, models trained on a given set of training domains perform notoriously chaotically on distribution-shifted test domains, and stochasticity in optimization (e.g. the random seed) plays a big role. This makes deep learning models unreliable in real-world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early during training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also behaves chaotically in the DG setting. Taking advantage of our observation, we show that instead of ensembling unaveraged models, ensembling moving average models from different runs, an ensemble of averages (EoA), increases stability and further boosts performance. On the DomainBed benchmark, when using a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves 88.6% on PACS, 79.1% on VLCS, 72.5% on OfficeHome, 52.3% on TerraIncognita, and 47.4% on DomainNet, an average of 68.0%, beating ERM (without model averaging) by about 4%. We also evaluate a model that is pre-trained on a larger dataset, where we show that EoA achieves an average accuracy of 72.7%, beating its corresponding ERM baseline by 5%.
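The two ideas in the abstract, averaging parameters along the optimization path and then ensembling the averaged models from different runs, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the exact averaging rule or how ensemble outputs are combined, so the uniform running average, the averaging of softmax probabilities, and all names here (`ParameterAverager`, `start_step`, `ensemble_predict`, `predict_logits`) are assumptions made for illustration.

```python
import math


class ParameterAverager:
    """Uniform running average of a flat parameter vector.

    Assumption: a simple (equal-weight) average over all iterates seen
    from `start_step` onward; the abstract only says "moving average".
    """

    def __init__(self, start_step=0):
        self.start_step = start_step  # hypothetical name: first step to include
        self.count = 0
        self.average = None

    def update(self, params, step):
        if step < self.start_step:
            return  # averaging has not started yet
        if self.average is None:
            self.average = list(params)
            self.count = 1
            return
        self.count += 1
        # Incremental mean: avg <- avg + (p - avg) / count
        self.average = [a + (p - a) / self.count
                        for a, p in zip(self.average, params)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(models, x, predict_logits):
    """Average class probabilities over several (averaged) models.

    Assumption: probabilities, not logits, are averaged.
    `predict_logits(params, x)` is a caller-supplied forward function.
    Returns (predicted class index, mean probabilities).
    """
    probs = [softmax(predict_logits(params, x)) for params in models]
    n_classes = len(probs[0])
    mean = [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean
```

In use, each independent training run would call `update` once per optimizer step with its current parameters, and the resulting `average` vectors from the different runs would be passed together as `models` to `ensemble_predict`.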

updated: Thu Oct 21 2021 00:08:17 GMT+0000 (UTC)

published: Thu Oct 21 2021 00:08:17 GMT+0000 (UTC)

arXiv
