Causal Balancing for Domain Generalization

Xinyi Wang; Michael Saxon; Jiachen Li; Hongyang Zhang; Kun Zhang; William Yang Wang

While machine learning models rapidly advance the state of the art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms for the data generation process. We argue that the Bayes optimal classifiers trained on such a balanced distribution are minimax optimal across a sufficiently diverse environment space. We also provide an identifiability guarantee for the latent variable model of the proposed data generation process when enough training environments are used. Experiments are conducted on DomainBed, demonstrating empirically that our method obtains the best performance across 20 baselines reported on the benchmark.
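To make the idea of balanced mini-batch sampling concrete, here is a minimal sketch in Python. It is not the authors' procedure, whose details are not given in this listing. It assumes a simplified setting in which the spurious factor is an observed discrete attribute (the hypothetical `attrs` list), and it balances by sampling uniformly over (label, attribute) groups. The function names `balanced_minibatch` and `agreement_rate` are illustrative only.

```python
import random
from collections import defaultdict


def balanced_minibatch(labels, attrs, batch_size, rng):
    """Draw indices so every observed (label, attribute) group is equally likely.

    Assumption: `attrs` is an observed discrete stand-in for the spurious
    factor. This is a simplification for illustration only.
    """
    groups = defaultdict(list)
    for i, (y, z) in enumerate(zip(labels, attrs)):
        groups[(y, z)].append(i)
    keys = sorted(groups)  # fixed order keeps sampling reproducible
    batch = []
    for _ in range(batch_size):
        key = rng.choice(keys)                 # uniform over groups
        batch.append(rng.choice(groups[key]))  # uniform within the group
    return batch


def agreement_rate(labels, attrs, indices):
    """Fraction of the selected examples whose label equals the attribute."""
    return sum(labels[i] == attrs[i] for i in indices) / len(indices)


if __name__ == "__main__":
    # Toy biased data: the attribute matches the label for 90% of examples.
    labels = [i % 2 for i in range(1000)]
    attrs = [1 - y if (i // 2) % 10 == 0 else y for i, y in enumerate(labels)]
    batch = balanced_minibatch(labels, attrs, 4000, random.Random(0))
    print("agreement in raw data:", agreement_rate(labels, attrs, range(1000)))
    print("agreement in balanced batch:", agreement_rate(labels, attrs, batch))
```

In the toy data the attribute is strongly predictive of the label, while in the balanced batch the two are sampled roughly independently, so a classifier trained on such batches has less incentive to rely on the attribute. In practice the spurious factor is usually not observed directly, which is the harder case the abstract's latent variable model addresses.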

updated: Sun Feb 19 2023 23:47:57 GMT+0000 (UTC)

published: Fri Jun 10 2022 17:59:11 GMT+0000 (UTC)

arXiv
