ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Damien Teney; Yong Lin; Seong Joon Oh; Ehsan Abbasnejad

Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They frequently report a positive correlation, and some, surprisingly, never observe an inverse correlation, which would indicate a necessary trade-off. Whether inverse patterns can occur matters for deciding whether ID performance can serve as a proxy for OOD generalization capabilities. This paper shows with multiple datasets that inverse correlations between ID and OOD performance do happen in real-world data, not only in theoretical worst-case settings. We also explain theoretically how these cases can arise even in a minimal linear setting, and why past studies could miss such cases due to a biased selection of models. Our observations lead to recommendations that contradict those found in much of the current literature.

- High OOD performance sometimes requires trading off ID performance.
- Focusing on ID performance alone may not lead to optimal OOD performance. It may produce diminishing (eventually negative) returns in OOD performance.
- In these cases, studies on OOD generalization that use ID performance for model selection (a common recommended practice) will necessarily miss the best-performing models, making these studies blind to a whole range of phenomena.
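The model-selection point in the abstract can be illustrated with a small sketch. The accuracy values below are invented purely for illustration and are not results from the paper; the model names and the helper functions `pearson` and `select_by` are likewise hypothetical. Assuming a pool of candidate models whose ID and OOD accuracies happen to be inversely correlated, choosing the model with the best ID accuracy picks a different model than the one with the best OOD accuracy.

```python
import math

# Hypothetical (ID accuracy, OOD accuracy) pairs for four candidate models.
# These numbers are made up for illustration; they are NOT from the paper.
models = {
    "model_a": (0.80, 0.70),
    "model_b": (0.85, 0.66),
    "model_c": (0.90, 0.61),
    "model_d": (0.93, 0.55),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_by(results, index):
    """Return the name of the model with the highest score at `index`
    (0 = ID accuracy, 1 = OOD accuracy)."""
    return max(results, key=lambda name: results[name][index])


id_acc = [scores[0] for scores in models.values()]
ood_acc = [scores[1] for scores in models.values()]

r = pearson(id_acc, ood_acc)          # negative for this made-up pool
picked_by_id = select_by(models, 0)   # what ID-based model selection returns
best_ood = select_by(models, 1)       # the model one would actually want OOD
ood_gap = models[best_ood][1] - models[picked_by_id][1]

print(f"ID/OOD correlation: {r:.3f}")
print(f"selected by ID: {picked_by_id}, best OOD: {best_ood}, OOD gap: {ood_gap:.2f}")
```

In a pool like this, an evaluation protocol that only ever inspects the ID-selected model would never see the model with the highest OOD accuracy, which is the blind spot the abstract describes.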

updated: Fri May 19 2023 07:24:53 GMT+0000 (UTC)

published: Thu Sep 01 2022 17:27:25 GMT+0000 (UTC)

arXiv
