Towards Realistic Out-of-Distribution Detection: A Novel Evaluation Framework for Improving Generalization in OOD Detection

Vahid Reza Khazaie; Anthony Wong; Mohammad Sabokrou

This paper presents a novel evaluation framework for Out-of-Distribution (OOD) detection that aims to assess the performance of machine learning models in more realistic settings. We observe that current testing protocols do not satisfy the real-world requirements for evaluating OOD detection methods. Instead, they usually favor methods with a strong bias towards a low level of diversity in normal data. To address this limitation, we propose new OOD test datasets (CIFAR-10-R, CIFAR-100-R, and ImageNet-30-R) that allow researchers to benchmark OOD detection performance under realistic distribution shifts. Additionally, we introduce a Generalizability Score (GS) to measure the generalization ability of a model during OOD detection. Our experiments demonstrate that improving performance on existing benchmark datasets does not necessarily improve the usability of OOD detection models in real-world scenarios. While leveraging deep pre-trained features has been identified as a promising avenue for OOD detection research, our experiments show that state-of-the-art pre-trained models suffer a significant drop in performance when tested on our proposed datasets. To address this issue, we propose a post-processing stage that adapts pre-trained features under these distribution shifts before the OOD scores are calculated, which significantly enhances the performance of state-of-the-art pre-trained models on our benchmarks.

updated: Thu Aug 31 2023 12:09:45 GMT+0000 (UTC)

published: Sun Nov 20 2022 07:30:15 GMT+0000 (UTC)

arXiv
