From Global to Local: Multi-scale Out-of-distribution Detection

Ji Zhang; Lianli Gao; Bingguang Hao; Hao Huang; Jingkuan Song; Hengtao Shen

Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels were not seen during in-distribution (ID) training. Recent progress in representation learning has given rise to distance-based OOD detection, which classifies an input as ID or OOD according to its relative distance to the training data of the ID classes. Previous approaches compute pairwise distances from global image representations only. This can be sub-optimal, because unavoidable background clutter and intra-class variation may drive image-level representations of the same ID class far apart in a given representation space. In this work, we address this challenge by proposing Multi-scale OOD DEtection (MODE), the first framework that leverages both global visual information and local region details of images to maximally benefit OOD detection. Specifically, we first find that existing models pretrained with off-the-shelf cross-entropy or contrastive losses fail to capture local representations that are useful for MODE, owing to the scale discrepancy between the ID training and OOD detection processes. To mitigate this issue and encourage locally discriminative representations in ID training, we propose Attention-based Local PropAgation (ALPA), a trainable objective that exploits a cross-attention mechanism to align and highlight the local regions of the target objects for pairwise examples. During test-time OOD detection, a Cross-Scale Decision (CSD) function is further devised on the most discriminative multi-scale representations to distinguish ID from OOD data more faithfully. We demonstrate the effectiveness and flexibility of MODE on several benchmarks: on average, MODE outperforms the previous state-of-the-art by up to 19.24% in FPR and 2.77% in AUROC. Code is available at https://github.com/JimZAI/MODE-OOD.
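To make the distance-based setting concrete, the following is a minimal sketch and not the authors' implementation. The first function scores an input by its cosine distance to the k-th nearest ID training feature, a common form of distance-based OOD detection. The second is a purely hypothetical illustration of combining global and local scales: it takes the smaller of the global distance and the best-matching local-region distance. The abstract does not define the actual CSD function, so the names `knn_ood_score` and `cross_scale_score`, the choice of `k`, and the min-based combination rule are all assumptions.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def knn_ood_score(test_feats, train_feats, k=5):
    """Cosine distance to the k-th nearest ID training feature.

    test_feats:  (n_test, d) array
    train_feats: (n_train, d) array, with n_train >= k
    Returns an (n_test,) array; a higher score means "more likely OOD".
    """
    test = l2_normalize(test_feats)
    train = l2_normalize(train_feats)
    sims = test @ train.T                      # (n_test, n_train) cosine similarities
    kth_sim = np.sort(sims, axis=1)[:, -k]     # k-th largest similarity per test row
    return 1.0 - kth_sim


def cross_scale_score(test_global, test_local, train_global, train_local, k=5):
    """Hypothetical cross-scale rule (an assumption, not the paper's CSD).

    test_local:  (n_test, r, d) region features per test image
    train_local: (n_train, r_train, d) region features per training image
    An input counts as ID-like if either its global feature or at least
    one of its local regions lies close to the ID training data.
    """
    global_score = knn_ood_score(test_global, train_global, k)
    n, r, d = test_local.shape
    local_score = knn_ood_score(
        test_local.reshape(n * r, d), train_local.reshape(-1, d), k
    ).reshape(n, r)
    return np.minimum(global_score, local_score.min(axis=1))
```

In this sketch, an image whose global feature is dominated by background can still receive a low (ID-like) score when one of its regions closely matches ID training regions, which is the intuition behind using local details alongside global information. A threshold on the returned score, chosen on held-out ID data, would then separate ID from OOD inputs.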

updated: Sun Aug 20 2023 11:56:25 GMT+0000 (UTC)

published: Sun Aug 20 2023 11:56:25 GMT+0000 (UTC)

arXiv
