A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Zhiheng Li; Ivan Evtimov; Albert Gordo; Caner Hazirbas; Tal Hassner; Cristian Canton Ferrer; Chenliang Xu; Mark Ibrahim

Machine learning models have been found to learn shortcuts -- unintended decision rules that are unable to generalize -- undermining models' reliability. Previous work addresses this problem under the tenuous assumption that only a single shortcut exists in the training data. Real-world images, however, are rife with multiple visual cues, from background to texture. Key to advancing the reliability of vision systems is understanding whether existing methods can overcome multiple shortcuts or struggle in a Whac-A-Mole game, i.e., where mitigating one shortcut amplifies reliance on others. To address this shortcoming, we propose two benchmarks: 1) UrbanCars, a dataset with precisely controlled spurious cues, and 2) ImageNet-W, an evaluation set based on ImageNet for the watermark shortcut, which we discovered affects nearly every modern vision model. Along with texture and background, ImageNet-W allows us to study multiple shortcuts emerging from training on natural images. We find that computer vision models, including large foundation models -- regardless of training set, architecture, and supervision -- struggle when multiple shortcuts are present. Even methods explicitly designed to combat shortcuts struggle in a Whac-A-Mole dilemma. To tackle this challenge, we propose Last Layer Ensemble, a simple-yet-effective method to mitigate multiple shortcuts without Whac-A-Mole behavior. Our results surface multi-shortcut mitigation as an overlooked challenge critical to advancing the reliability of vision systems. The datasets and code are released: https://github.com/facebookresearch/Whac-A-Mole.
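The Whac-A-Mole condition described in the abstract (mitigating one shortcut amplifies reliance on others) can be illustrated with a small check. The sketch below is not the paper's evaluation code. It assumes that reliance on each shortcut is summarized as an accuracy drop in percentage points when that cue is shifted, and that a method shows Whac-A-Mole behavior for a shortcut when this drop is larger after mitigation than for the baseline. The function name `amplified_shortcuts`, the `tol` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def amplified_shortcuts(
    baseline_drop: Dict[str, float],
    mitigated_drop: Dict[str, float],
    tol: float = 0.0,
) -> List[str]:
    """Return the shortcuts whose accuracy drop grew after mitigation.

    Both arguments map a shortcut name to an accuracy drop in percentage
    points (assumed definition: accuracy without the cue shift minus
    accuracy with it). A larger drop means stronger reliance on that cue.
    """
    return sorted(
        name
        for name, drop in mitigated_drop.items()
        if drop > baseline_drop[name] + tol
    )


# Hypothetical numbers, not results from the paper.
baseline = {"background": 10.0, "texture": 5.0, "watermark": 6.0}
mitigated = {"background": 2.0, "texture": 9.0, "watermark": 6.0}

amplified = amplified_shortcuts(baseline, mitigated)
print(amplified)
```

In this made-up case the method reduces reliance on background but increases reliance on texture, which is the pattern the abstract calls Whac-A-Mole behavior. A method without that behavior would return an empty list.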

updated: Tue Mar 21 2023 17:13:58 GMT+0000 (UTC)

published: Fri Dec 09 2022 18:59:57 GMT+0000 (UTC)

arXiv
