ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations

Badr Youbi Idrissi; Diane Bouchacourt; Randall Balestriero; Ivan Evtimov; Caner Hazirbas; Nicolas Ballas; Pascal Vincent; Michal Drozdzal; David Lopez-Paz; Mark Ibrahim

Deep learning vision systems are widely deployed across applications where reliability is critical. However, even today's best models can fail to recognize an object when its pose, lighting, or background varies. While existing benchmarks surface examples that are challenging for models, they do not explain why such mistakes arise. To address this need, we introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting, covering the entire ImageNet-1k validation set as well as a random subset of 12k training images. Equipped with ImageNet-X, we investigate 2,200 current recognition models and study the types of mistakes as a function of a model's (1) architecture, e.g., transformer vs. convolutional, (2) learning paradigm, e.g., supervised vs. self-supervised, and (3) training procedures, e.g., data augmentation. Regardless of these choices, we find that models have consistent failure modes across ImageNet-X categories. We also find that while data augmentation can improve robustness to certain factors, it induces spill-over effects on other factors. For example, strong random cropping hurts robustness on smaller objects. Together, these insights suggest that, to advance the robustness of modern vision models, future research should focus on collecting additional data and understanding data augmentation schemes. Along with these insights, we release a toolkit based on ImageNet-X to spur further study into the mistakes image recognition systems make.

updated: Thu Nov 03 2022 14:56:32 GMT+0000 (UTC)

published: Thu Nov 03 2022 14:56:32 GMT+0000 (UTC)

arXiv
