AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

Zhaopeng Gu; Bingke Zhu; Guibo Zhu; Yingying Chen; Ming Tang; Jinqiao Wang

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the ability to understand images and have achieved remarkable performance on various visual tasks. Although extensive training datasets make them strong at recognizing common objects, they lack domain-specific knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and require thresholds to be set manually to distinguish normal from abnormal samples, which restricts their practical deployment. In this paper, we explore the use of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing a corresponding textual description for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. AnomalyGPT eliminates the need for manual threshold adjustment and directly assesses the presence and location of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance on the MVTec-AD dataset, with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3%. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.
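The abstract reports both accuracy and AUC, and it notes that score-based IAD methods depend on a manually chosen threshold. The minimal sketch below illustrates that distinction: accuracy computed from raw anomaly scores changes with the threshold, whereas AUC is threshold-free. The scores, labels, and function names are hypothetical illustrations and are not taken from the paper or the AnomalyGPT repository.

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of samples classified correctly when score >= threshold means 'anomalous'."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(scores, labels):
    """Threshold-free AUC: probability that a random anomalous sample
    scores higher than a random normal one (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores (higher = more anomalous) and ground truth (1 = anomalous).
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 1, 0, 1, 1]

# Accuracy depends on which threshold an operator happens to pick.
for t in (0.15, 0.5, 0.9):
    print(f"threshold={t}: accuracy={accuracy_at_threshold(scores, labels, t):.3f}")

# AUC summarizes ranking quality without choosing any threshold.
print(f"AUC={roc_auc(scores, labels):.3f}")
```

With these toy values the accuracy moves between 0.500 and 0.833 as the threshold changes while the AUC stays at 0.889, which is why a method that outputs a direct normal/anomalous decision can be evaluated by accuracy without any threshold tuning.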

updated: Mon Sep 04 2023 11:44:48 GMT+0000 (UTC)

published: Tue Aug 29 2023 15:02:53 GMT+0000 (UTC)

arXiv
