Unsupervised Industrial Anomaly Detection via Pattern Generative and Contrastive Networks

Jianfeng Huang; Chenyang Li; Yimin Lin; Shiguo Lian

It is difficult to collect enough defect images to train deep learning networks in industrial production. Existing industrial anomaly detection methods therefore favor CNN-based unsupervised detection and localization networks. However, these methods fail when new signals vary from what was seen in training, because traditional end-to-end networks struggle to fit a nonlinear model in high-dimensional space. Moreover, they essentially build a memory library by clustering the features of normal images, which makes them not robust to texture changes. To this end, we propose a Vision Transformer-based (ViT-based) unsupervised anomaly detection network. It uses hierarchical task learning and human experience to enhance its interpretability. Our network consists of a pattern generation network and a comparison network. The pattern generation network uses two ViT-based encoder modules to extract the features of two consecutive image patches, then uses a ViT-based decoder module to learn the human-designed style of these features and predict the third image patch. We then use a Siamese-based network to compute the similarity between the generated image patch and the original image patch. Finally, we refine the anomaly localization with a bi-directional inference strategy. Comparison experiments on the public MVTec dataset show that our method achieves 99.8% AUC, surpassing previous state-of-the-art methods. In addition, we give qualitative illustrations on our own leather and cloth datasets. The accurate segmentation results strongly support the accuracy of our method in anomaly detection.
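The predict-then-compare pipeline described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the ViT-based generation network is replaced by a hypothetical `predict_patch` that simply averages the two context patches, the Siamese comparison is replaced by plain cosine similarity, and the "bi-directional inference strategy" is interpreted here, as an assumption, as scoring each patch from both scan directions and keeping the smaller anomaly score. All function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (stand-in for the Siamese network)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_patch(first, second):
    """Hypothetical stand-in for the generation network: average the two context patches."""
    return [(x + y) / 2.0 for x, y in zip(first, second)]


def directional_scores(patches, step):
    """Anomaly score (1 - similarity) of each patch against its prediction.

    step=+1 predicts patch i from patches (i-2, i-1);
    step=-1 predicts patch i from patches (i+2, i+1).
    Positions without two context patches get None.
    """
    n = len(patches)
    scores = [None] * n
    for i in range(n):
        far, near = i - 2 * step, i - step
        if 0 <= far < n and 0 <= near < n:
            predicted = predict_patch(patches[far], patches[near])
            scores[i] = 1.0 - cosine_similarity(predicted, patches[i])
    return scores


def bidirectional_scores(patches):
    """Combine both scan directions, keeping the smaller score where both exist (assumed rule)."""
    forward = directional_scores(patches, +1)
    backward = directional_scores(patches, -1)
    combined = []
    for f, b in zip(forward, backward):
        available = [s for s in (f, b) if s is not None]
        combined.append(min(available) if available else 0.0)
    return combined


# Toy row of flattened patches with one defective patch at index 3.
normal = [1.0, 0.0, 1.0, 0.0]
defect = [0.0, 1.0, 0.0, 1.0]
patches = [normal, normal, normal, defect, normal, normal, normal]
scores = bidirectional_scores(patches)
```

In this toy setup, a single-direction scan also raises the score of the patches that follow the defect, because the defect contaminates their context. Taking the minimum over both directions suppresses that spillover for interior patches, which is one plausible motivation for refining localization with two passes.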

updated: Thu Aug 15 2024 03:25:15 GMT+0000 (UTC)

published: Wed Jul 20 2022 10:09:53 GMT+0000 (UTC)

arXiv
