Making Reconstruction-based Method Great Again for Video Anomaly Detection

Yizhou Wang; Can Qin; Yue Bai; Yi Xu; Xu Ma; Yun Fu

Anomaly detection in videos is a significant yet challenging problem. Previous approaches based on deep neural networks are either reconstruction-based or prediction-based. Nevertheless, existing reconstruction-based methods 1) rely on old-fashioned convolutional autoencoders and are poor at modeling temporal dependency; 2) are prone to overfitting the training samples, leading to indistinguishable reconstruction errors for normal and abnormal frames during the inference phase. To address these issues, we first draw inspiration from the transformer and propose the Spatio-Temporal Auto-Trans-Encoder, dubbed STATE, a new autoencoder model for enhanced consecutive frame reconstruction. STATE is equipped with a specifically designed learnable convolutional attention module for efficient temporal learning and reasoning. Second, we put forward a novel reconstruction-based input perturbation technique, applied during testing, to further differentiate anomalous frames. With the same perturbation magnitude, the testing reconstruction error of normal frames decreases more than that of abnormal frames, which helps mitigate the overfitting problem of reconstruction. Because frame abnormality is highly relevant to the objects in the frame, we conduct object-level reconstruction using both the raw frame and the corresponding optical flow patches. Finally, the anomaly score is designed as a combination of the raw and motion reconstruction errors computed with perturbed inputs. Extensive experiments on benchmark video anomaly detection datasets demonstrate that our approach outperforms previous reconstruction-based methods by a notable margin and consistently achieves state-of-the-art anomaly detection performance. The code is available at https://github.com/wyzjack/MRMGA4VAD.
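
The abstract does not state the perturbation formula or how the two errors are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the perturbation is a single signed-gradient step on the input that lowers the reconstruction error, and that the anomaly score is a weighted sum of the raw and motion errors. A toy orthogonal projection stands in for the STATE autoencoder so that the gradient can be written by hand; the names `reconstruct`, `perturb`, `anomaly_score`, `eps`, `w_raw`, and `w_motion` are hypothetical.

```python
import math

def reconstruct(x, w):
    """Toy stand-in for an autoencoder: project x onto the unit vector w."""
    code = sum(xi * wi for xi, wi in zip(x, w))  # 1-D latent code
    return [code * wi for wi in w]

def recon_error(x, w):
    """Squared L2 reconstruction error ||x - f(x)||^2."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x, w)))

def error_gradient(x, w):
    """Gradient of recon_error with respect to x.
    For an orthogonal projection (w has unit norm) it equals 2 * (x - f(x))."""
    return [2.0 * (xi - ri) for xi, ri in zip(x, reconstruct(x, w))]

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def perturb(x, w, eps):
    """Assumed perturbation: one signed-gradient step of size eps
    taken in the direction that reduces the reconstruction error."""
    grad = error_gradient(x, w)
    return [xi - eps * sign(gi) for xi, gi in zip(x, grad)]

def anomaly_score(raw_err, motion_err, w_raw=1.0, w_motion=1.0):
    """Assumed combination: weighted sum of raw and motion errors.
    The weights are hypothetical, not taken from the paper."""
    return w_raw * raw_err + w_motion * motion_err

# Toy input; in the paper's setting x would be an object-level patch.
w = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0), 0.0]
x = [1.0, 0.0, 0.5]
eps = 0.01

err_before = recon_error(x, w)
err_after = recon_error(perturb(x, w, eps), w)
print(err_before, err_after)
```

In this toy case the perturbed input reconstructs with a lower error than the original one. The abstract's claim is that, for a trained model and a fixed `eps`, this drop is larger for normal frames than for abnormal ones, which the sketch does not attempt to demonstrate.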

updated: Sat Jan 28 2023 01:57:57 GMT+0000 (UTC)

published: Sat Jan 28 2023 01:57:57 GMT+0000 (UTC)

arXiv
