LGN-Net: Local-Global Normality Network for Video Anomaly Detection

Mengyang Zhao; Xinhua Zeng; Yang Liu; Jing Liu; Di Li; Xing Hu; Chengxin Pang

LGN-Net: ビデオ異常検出のためのローカル/グローバル正規性ネットワーク

ビデオ異常検出 (VAD) は、インテリジェントビデオシステムに応用できる可能性があるため、何年にもわたって集中的に研究されてきました。既存の教師なし VAD メソッドは、通常のビデオのみで構成されるトレーニングセットから正規性を学習し、そのような正規性から逸脱したインスタンスを異常と見なす傾向があります。ただし、彼らは多くの場合、時間次元でローカルまたはグローバルな正規性のみを考慮します。それらのいくつかは、通常のイベントの表現を強化するために、連続したフレームから局所的な時空間表現を学習することに焦点を当てています。しかし、強力な表現により、これらのメソッドはいくつかの異常を表現でき、検出ミスを引き起こします。対照的に、他の方法は、トレーニングビデオ全体の典型的な正常パターンを記憶することに専念しており、異常の一般化を弱めます。これはまた、それらが多様な正常パターンを表現することを制限し、誤警報を引き起こします.この目的のために、ローカルとグローバルの正規性を同時に学習するための 2 ブランチモデル、ローカルグローバル正規性ネットワーク (LGN-Net) を提案します。具体的には、時空間予測ネットワークを利用して、連続するフレームから見え方や動きの進化規則性を局所正規性として学習する枝と、動画全体のプロトタイプ特徴を大域正規性としてメモリモジュールに記憶する枝です。 LGN-Net は、ローカルな正常性とグローバルな正常性を融合することで、正常なインスタンスと異常なインスタンスを表現するバランスを実現します。さらに、融合された正規性により、LGN-Net は単一の正規性を利用するだけでなく、さまざまなシーンに一般化できます。実験は、私たちの方法の有効性と優れた性能を示しています。コードはオンラインで入手できます: https://github.com/Myzhao1999/LGN-Net。

Video anomaly detection (VAD) has been intensively studied for years because of its potential applications in intelligent video systems. Existing unsupervised VAD methods tend to learn normality from training sets consisting of only normal videos and regard instances deviating from such normality as anomalies. However, they often consider only local or global normality in the temporal dimension. Some of them focus on learning local spatiotemporal representations from consecutive frames to enhance the representation for normal events. But powerful representation allows these methods to represent some anomalies and causes miss detection. In contrast, the other methods are devoted to memorizing prototypical normal patterns of whole training videos to weaken the generalization for anomalies, which also restricts them from representing diverse normal patterns and causes false alarm. To this end, we propose a two-branch model, Local-Global Normality Network (LGN-Net), to simultaneously learn local and global normality. Specifically, one branch learns the evolution regularities of appearance and motion from consecutive frames as local normality utilizing a spatiotemporal prediction network, while the other branch memorizes prototype features of the whole videos as global normality by a memory module. LGN-Net achieves a balance of representing normal and abnormal instances by fusing local and global normality. In addition, the fused normality enables LGN-Net to generalize to various scenes more than exploiting single normality. Experiments demonstrate the effectiveness and superior performance of our method. The code is available online: https://github.com/Myzhao1999/LGN-Net.

updated: Sun Jan 08 2023 09:35:47 GMT+0000 (UTC)

published: Mon Nov 14 2022 15:32:08 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト