A Video Anomaly Detection Framework based on Appearance-Motion Semantics Representation Consistency

Xiangyu Huang; Caidan Zhao; Yilin Wang; Zhiqiang Wu


Video anomaly detection refers to the identification of events that deviate from expected behavior. Because anomalous samples are unavailable during training, video anomaly detection is a very challenging task. Most existing methods follow either a reconstruction or a future frame prediction paradigm. However, these methods ignore the consistency between the appearance and motion information of samples, which limits their anomaly detection performance. Since anomalies occur only in the moving foreground of surveillance videos, the semantics expressed by background-free video frame sequences and by their optical flow should be highly consistent, and this consistency is significant for anomaly detection. Based on this idea, we propose Appearance-Motion Semantics Representation Consistency (AMSRC), a framework that uses the consistency between the appearance and motion semantic representations of normal data to handle anomaly detection. First, we design a two-stream encoder that encodes the appearance and motion information of normal samples, and we introduce constraints that further enhance the semantic consistency between their appearance and motion features, so that abnormal samples, whose appearance and motion feature representations are less consistent, can be identified. Moreover, the lower consistency between the appearance and motion features of anomalous samples can be exploited to generate predicted frames with larger reconstruction error, which makes anomalies easier to spot. Experimental results demonstrate the effectiveness of the proposed method.
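The abstract does not specify the exact consistency constraint or scoring rule, so the sketch below is only an illustration under stated assumptions, not the authors' AMSRC implementation. It assumes (1) that appearance-motion consistency is measured as cosine similarity between the two encoder streams' feature vectors, with `1 - similarity` used as a training penalty, and (2) that frames are scored with PSNR between the predicted and actual frame, min-max normalized over a video, a common convention in frame-prediction anomaly detection. All function names are hypothetical, and frames and features are flattened lists of floats to keep the example dependency-free.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_loss(appearance: Sequence[float], motion: Sequence[float]) -> float:
    """Hypothetical constraint: penalize low appearance-motion feature similarity."""
    return 1.0 - cosine_similarity(appearance, motion)


def psnr(predicted: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR between a predicted and a ground-truth frame (both flattened)."""
    mse = sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    mse = max(mse, 1e-10)  # avoid division by zero for a perfect prediction
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(psnrs: List[float]) -> List[float]:
    """Min-max normalize PSNR over one video; higher score means more anomalous."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0 for _ in psnrs]
    return [1.0 - (p - lo) / (hi - lo) for p in psnrs]


if __name__ == "__main__":
    # Aligned features (normal-like) give a small penalty; orthogonal ones a large one.
    print(consistency_loss([0.9, 0.1, 0.4], [0.8, 0.2, 0.5]))
    print(consistency_loss([1.0, 0.0], [0.0, 1.0]))
    # Lower PSNR (worse prediction) maps to a higher anomaly score.
    print(anomaly_scores([30.0, 20.0, 25.0]))
```

Under these assumptions, a sample whose appearance and motion features disagree would be poorly predicted, yielding a lower PSNR and therefore a higher normalized anomaly score.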

updated: Fri Apr 08 2022 15:59:57 GMT+0000 (UTC)

published: Fri Apr 08 2022 15:59:57 GMT+0000 (UTC)

arXiv
