A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video

Mariana-Iuliana Georgescu; Radu Tudor Ionescu; Fahad Shahbaz Khan; Marius Popescu; Mubarak Shah

Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a set of classifiers. Since our framework only looks at object detections, it can be applied to different scenes, provided that normal events are defined identically across scenes and that the single main factor of variation is the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain pseudo-abnormal examples, which are correctly reconstructed by the auto-encoders before applying gradient ascent on the pseudo-abnormal examples. We further utilize the pseudo-abnormal examples to serve as abnormal examples when training appearance-based and motion-based binary classifiers to discriminate between normal and abnormal latent features and reconstructions. We compare our framework with the state-of-the-art methods on four benchmark data sets, using various evaluation metrics. Compared to existing methods, the empirical results indicate that our approach achieves favorable performance on all data sets. In addition, we provide region-based and track-based annotations for two large-scale abnormal event detection data sets from the literature, namely ShanghaiTech and Subway.
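The adversarial idea in the abstract (fit normal examples, then apply gradient ascent on pseudo-abnormal examples so that they are no longer reconstructed well) can be illustrated with a deliberately tiny sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the model is a tied-weight linear auto-encoder in two dimensions rather than a convolutional network, the data points are made up, and the names `train`, `ascent_weight`, and `mean_error` are hypothetical.

```python
# Toy sketch only: a tied-weight linear auto-encoder in 2-D.
# This is NOT the paper's architecture; data and hyperparameters are invented.

def reconstruct(w, x):
    """Encode x to a scalar code z = w.x, then decode it as z * w."""
    z = w[0] * x[0] + w[1] * x[1]
    return (z * w[0], z * w[1])

def error(w, x):
    """Squared reconstruction error of a single example."""
    rx = reconstruct(w, x)
    return (x[0] - rx[0]) ** 2 + (x[1] - rx[1]) ** 2

def grad(w, x):
    """Analytic gradient of error(w, x) with respect to w."""
    z = w[0] * x[0] + w[1] * x[1]
    r = (x[0] - z * w[0], x[1] - z * w[1])   # residual x - reconstruction
    rw = r[0] * w[0] + r[1] * w[1]
    return (-2.0 * (z * r[0] + rw * x[0]), -2.0 * (z * r[1] + rw * x[1]))

def train(normal, pseudo_abnormal, w=(0.6, 0.6), lr=0.05,
          ascent_weight=0.5, epochs=200):
    """Gradient descent on normal examples, gradient ascent on pseudo-abnormal ones."""
    w = list(w)
    for _ in range(epochs):
        for x in normal:                      # minimize error on normal data
            g = grad(w, x)
            w[0] -= lr * g[0]
            w[1] -= lr * g[1]
        for x in pseudo_abnormal:             # maximize error on pseudo-abnormal data
            g = grad(w, x)
            w[0] += lr * ascent_weight * g[0]
            w[1] += lr * ascent_weight * g[1]
    return tuple(w)

def mean_error(w, examples):
    return sum(error(w, x) for x in examples) / len(examples)

# Invented data: "normal" points lie near the horizontal axis,
# "pseudo-abnormal" points lie near the vertical axis.
normal = [(1.0, 0.05), (-0.8, 0.0), (0.9, -0.05)]
pseudo_abnormal = [(0.0, 1.0), (0.05, -0.9)]

w = train(normal, pseudo_abnormal)
normal_err = mean_error(w, normal)
pseudo_err = mean_error(w, pseudo_abnormal)
print("normal:", normal_err, "pseudo-abnormal:", pseudo_err)
```

In this toy setting the reconstruction error can serve directly as an abnormality score. The framework described in the abstract goes further and trains binary classifiers on latent features and reconstructions, which this sketch does not attempt.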

updated: Mon May 10 2021 14:47:29 GMT+0000 (UTC)

published: Thu Aug 27 2020 18:39:24 GMT+0000 (UTC)

arXiv
