Sensing Anomalies like Humans: A Hominine Framework to Detect Abnormal Events from Unlabeled Videos

Siqi Wang; Guang Yu; Zhiping Cai; En Zhu; Xinwang Liu; Jianping Yin; Chengzhang Zhu

Video anomaly detection (VAD) has long been a vital topic in video analysis. Because anomalies are often rare, VAD is typically addressed under a semi-supervised setup, which requires a training set of purely normal videos. To avoid exhausting manual labeling, we take inspiration from how humans sense anomalies and propose a hominine framework that enables both unsupervised and end-to-end VAD. The framework is based on two key observations: 1) Human perception is usually local, i.e., it focuses on the local foreground and its context when sensing anomalies. We therefore impose locality-awareness by localizing foreground with generic knowledge, and design a region localization strategy to exploit local context. 2) Frequently occurring events mould humans' definition of normality, which motivates us to devise a surrogate training paradigm. It trains a deep neural network (DNN) to learn a surrogate task on unlabeled videos, so that frequently occurring events play a dominant role in "moulding" the DNN. In this way, a training loss gap automatically manifests rarely seen novel events as anomalies. For implementation, we explore various surrogate tasks as well as both classic and emerging DNN models. Extensive evaluations on commonly used VAD benchmarks justify the framework's applicability to different surrogate tasks or DNN models, and demonstrate its effectiveness: it not only outperforms existing unsupervised solutions by a wide margin (8% to 10% AUROC gain), but also achieves performance comparable or even superior to state-of-the-art semi-supervised counterparts.
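
The loss-gap idea behind the surrogate training paradigm can be illustrated on toy data. The sketch below is not the paper's implementation: it assumes reconstruction as the surrogate task, replaces the DNN with a rank-limited linear autoencoder fitted in closed form via SVD, and uses synthetic vectors in place of video regions. The function names (`make_toy_data`, `surrogate_scores`, `auroc`) and all sizes are hypothetical choices for illustration only.

```python
import numpy as np


def make_toy_data(seed=0, n_normal=950, n_anomaly=50, dim=16):
    """Synthetic stand-in for unlabeled video features (an assumption, not real data).

    Frequent "normal" events lie near a 2-D subspace; rare anomalies do not.
    Labels are returned only for evaluation and are never used for fitting.
    """
    rng = np.random.default_rng(seed)
    basis = rng.normal(size=(2, dim))
    normal = rng.normal(scale=3.0, size=(n_normal, 2)) @ basis
    normal += rng.normal(scale=0.1, size=(n_normal, dim))
    anomaly = rng.normal(scale=1.0, size=(n_anomaly, dim))
    x = np.vstack([normal, anomaly])
    labels = np.concatenate([np.zeros(n_normal, dtype=int),
                             np.ones(n_anomaly, dtype=int)])
    return x, labels


def surrogate_scores(x, rank):
    """Fit a linear autoencoder of the given rank on ALL unlabeled samples
    and return each sample's reconstruction loss as its anomaly score."""
    centered = x - x.mean(axis=0)
    # The top right-singular vectors give the optimal rank-limited linear
    # reconstruction (PCA); frequent events dominate this fit.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:rank]
    recon = centered @ components.T @ components
    return ((centered - recon) ** 2).sum(axis=1)


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); assumes no tied scores."""
    ranks = scores.argsort().argsort() + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    x, labels = make_toy_data()
    scores = surrogate_scores(x, rank=2)
    print("mean loss, normal :", scores[labels == 0].mean())
    print("mean loss, anomaly:", scores[labels == 1].mean())
    print("AUROC:", auroc(scores, labels))
```

Because the frequent samples dominate the fit, the model reconstructs them well and the rare samples poorly, so the per-sample loss separates the two groups without any labels. In the framework described above, this role is played by a DNN trained on a surrogate task over localized video regions.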

updated: Wed Aug 04 2021 11:31:57 GMT+0000 (UTC)

published: Wed Aug 04 2021 11:31:57 GMT+0000 (UTC)

arXiv
