Cross-Domain Video Anomaly Detection without Target Domain Adaptation

Abhishek Aich; Kuan-Chuan Peng; Amit K. Roy-Chowdhury

Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least a few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model tuning by the end user, who may prefer a system that works "out of the box." To address such practical scenarios, we identify a novel target domain (inference-time) VAD task where no target domain training data are available. To this end, we propose a new "Zero-shot Cross-domain Video Anomaly Detection (zxvad)" framework that includes a future-frame prediction generative model setup. Unlike prior future-frame prediction models, our model uses a novel Normalcy Classifier module to learn the features of normal event videos by learning how such features differ relative to the features of pseudo-abnormal examples. A novel Untrained Convolutional Neural Network based Anomaly Synthesis module crafts these pseudo-abnormal examples by adding foreign objects to normal video frames at no extra training cost. With our novel relative normalcy feature learning strategy, zxvad generalizes and learns to distinguish between normal and abnormal frames in a new target domain without adaptation during inference. Through evaluations on common datasets, we show that zxvad outperforms the state of the art (SOTA), regardless of whether task-relevant (i.e., VAD) source training data are available. Lastly, zxvad also beats the SOTA methods on inference-time efficiency metrics, including model size, total parameters, GPU energy consumption, and GMACs.
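As a rough illustration of the pseudo-abnormal idea described in the abstract, the sketch below pastes a foreign patch into a normal frame and labels the result for a normal-versus-abnormal classifier. It is only a simplified stand-in: the abstract does not give the details of the untrained CNN based Anomaly Synthesis module, so a plain rectangular cut-and-paste is assumed here, and the names `paste_foreign_patch` and `make_training_pair` are hypothetical.

```python
import random


def paste_foreign_patch(frame, patch, top, left):
    """Return a copy of `frame` with `patch` pasted at (top, left).

    Both arguments are 2-D lists of grayscale pixel values. This is an
    assumed cut-and-paste stand-in, not the paper's untrained-CNN module.
    """
    height, width = len(frame), len(frame[0])
    p_height, p_width = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + p_height > height or left + p_width > width:
        raise ValueError("patch does not fit inside the frame")
    # Copy row by row so the original normal frame is left untouched.
    pseudo_abnormal = [row[:] for row in frame]
    for r in range(p_height):
        pseudo_abnormal[top + r][left:left + p_width] = patch[r]
    return pseudo_abnormal


def make_training_pair(frame, patch, rng):
    """Pair a normal frame (label 0) with a pseudo-abnormal copy (label 1)."""
    top = rng.randrange(len(frame) - len(patch) + 1)
    left = rng.randrange(len(frame[0]) - len(patch[0]) + 1)
    return [(frame, 0), (paste_foreign_patch(frame, patch, top, left), 1)]


# Toy data: a blank 6x8 "normal" frame and a bright 2x3 "foreign object".
normal_frame = [[0] * 8 for _ in range(6)]
foreign_patch = [[255] * 3 for _ in range(2)]
pairs = make_training_pair(normal_frame, foreign_patch, random.Random(0))
```

Each pair gives a classifier one normal and one pseudo-abnormal view of the same scene, which is the kind of relative contrast the abstract attributes to the Normalcy Classifier; how the actual framework builds and uses such pairs is not specified here.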

updated: Wed Dec 14 2022 03:48:00 GMT+0000 (UTC)

published: Wed Dec 14 2022 03:48:00 GMT+0000 (UTC)

arXiv
