Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning

Yu Tian; Guansong Pang; Yuanhong Chen; Rajvinder Singh; Johan W. Verjans; Gustavo Carneiro

Anomaly detection with weakly supervised video-level labels is typically formulated as a multiple instance learning (MIL) problem, in which we aim to identify snippets containing abnormal events, with each video represented as a bag of video snippets. Although current methods show effective detection performance, their recognition of the positive instances, i.e., rare abnormal snippets in the abnormal videos, is largely biased by the dominant negative instances, especially when the abnormal events are subtle anomalies that exhibit only small differences compared with normal events. This issue is exacerbated in many methods that ignore important video temporal dependencies. To address this issue, we introduce a novel and theoretically sound method, named Robust Temporal Feature Magnitude learning (RTFM), which trains a feature magnitude learning function to effectively recognise the positive instances, substantially improving the robustness of the MIL approach to the negative instances from abnormal videos. RTFM also adapts dilated convolutions and self-attention mechanisms to capture long- and short-range temporal dependencies to learn the feature magnitude more faithfully. Extensive experiments show that the RTFM-enabled MIL model (i) outperforms several state-of-the-art methods by a large margin on three benchmark data sets (ShanghaiTech, UCF-Crime and XD-Violence) and (ii) achieves significantly improved subtle anomaly discriminability and sample efficiency. Code is available at https://github.com/tianyu0207/RTFM.
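
The feature magnitude idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, beyond what the abstract states, that a snippet's magnitude is the l2 norm of its feature vector, that a video is summarised by the mean of its k largest snippet magnitudes, and that training pushes this summary for abnormal videos above that of normal videos by a margin. The function names and the parameters `k` and `margin` are hypothetical, chosen only for illustration.

```python
import math


def feature_magnitudes(snippets):
    """Return the l2 norm of each snippet feature vector (assumed magnitude measure)."""
    return [math.sqrt(sum(x * x for x in feat)) for feat in snippets]


def topk_mean_magnitude(snippets, k):
    """Mean of the k largest snippet magnitudes in one video (one MIL bag)."""
    mags = sorted(feature_magnitudes(snippets), reverse=True)
    k = min(k, len(mags))  # guard against bags shorter than k
    return sum(mags[:k]) / k


def magnitude_margin_loss(abnormal_video, normal_video, k=3, margin=1.0):
    """Hinge-style loss that reaches zero once the abnormal video's top-k mean
    magnitude exceeds the normal video's by at least `margin` (illustrative only)."""
    gap = topk_mean_magnitude(abnormal_video, k) - topk_mean_magnitude(normal_video, k)
    return max(0.0, margin - gap)


# Toy bags: each video is a list of 2-D snippet features.
normal = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]]
abnormal = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2]]  # one rare high-magnitude snippet

print(topk_mean_magnitude(abnormal, k=1))                # 5.0
print(magnitude_margin_loss(abnormal, normal, k=1))      # 0.0, margin satisfied
print(magnitude_margin_loss(normal, normal, k=1))        # 1.0, no separation
```

Summarising each bag by its top-k magnitudes rather than by all snippets means the many normal snippets inside an abnormal video contribute little to the loss, which is one way to read the abstract's claim of robustness to negative instances from abnormal videos.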

updated: Sat Mar 20 2021 11:44:05 GMT+0000 (UTC)

published: Mon Jan 25 2021 12:04:00 GMT+0000 (UTC)

arXiv
