Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model

Hamid Mohammadi; Ehsan Nazerfard

The significant growth of surveillance camera networks necessitates scalable AI solutions that can efficiently analyze the large volume of video data these networks produce. Video violence detection, a typical analysis performed on surveillance footage, has recently received considerable attention. Most research has focused on improving existing methods through supervised learning, with little, if any, attention paid to semi-supervised learning approaches. In this study, a reinforcement learning model is introduced that can outperform existing models through a semi-supervised approach. The main novelty of the proposed method lies in the introduction of a semi-supervised hard attention mechanism. Using hard attention, the essential regions of a video are identified and separated from the non-informative parts of the data. Removing redundant data and focusing on useful visual information at a higher resolution improves the model's accuracy. Because the hard attention mechanism is implemented with a semi-supervised reinforcement learning algorithm, video violence datasets need no attention annotations, so the method can be applied to them directly. The proposed model uses a pre-trained I3D backbone to accelerate and stabilize training. It achieves state-of-the-art accuracies of 90.4% and 98.7% on the RWF and Hockey datasets, respectively.
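
To make the idea of hard attention trained by reinforcement learning concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' model: it swaps in the standard REINFORCE policy-gradient rule, and the 2x2 grid of candidate crops, the toy reward (1 when the chosen crop covers the "active" region, standing in for a correct violence prediction), the moving-average baseline, and the learning rate are all assumptions. The paper's policy network, I3D backbone, and actual reward are not reproduced. The key property it shows is that the policy commits to exactly one region (a hard, non-differentiable choice) and learns where to look from a scalar reward alone, without attention annotations.

```python
import math
import random

# Hypothetical sketch of a hard-attention policy trained with REINFORCE.
# Assumptions (not from the paper): a fixed 2x2 grid of candidate crops,
# a toy reward of 1 when the crop covers the "active" region, a
# moving-average baseline, and plain gradient ascent on raw logits.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_regions(height, width, grid):
    """Split an H x W frame into grid x grid boxes (top, left, h, w)."""
    h, w = height // grid, width // grid
    return [(r * h, c * w, h, w) for r in range(grid) for c in range(grid)]


def crop_video(video, box):
    """Apply the same spatial crop to every frame of a [T][H][W] video."""
    top, left, h, w = box
    return [[row[left:left + w] for row in frame[top:top + h]] for frame in video]


def sample_region(logits, rng):
    """Hard attention: draw exactly one region index from the policy."""
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, probs


def reinforce_update(logits, idx, probs, advantage, lr):
    """Gradient ascent on advantage * log pi(idx).

    d log softmax(logits)[idx] / d logits[j] = 1[j == idx] - probs[j].
    """
    return [l + lr * advantage * ((1.0 if j == idx else 0.0) - p)
            for j, (l, p) in enumerate(zip(logits, probs))]


def toy_video(frames=4, size=8):
    """Toy clip: activity (value 1.0) only in the bottom-right quadrant."""
    half = size // 2
    return [[[1.0 if (y >= half and x >= half) else 0.0 for x in range(size)]
             for y in range(size)] for _ in range(frames)]


def toy_reward(crop):
    """Stand-in for 'classifier was correct': 1 if the crop is mostly active."""
    values = [v for frame in crop for row in frame for v in row]
    return 1.0 if sum(values) / len(values) > 0.5 else 0.0


def train_toy_policy(steps=500, lr=0.5, seed=0):
    """Learn which crop to attend to using only the scalar toy reward."""
    rng = random.Random(seed)
    video = toy_video()
    regions = candidate_regions(len(video[0]), len(video[0][0]), grid=2)
    logits = [0.0] * len(regions)
    baseline = 0.0
    for _ in range(steps):
        idx, probs = sample_region(logits, rng)
        reward = toy_reward(crop_video(video, regions[idx]))
        logits = reinforce_update(logits, idx, probs, reward - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return logits, regions


if __name__ == "__main__":
    logits, regions = train_toy_policy()
    best = max(range(len(logits)), key=lambda i: logits[i])
    print("most likely crop:", regions[best])
    print("region probabilities:", [round(p, 3) for p in softmax(logits)])
```

After training, the policy should place most of its probability on the bottom-right crop, the only region that yields a reward. In a real system the cropped clip would be passed at higher resolution to a classifier, and that classifier's correctness would supply the reward.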

updated: Mon Sep 05 2022 20:09:05 GMT+0000 (UTC)

published: Fri Feb 04 2022 16:15:26 GMT+0000 (UTC)

arXiv