arXiv reaDer
From Explicit Rules to Implicit Reasoning in an Interpretable Violence Monitoring System
Recently, research based on pre-trained models has demonstrated outstanding performance in violence surveillance tasks. However, these black-box systems face challenges regarding explainability during training and inference processes. An important question is how to incorporate explicit knowledge into these implicit models, thereby designing expert-driven and interpretable violence surveillance systems. This paper proposes a new paradigm for weakly supervised violence monitoring (WSVM) called Rule base Violence monitoring (RuleVM). The proposed RuleVM uses a dual-branch structure for different designs for images and text. One of the branches is called the implicit branch, which uses only visual features for coarse-grained binary classification. In this branch, image feature extraction is divided into two channels: one responsible for extracting scene frames and the other focusing on extracting actions. The other branch is called the explicit branch, which utilizes language-image alignment to perform fine-grained classification. For the language channel design in the explicit branch, the proposed RuleCLIP uses the state-of-the-art YOLO-World model to detect objects and actions in video frames, and association rules are identified through data mining methods as descriptions of the video. Leveraging the dual-branch architecture, RuleVM achieves interpretable coarse-grained and fine-grained violence surveillance. Extensive experiments were conducted on two commonly used benchmarks, and the results show that RuleCLIP achieved the best performance in both coarse-grained and fine-grained detection, significantly outperforming existing state-of-the-art methods. Moreover, interpretability experiments uncovered some interesting rules, such as the observation that as the number of people increases, the risk level of violent behavior also rises.
updated: Sat Nov 02 2024 06:33:29 GMT+0000 (UTC)
published: Tue Oct 29 2024 12:22:07 GMT+0000 (UTC)
参考文献 (このサイトで利用可能なもの) / References (only if available on this site)
被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)
Amazon.co.jpアソシエイト