Video Crowd Localization with Multi-focus Gaussian Neighbor Attention and a Large-Scale Benchmark

Haopeng Li; Lingbo Liu; Kunlin Yang; Shinan Liu; Junyu Gao; Bin Zhao; Rui Zhang; Jun Hou

Video crowd localization is a crucial yet challenging task that aims to estimate the exact locations of human heads in crowded videos. To model the spatial-temporal dependencies of human mobility, we propose a multi-focus Gaussian neighbor attention (GNA), which effectively exploits long-range correspondences while maintaining the spatial topological structure of the input videos. In particular, the multi-focus mechanism lets GNA capture the scale variation of human heads well. Based on the multi-focus GNA, we develop a unified neural network called GNANet, which accurately locates head centers in video clips by fully aggregating spatial-temporal information via a scene modeling module and a context cross-attention module. Moreover, to facilitate future research in this field, we introduce a large-scale crowded video benchmark named SenseCrowd, which consists of 60K+ frames captured in various surveillance scenarios and 2M+ head annotations. Finally, we conduct extensive experiments on three datasets, including SenseCrowd, and the experimental results show that the proposed method achieves state-of-the-art performance for both video crowd localization and counting. The code and the dataset will be released.
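The abstract does not spell out how GNA is computed, so the following is only a rough illustration of the general idea and not the authors' formulation. It assumes one plausible reading: attention scores combine content similarity with a Gaussian penalty on spatial distance, so nearby positions are favored without cutting off distant ones, and several standard deviations ("focuses") are averaged to cover different head scales. The function names, the 2-D position format, and the way the focuses are averaged are all hypothetical choices made for this sketch.

```python
import math


def gaussian_attention_weights(q, q_pos, keys, key_pos, sigma):
    """Softmax weights from dot-product similarity plus a Gaussian distance prior.

    Assumption for illustration: the spatial prior enters as an additive
    log-Gaussian term, -dist^2 / (2 * sigma^2), on the attention logits.
    """
    d = len(q)
    logits = []
    for k, (kx, ky) in zip(keys, key_pos):
        similarity = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
        dist_sq = (kx - q_pos[0]) ** 2 + (ky - q_pos[1]) ** 2
        logits.append(similarity - dist_sq / (2.0 * sigma ** 2))
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_focus_attend(q, q_pos, keys, key_pos, values, sigmas):
    """Average the attended values over several sigmas (hypothetical 'focuses')."""
    dim = len(values[0])
    out = [0.0] * dim
    for sigma in sigmas:
        weights = gaussian_attention_weights(q, q_pos, keys, key_pos, sigma)
        for w, v in zip(weights, values):
            for i in range(dim):
                out[i] += w * v[i] / len(sigmas)
    return out


# Two keys with identical (zero) features: one at the query, one 10 pixels away.
q, q_pos = [0.0, 0.0], (0.0, 0.0)
keys = [[0.0, 0.0], [0.0, 0.0]]
key_pos = [(0.0, 0.0), (10.0, 0.0)]
values = [[1.0], [0.0]]

narrow = multi_focus_attend(q, q_pos, keys, key_pos, values, sigmas=[1.0])
wide = multi_focus_attend(q, q_pos, keys, key_pos, values, sigmas=[1e6])
both = multi_focus_attend(q, q_pos, keys, key_pos, values, sigmas=[1.0, 1e6])
```

In this toy setup a small sigma attends almost entirely to the nearest key, a very large sigma spreads the weight almost evenly, and combining the two lands in between, which is the intuition behind using more than one focus.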

updated: Tue Jul 20 2021 01:46:08 GMT+0000 (UTC)

published: Mon Jul 19 2021 06:59:27 GMT+0000 (UTC)

arXiv
