Semantic Driven Multi-Camera Pedestrian Detection

Alejandro López-Cifuentes; Marcos Escudero-Viñolo; Jesús Bescós; Pablo Carballeira

セマンティック駆動のマルチカメラ歩行者検出

現在の世界的な状況では、歩行者の検出は、歩行者の追跡、社会的距離の監視、歩行者の質量のカウントなどのタスクを解決することを目的としたインテリジェントなビデオベースのシステムの中心的なツールとして再び登場しました。歩行者の検出方法は、パフォーマンスが最も高い方法でさえ、歩行者の閉塞に非常に敏感であり、混雑したシナリオでのパフォーマンスを劇的に低下させます。マルチカメラセットアップの一般化により、さまざまな視点からの情報を組み合わせることで、オクルージョンにうまく立ち向かうことができます。この論文では、自動的に抽出されたシーンコンテキストを活用して、歩行者の検出をグローバルに組み合わせるマルチカメラアプローチを紹介します。最先端の方法の大部分とは対照的に、提案されたアプローチはシーンにとらわれず、たとえば微調整を介して、ターゲットシナリオ\ textemdashへの調整された適応を必要としません。この注目に値する属性は、ラベル付けされたデータを使用したアドホックトレーニングを必要とせず、実際の状況で提案された方法の展開を促進します。セマンティックセグメンテーションを介して取得されたコンテキスト情報は、1）シーンとすべてのカメラに共通の関心領域を自動的に生成するために使用され、手動で定義する通常の必要性を回避します。 2）各2D画像と3Dシーンの両方で検出のコヒーレンスを最大化する大域的最適化問題を解くことにより、各カメラの検出を取得します。このプロセスにより、オクルージョンや誤検出を回避する、ぴったりとフィットするバウンディングボックスが生成されます。 5つの公開されているデータセットの実験結果は、提案されたアプローチが最先端のマルチカメラ歩行者検出器よりも優れていることを示しています。ターゲットシナリオで特別にトレーニングされたものもあり、アドホックアノテーションを必要とせずに提案された方法の多様性と堅牢性を示しています。人間が誘導する構成でもありません。

In the current worldwide situation, pedestrian detection has reemerged as a pivotal tool for intelligent video-based systems aiming to solve tasks such as pedestrian tracking, social distancing monitoring or pedestrian mass counting. Pedestrian detection methods, even the top performing ones, are highly sensitive to occlusions among pedestrians, which dramatically degrades their performance in crowded scenarios. The generalization of multi-camera set-ups permits to better confront occlusions by combining information from different viewpoints. In this paper, we present a multi-camera approach to globally combine pedestrian detections leveraging automatically extracted scene context. Contrarily to the majority of the methods of the state-of-the-art, the proposed approach is scene-agnostic, not requiring a tailored adaptation to the target scenario\textemdash e.g., via fine-tunning. This noteworthy attribute does not require ad hoc training with labelled data, expediting the deployment of the proposed method in real-world situations. Context information, obtained via semantic segmentation, is used 1) to automatically generate a common Area of Interest for the scene and all the cameras, avoiding the usual need of manually defining it; and 2) to obtain detections for each camera by solving a global optimization problem that maximizes coherence of detections both in each 2D image and in the 3D scene. This process yields tightly-fitted bounding boxes that circumvent occlusions or miss-detections. Experimental results on five publicly available datasets show that the proposed approach outperforms state-of-the-art multi-camera pedestrian detectors, even some specifically trained on the target scenario, signifying the versatility and robustness of the proposed method without requiring ad-hoc annotations nor human-guided configuration.

updated: Thu Mar 04 2021 09:49:31 GMT+0000 (UTC)

published: Thu Dec 27 2018 17:36:37 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト