Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints

Abubakar Siddique; Henry Medeiros

We introduce a novel framework for tracking multiple objects in overhead camera videos for airport checkpoint security scenarios, where the targets are passengers and their baggage items. We propose a Self-Supervised Learning (SSL) technique that provides the model with information about instance segmentation uncertainty from overhead images. Our SSL approach improves object detection by employing test-time data augmentation and a regression-based, rotation-invariant pseudo-label refinement technique. Our pseudo-label generation method provides multiple geometrically transformed images as inputs to a Convolutional Neural Network (CNN), regresses the augmented detections generated by the network to reduce localization errors, and then clusters them using the mean-shift algorithm. The self-supervised detector model is used in a single-camera tracking algorithm to generate temporal identifiers for the targets. Our method also incorporates a multi-view trajectory association mechanism to maintain consistent temporal identifiers as passengers travel across camera views. An evaluation of detection, tracking, and association performance on videos obtained from multiple overhead cameras in a realistic airport checkpoint environment demonstrates the effectiveness of the proposed approach. Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model. Our multi-camera association method achieves up to 89% multi-object tracking accuracy with an average computation time of less than 15 ms.
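The pseudo-label generation idea described above (run the detector on several geometrically transformed copies of an image, map the detections back to the original frame, and cluster them with mean shift) can be illustrated with a minimal sketch. This is not the authors' implementation: the detector is replaced by a hypothetical stand-in that returns noisy object centers, only rotations are used as transformations, the regression-based refinement step is omitted, and the names `rotate_points`, `mean_shift`, and `pseudo_labels` as well as all parameter values are assumptions made for illustration.

```python
import numpy as np


def rotate_points(points, angle_deg, center):
    """Rotate 2-D points about `center` by `angle_deg` degrees."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return (points - center) @ rot.T + center


def mean_shift(points, bandwidth, n_iter=50, tol=1e-4):
    """Flat-kernel mean shift; returns one mode per cluster found."""
    modes = points.copy()
    for _ in range(n_iter):
        # Distance from every current mode to every data point.
        dists = np.linalg.norm(modes[:, None, :] - points[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)
        shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    # Merge modes that converged to (nearly) the same location.
    merged = []
    for mode in modes:
        if all(np.linalg.norm(mode - kept) > bandwidth / 2 for kept in merged):
            merged.append(mode)
    return np.array(merged)


def pseudo_labels(true_centers, angles, image_center, noise_std, bandwidth, rng):
    """Pool detections from rotated views and cluster them into pseudo-labels."""
    pooled = []
    for angle in angles:
        # Hypothetical stand-in for running a CNN detector on the rotated image:
        # the true centers, rotated into that view, plus localization noise.
        rotated = rotate_points(true_centers, angle, image_center)
        detections = rotated + rng.normal(0.0, noise_std, rotated.shape)
        # Map the detections back to the original image frame.
        pooled.append(rotate_points(detections, -angle, image_center))
    return mean_shift(np.vstack(pooled), bandwidth)


# Assumed toy scene: three objects in a 512 x 512 overhead image.
rng = np.random.default_rng(0)
centers = np.array([[100.0, 120.0], [300.0, 200.0], [420.0, 380.0]])
labels = pseudo_labels(centers, angles=range(0, 360, 30),
                       image_center=np.array([256.0, 256.0]),
                       noise_std=2.0, bandwidth=20.0, rng=rng)
print(labels)
```

In this toy setting, each cluster mode averages the detections of one object across all rotated views, which is why pooling augmented detections can reduce localization error relative to a single view. A real pipeline would cluster full detections (boxes or masks with class scores) rather than centers alone.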

updated: Sat Dec 31 2022 12:57:09 GMT+0000 (UTC)

published: Sat Dec 31 2022 12:57:09 GMT+0000 (UTC)

arXiv
