Robust Unsupervised Multi-Object Tracking in Noisy Environments

C.-H. Huck Yang; Mohit Chhabra; Y.-C. Liu; Quan Kong; Tomoaki Yoshinaga; Tomokazu Murakami

Physical processes, camera movement, and unpredictable environmental conditions such as the presence of dust can induce noise and artifacts in video feeds. We observe that popular unsupervised multi-object tracking (MOT) methods depend on noise-free inputs: we show that adding a small amount of artificial random noise causes a sharp degradation in their performance on benchmark metrics. We resolve this problem by introducing AttU-Net, a robust unsupervised MOT model. The proposed single-head attention model helps limit the negative impact of noise by learning visual representations at different segment scales. AttU-Net shows better unsupervised MOT performance than state-of-the-art baselines based on variational inference. We evaluate our method on the MNIST-MOT and Atari game video benchmarks. We also provide two extended video datasets for validating the effectiveness of MOT models: "Kuzushiji-MNIST MOT", which consists of moving Japanese characters, and "Fashion-MNIST MOT".
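
The abstract does not specify which kind of random noise was used in the robustness experiments. As a minimal sketch of that kind of stress test, assuming additive zero-mean Gaussian pixel noise on grayscale frames with intensities in [0, 1], the snippet below corrupts every frame of a clip before it would be handed to a tracker. The function names (`add_gaussian_noise`, `corrupt_video`) and the Gaussian noise model are illustrative assumptions, not the authors' protocol.

```python
import random
from typing import List

# Hypothetical frame representation: a grayscale image as rows of floats in [0, 1].
Frame = List[List[float]]


def add_gaussian_noise(frame: Frame, sigma: float, rng: random.Random) -> Frame:
    """Return a new frame with zero-mean Gaussian noise (std `sigma`) added to
    every pixel, clipped back into [0, 1]. The Gaussian model is an assumption."""
    return [
        [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in frame
    ]


def corrupt_video(video: List[Frame], sigma: float, seed: int = 0) -> List[Frame]:
    """Apply independent per-frame noise to every frame of a clip.

    A seeded generator makes the corrupted clip reproducible, so different
    trackers can be compared on exactly the same noisy input."""
    rng = random.Random(seed)
    return [add_gaussian_noise(frame, sigma, rng) for frame in video]


if __name__ == "__main__":
    # A toy 3-frame clip of 4x4 mid-gray frames.
    clip = [[[0.5] * 4 for _ in range(4)] for _ in range(3)]
    noisy = corrupt_video(clip, sigma=0.1, seed=42)
    print([round(p, 3) for p in noisy[0][0]])
```

Sweeping `sigma` over a few small values and re-running the benchmark metrics at each level is one way to measure the kind of degradation the abstract describes; the input clip is never modified, so the clean and noisy runs share the same source frames.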

updated: Tue Jun 15 2021 06:52:21 GMT+0000 (UTC)

published: Thu May 20 2021 19:38:03 GMT+0000 (UTC)

arXiv