Simple Cues Lead to a Strong Multi-Object Tracker

Jenny Seidenschwarz; Guillem Brasó; Victor Castro Serrano; Ismail Elezi; Laura Leal-Taixé

For a long time, the most common paradigm in Multi-Object Tracking was tracking-by-detection (TbD), where objects are first detected and then associated over video frames. For association, most models resorted to motion and appearance cues, e.g., re-identification networks. Recent approaches based on attention propose to learn the cues in a data-driven manner, showing impressive results. In this paper, we ask ourselves whether simple good old TbD methods are also capable of achieving the performance of end-to-end models. To this end, we propose two key ingredients that allow a standard re-identification network to excel at appearance-based tracking. We extensively analyse its failure cases, and show that a combination of our appearance features with a simple motion model leads to strong tracking results. Our tracker generalizes to four public datasets, namely MOT17, MOT20, BDD100k, and DanceTrack, achieving state-of-the-art performance. We will release the code and models.

updated: Thu Dec 01 2022 18:46:03 GMT+0000 (UTC)

published: Thu Jun 09 2022 17:55:51 GMT+0000 (UTC)

arXiv
