Unifying Short and Long-Term Tracking with Graph Hierarchies

Orcun Cetintas; Guillem Brasó; Laura Leal-Taixé

Tracking objects over long videos effectively means solving a spectrum of problems, from short-term association for un-occluded objects to long-term association for objects that are occluded and then reappear in the scene. Methods tackling these two tasks are often disjoint and crafted for specific scenarios, and top-performing approaches are often a mix of techniques, which yields engineering-heavy solutions that lack generality. In this work, we question the need for hybrid approaches and introduce SUSHI, a unified and scalable multi-object tracker. Our approach processes long clips by splitting them into a hierarchy of subclips, which enables high scalability. We leverage graph neural networks to process all levels of the hierarchy, which makes our model unified across temporal scales and highly general. As a result, we obtain significant improvements over the state of the art on four diverse datasets. Our code and models are available at bit.ly/sushi-mot.
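
The idea of handling a long clip through a hierarchy of subclips can be illustrated with a toy sketch. This is not the authors' implementation. It assumes tracks are built bottom-up from single-frame subclips, that each level merges pairs of adjacent subclips (doubling their temporal span), and that detections are plain 1-D positions. The learned graph neural network is replaced by a simple greedy nearest-position matcher. All names (`Tracklet`, `associate`, `hierarchical_track`, `max_gap`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy detection: (frame index, 1-D position). A real tracker would use boxes
# and appearance features; this stand-in is an assumption for illustration.
Detection = Tuple[int, float]


@dataclass
class Tracklet:
    dets: List[Detection] = field(default_factory=list)

    @property
    def start(self) -> Detection:
        return self.dets[0]

    @property
    def end(self) -> Detection:
        return self.dets[-1]


def associate(left: List[Tracklet], right: List[Tracklet], max_gap: float) -> List[Tracklet]:
    """Link tracklets of two adjacent subclips into tracklets spanning both.

    Hypothetical placeholder for the learned GNN: each left tracklet is greedily
    joined to the unused right tracklet whose first position is closest to its
    last position, provided the distance is at most max_gap.
    """
    merged: List[Tracklet] = []
    used = set()
    for t in left:
        best, best_d = None, max_gap
        for j, u in enumerate(right):
            if j in used:
                continue
            d = abs(t.end[1] - u.start[1])
            if d <= best_d:
                best, best_d = j, d
        if best is None:
            merged.append(t)  # no partner: the tracklet ends in this subclip
        else:
            used.add(best)
            merged.append(Tracklet(t.dets + right[best].dets))
    # Unmatched right tracklets start new tracks.
    merged.extend(u for j, u in enumerate(right) if j not in used)
    return merged


def hierarchical_track(frames: List[List[float]], max_gap: float = 1.0) -> List[Tracklet]:
    """Build tracks bottom-up over a hierarchy of ever-longer subclips."""
    # Level 0: each frame is a subclip and each detection a length-1 tracklet.
    level = [[Tracklet([(f, x)]) for x in dets] for f, dets in enumerate(frames)]
    # Each level merges pairs of adjacent subclips, doubling their temporal span,
    # so an object missing from a short subclip can still be linked higher up.
    while len(level) > 1:
        nxt = [associate(level[i], level[i + 1], max_gap)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd subclip out is carried up unchanged
        level = nxt
    return level[0] if level else []


tracks = hierarchical_track([[0.0, 5.0], [0.2, 5.1], [0.5, 5.3], [0.7, 5.2]])
for t in tracks:
    print(t.dets)
```

On this four-frame input the two well-separated objects each end up in a single four-frame tracklet. The same code, called on `[[0.0], [], [0.3]]`, links the object across the empty middle frame only at the top level of the hierarchy, which loosely mirrors how longer subclips give the association step a chance to bridge occlusions.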

updated: Thu Mar 30 2023 13:47:25 GMT+0000 (UTC)

published: Tue Dec 06 2022 15:12:53 GMT+0000 (UTC)

arXiv