Multi-Template Temporal Siamese Network for Long-Term Object Tracking

Ali Sekhavati; Won-Sook Lee

Siamese Networks are one of the most popular visual object tracking methods, valued for their high speed and high accuracy as long as the target is well identified. However, most Siamese Network based trackers use the first frame as the ground truth of an object and fail when the target's appearance changes significantly in subsequent frames. They also have difficulty distinguishing the target from other similar objects in the frame. We propose two ideas to solve both problems. The first idea is to use a bag of dynamic templates, containing diverse, similar, and recent target features, and to continuously update it with diverse target appearances. The other idea is to let a network learn the path history and project a potential future target location in the next frame. This tracker achieves state-of-the-art performance on the long-term tracking dataset UAV20L by improving the success rate by a large margin of 15% (65.4 vs. 56.6) compared to the state-of-the-art method, HiFT. The official Python code of this paper is publicly available.
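
The abstract does not spell out how the bag of dynamic templates is stored or updated, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes target features are plain vectors compared by cosine similarity (a stand-in for Siamese feature maps), and the names `TemplateBag`, `capacity`, and `novelty_threshold` are hypothetical. The first-frame template is always kept, and a new feature is stored only when it differs enough from everything already in the bag.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TemplateBag:
    """Hypothetical bag: the first-frame template plus a bounded set of later ones."""

    def __init__(self, initial, capacity=4, novelty_threshold=0.9):
        self.initial = list(initial)            # first-frame template, never evicted
        self.dynamic = deque(maxlen=capacity)   # oldest dynamic template is dropped when full
        self.novelty_threshold = novelty_threshold  # assumed value, not from the paper

    def templates(self):
        """All templates currently in the bag, first-frame template first."""
        return [self.initial, *self.dynamic]

    def score(self, candidate):
        """Best similarity between a candidate feature and any stored template."""
        return max(cosine(candidate, t) for t in self.templates())

    def update(self, feature):
        """Store the feature only if it is sufficiently different; return True if stored."""
        if self.score(feature) < self.novelty_threshold:
            self.dynamic.append(list(feature))
            return True
        return False


bag = TemplateBag([1.0, 0.0], capacity=2)
stored_similar = bag.update([0.99, 0.05])   # nearly identical to the first template
stored_diverse = bag.update([0.0, 1.0])     # a very different appearance
```

Under these assumptions, a candidate that resembles any stored appearance scores highly even when it no longer resembles the first frame, which is the failure case the abstract describes for single-template trackers.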

updated: Thu Nov 24 2022 22:07:33 GMT+0000 (UTC)

published: Thu Nov 24 2022 22:07:33 GMT+0000 (UTC)

arXiv
