SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

Yu-Hsiang Wang; Jun-Wei Hsieh; Ping-Yang Chen; Ming-Ching Chang; Hung Hin So; Xin Li


Despite recent progress in Multiple Object Tracking (MOT), obstacles such as occlusions, similar-looking objects, and complex scenes remain open challenges. Meanwhile, a systematic study of the cost-performance trade-off for the popular tracking-by-detection paradigm is still lacking. This paper introduces SMILEtrack, an object tracker that addresses these challenges by integrating an efficient object detector with a Siamese network-based Similarity Learning Module (SLM). The technical contributions of SMILEtrack are twofold. First, we propose an SLM that calculates the appearance similarity between two objects, overcoming the limitations of feature descriptors in Separate Detection and Embedding (SDE) models. The SLM incorporates a Patch Self-Attention (PSA) block inspired by the vision Transformer, which generates reliable features for accurate similarity matching. Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance. Together, these innovations help SMILEtrack achieve an improved trade-off between cost (e.g., running speed) and performance (e.g., tracking accuracy) over several existing state-of-the-art methods, including the popular ByteTrack method. SMILEtrack outperforms ByteTrack by 0.4-0.8 MOTA and 2.1-2.2 HOTA points on the MOT17 and MOT20 datasets. Code is available at https://github.com/pingyang1117/SMILEtrack_Official
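The abstract describes matching objects across frames by appearance similarity, with a gate on which matches are accepted. As a rough illustration of that general idea only, the sketch below compares hand-made feature vectors with cosine similarity and associates tracks to detections greedily. It is not the paper's SLM, PSA block, SMC module, or GATE function: the function names, the `gate` threshold, and the greedy strategy are all assumptions made for this example, and a real tracker would use learned embeddings rather than toy vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def match_by_appearance(track_feats, det_feats, gate=0.5):
    """Greedily pair tracks with detections by appearance similarity.

    `gate` is a hypothetical threshold for this sketch: pairs whose
    similarity falls below it are never matched.
    """
    # Collect every (similarity, track index, detection index) above the gate.
    candidates = []
    for t, track_feat in enumerate(track_feats):
        for d, det_feat in enumerate(det_feats):
            sim = cosine_similarity(track_feat, det_feat)
            if sim >= gate:
                candidates.append((sim, t, d))

    # Accept the most similar pairs first; each track and each detection
    # is used at most once.
    candidates.sort(reverse=True)
    matches, used_tracks, used_dets = [], set(), set()
    for sim, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)

    unmatched_tracks = [t for t in range(len(track_feats)) if t not in used_tracks]
    unmatched_dets = [d for d in range(len(det_feats)) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Toy example: two existing tracks, three new detections.
tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches, lost_tracks, new_dets = match_by_appearance(tracks, detections)
```

In this toy case each track pairs with the detection pointing in nearly the same direction, while the third detection falls below the gate and is left unmatched, so a tracker could start a new track for it.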

updated: Mon Jan 22 2024 06:46:27 GMT+0000 (UTC)

published: Wed Nov 16 2022 10:49:48 GMT+0000 (UTC)

arXiv
