Self-Supervised Tracking via Target-Aware Data Synthesis

Xin Li; Wenjie Pei; Zikun Zhou; Zhenyu He; Huchuan Lu; Ming-Hsuan Yang

Deep-learning-based tracking methods have made substantial progress, but they require large-scale, high-quality annotated data for sufficient training. To eliminate expensive and exhaustive annotation, we study self-supervised learning for visual tracking. In this work, we develop the Crop-Transform-Paste operation, which synthesizes sufficient training data by simulating the variations encountered during tracking, including appearance variations of objects and background interference. Because the target state is known in all synthesized data, existing deep trackers can be trained in routine ways on the synthesized data without human annotation. The proposed target-aware data-synthesis method adapts existing tracking approaches to a self-supervised learning framework without algorithmic changes, so the self-supervised learning mechanism can be seamlessly integrated into existing tracking frameworks for training. Extensive experiments show that our method 1) performs favorably against supervised learning schemes when annotations are limited; 2) helps deal with various tracking challenges such as object deformation, occlusion, and background clutter, owing to its manipulability; 3) performs favorably against state-of-the-art unsupervised tracking methods; and 4) boosts the performance of various state-of-the-art supervised learning frameworks, including SiamRPN++, DiMP, and TransT (based on Transformer).
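
The abstract names the Crop-Transform-Paste operation but does not spell out its steps, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes NumPy arrays as images and a box given as (x, y, w, h); the function name `crop_transform_paste`, the `scale_range` parameter, and the choice of transforms (a random horizontal flip and a nearest-neighbour rescale) are all hypothetical illustrations.

```python
import numpy as np


def crop_transform_paste(source, box, background, rng, scale_range=(0.8, 1.2)):
    """Crop a patch, transform it, and paste it onto a background image.

    Hypothetical helper for illustration only. `box` is (x, y, w, h) in
    source pixel coordinates. Returns the synthesized image and the box
    (x, y, w, h) where the patch was pasted.
    """
    x, y, w, h = box

    # Crop: cut the target patch out of the source frame.
    patch = source[y:y + h, x:x + w].copy()

    # Transform (assumed choices): random horizontal flip, then a
    # nearest-neighbour rescale by a random factor.
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    scale = rng.uniform(*scale_range)
    bg_h, bg_w = background.shape[:2]
    new_h = int(np.clip(round(h * scale), 1, bg_h))
    new_w = int(np.clip(round(w * scale), 1, bg_w))
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    patch = patch[rows][:, cols]

    # Paste: place the patch at a random location that fits in the background.
    px = int(rng.integers(0, bg_w - new_w + 1))
    py = int(rng.integers(0, bg_h - new_h + 1))
    out = background.copy()
    out[py:py + new_h, px:px + new_w] = patch

    # The pasted box is known by construction, so it can act as the label.
    return out, (px, py, new_w, new_h)


# Example usage with toy grayscale images.
rng = np.random.default_rng(0)
source = np.zeros((40, 40), dtype=np.uint8)
source[10:20, 5:25] = 255  # a bright rectangular "target"
background = np.zeros((64, 64), dtype=np.uint8)
image, label = crop_transform_paste(source, (5, 10, 20, 10), background, rng)
```

The point of the sketch is the last return value: because the paste location and size are chosen by the synthesis code itself, each synthesized image comes with a target box at no annotation cost, which is what lets an existing tracker be trained on such data in its usual way.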

updated: Thu Dec 02 2021 02:08:10 GMT+0000 (UTC)

published: Mon Jun 21 2021 07:40:34 GMT+0000 (UTC)

arXiv
