Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking

Seyed Mojtaba Marvasti-Zadeh; Hossein Ghanei-Yakhdan; Shohreh Kasaei

Owing to automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging tracking scenarios. Although many of these trackers use feature maps from pre-trained convolutional neural networks (CNNs), the effects of selecting different models and of exploiting various combinations of their feature maps have not yet been compared thoroughly. To the best of our knowledge, all of these methods use a fixed number of convolutional feature maps without considering the scene attributes (e.g., occlusion, deformation, and fast motion) that may occur during tracking. As a prerequisite, this paper proposes adaptive discriminative correlation filters (DCF) built on methods that can exploit CNN models with different topologies. First, the paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model. Second, using the analysis results as attribute dictionaries, adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers with respect to video characteristics. Third, the generalization of the proposed method is validated on various tracking datasets as well as on CNN models with similar architectures. Finally, extensive experimental results demonstrate the effectiveness of the proposed adaptive method compared with state-of-the-art visual tracking methods.
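
The abstract does not spell out how an attribute dictionary drives feature selection, so the following is only a minimal sketch of one possible reading: a lookup from scene attributes to the CNN layers whose feature maps a tracker would use. The names `ATTRIBUTE_DICT`, `DEFAULT_LAYERS`, and `select_layers`, the layer labels, and the attribute-to-layer pairings are all hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical attribute dictionary: scene attribute -> layers to exploit.
# The layer names and pairings are illustrative placeholders, not results
# reported by the paper.
ATTRIBUTE_DICT: Dict[str, List[str]] = {
    "occlusion": ["conv4", "conv5"],
    "deformation": ["conv3", "conv4"],
    "fast_motion": ["conv2", "conv5"],
}

# Assumed fallback when no known attribute is present.
DEFAULT_LAYERS: List[str] = ["conv5"]


def select_layers(attributes: Sequence[str]) -> List[str]:
    """Return the union of layers for the given attributes, in first-seen order."""
    selected: List[str] = []
    for attr in attributes:
        for layer in ATTRIBUTE_DICT.get(attr, []):
            if layer not in selected:
                selected.append(layer)
    # Fall back to the default set if nothing matched.
    return selected or list(DEFAULT_LAYERS)


if __name__ == "__main__":
    print(select_layers(["occlusion", "deformation"]))
```

Under this reading, a tracker would call `select_layers` with the attributes associated with the current video and pass only the returned layers' feature maps to its correlation filters, instead of using a fixed set for every sequence.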

updated: Tue Dec 22 2020 06:57:23 GMT+0000 (UTC)

published: Sat Aug 29 2020 17:09:43 GMT+0000 (UTC)

arXiv
