Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking

Seyed Mojtaba Marvasti-Zadeh; Hossein Ghanei-Yakhdan; Shohreh Kasaei

In recent years, visual tracking methods based on discriminative correlation filters (DCF) have shown great promise. However, most of these methods lack robust scale estimation. Although many recent DCF-based methods exploit features extracted from deep convolutional neural networks (CNNs) in their translation model, the scale of the visual target is still estimated from hand-crafted features. Because exploiting CNNs imposes a high computational burden, this paper uses pre-trained lightweight CNN models to propose two efficient scale estimation methods that not only improve visual tracking performance but also provide acceptable tracking speeds. The proposed methods are formulated on either holistic or region representations of convolutional feature maps, so that they integrate efficiently into DCF formulations and learn a robust scale model in the frequency domain. Moreover, in contrast to conventional scale estimation methods, which iteratively extract features from different target regions, the proposed methods use one-pass feature extraction processes that significantly improve computational efficiency. Comprehensive experimental results on the OTB-50, OTB-100, TC-128, and VOT-2018 visual tracking datasets demonstrate that the proposed visual tracking methods outperform state-of-the-art methods.
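To make the idea of learning a scale model in the frequency domain concrete, the following is a minimal sketch of a generic one-dimensional correlation filter over candidate scales, in the style commonly used by DCF trackers. It is not the formulation proposed in the paper: the function names, the Gaussian label, the regularization value, and the random vectors standing in for CNN descriptors are all assumptions made for illustration.

```python
import numpy as np

def gaussian_label(num_scales, sigma=1.0):
    """Desired 1-D response: a Gaussian peaked at the centre scale index."""
    idx = np.arange(num_scales) - num_scales // 2
    return np.exp(-0.5 * (idx / sigma) ** 2)

def train_scale_filter(features, label, lam=1e-2):
    """Fit a 1-D multi-channel correlation filter along the scale axis.

    features: (D, S) array, one D-dimensional descriptor per candidate scale.
    label:    (S,) desired response.
    lam:      ridge regularization weight (assumed value).
    """
    F = np.fft.fft(features, axis=1)            # FFT along the scale axis
    G = np.fft.fft(label)
    numerator = np.conj(G)[None, :] * F         # per-channel numerator
    denominator = np.sum(np.abs(F) ** 2, axis=0) + lam  # shared, real-valued
    return numerator, denominator

def scale_response(numerator, denominator, features):
    """Correlate the learned filter with new per-scale descriptors."""
    Z = np.fft.fft(features, axis=1)
    Y = np.sum(np.conj(numerator) * Z, axis=0) / denominator
    return np.real(np.fft.ifft(Y))

# Hypothetical data: random vectors stand in for descriptors that a tracker
# would extract from the target at each candidate scale.
rng = np.random.default_rng(0)
num_scales, dim = 17, 32
train_feats = rng.standard_normal((dim, num_scales))
label = gaussian_label(num_scales)

num, den = train_scale_filter(train_feats, label)
same = scale_response(num, den, train_feats)
# Circularly shifting the descriptors by two scale steps moves the peak by two.
shifted = scale_response(num, den, np.roll(train_feats, 2, axis=1))

best_same = int(np.argmax(same))
best_shifted = int(np.argmax(shifted))
print(best_same, best_shifted)
```

In this sketch the response on the training descriptors peaks at the centre scale index, and a circular shift of the descriptors moves the peak by the same number of steps, which is how such a filter signals a change in target scale. How the per-scale descriptors are obtained (for example, from a single forward pass rather than one pass per scale) is outside what this sketch shows.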

updated: Fri Dec 11 2020 08:22:24 GMT+0000 (UTC)

published: Mon Apr 06 2020 18:49:37 GMT+0000 (UTC)

arXiv
