Single Object Tracking through a Fast and Effective Single-Multiple Model Convolutional Neural Network

Faraz Lotfi; Hamid D. Taghirad

Object tracking becomes especially critical when similar objects are present in the same area. Recent state-of-the-art (SOTA) approaches rely on a heavy matching network to distinguish the target from other objects in the area, which drastically reduces tracking speed. In addition, several candidates are considered and processed to localize the intended object within a region of interest in each frame, which is time-consuming. In this article, a special architecture is proposed that, in contrast to previous approaches, identifies the object location in a single shot while taking its template into account to distinguish it from similar objects in the same area. In brief, a window that contains the object and is twice the target size is considered first. This window is then fed into a fully convolutional neural network (CNN) to extract a region of interest (RoI) in the form of a matrix for each frame. At the beginning, a template of the target is also given as input to the CNN. Based on this RoI matrix, the next movement of the tracker is determined by a simple and fast method. Moreover, this matrix helps to estimate the object size, which is crucial when the size changes over time. Despite the absence of a matching network, the presented tracker performs comparably to the SOTA in challenging situations while being much faster (up to 120 FPS on a 1080 Ti GPU). To investigate this claim, a comparison study is carried out on the GOT-10k dataset. The results reveal the outstanding performance of the proposed method on this task.
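The window-and-matrix loop described above can be illustrated with a minimal sketch. The abstract does not say how the "simple and fast method" turns the RoI matrix into a movement or a size, so everything here is an assumption made for illustration: the function names `search_window` and `update_from_roi` are hypothetical, the threshold of 0.5 is arbitrary, the new centre is taken as the score-weighted centroid of the active cells, and the size as their bounding extent. The CNN itself is not modelled; the matrix is simply passed in.

```python
def search_window(center, size, scale=2.0):
    """Return (x0, y0, x1, y1) of a window `scale` times the target size,
    centred on the current target position (scale=2.0 follows the abstract)."""
    cx, cy = center
    w, h = size
    half_w, half_h = scale * w / 2.0, scale * h / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def update_from_roi(roi, window, threshold=0.5):
    """Estimate the new target centre and size from an RoI matrix.

    Assumptions (not taken from the paper): `roi` is a list of rows of
    scores covering `window` uniformly, cells scoring >= threshold belong
    to the target, the centre is the score-weighted centroid of those
    cells, and the size is their bounding extent.
    Returns ((cx, cy), (width, height)), or None if no cell is active.
    """
    x0, y0, x1, y1 = window
    rows, cols = len(roi), len(roi[0])
    cell_w = (x1 - x0) / cols
    cell_h = (y1 - y0) / rows

    active = [(r, c, roi[r][c])
              for r in range(rows)
              for c in range(cols)
              if roi[r][c] >= threshold]
    if not active:
        return None  # target not found in this window

    total = sum(s for _, _, s in active)
    # Cell centres sit at (index + 0.5) in matrix coordinates.
    cx = x0 + cell_w * sum((c + 0.5) * s for _, c, s in active) / total
    cy = y0 + cell_h * sum((r + 0.5) * s for r, _, s in active) / total

    min_r = min(r for r, _, _ in active)
    max_r = max(r for r, _, _ in active)
    min_c = min(c for _, c, _ in active)
    max_c = max(c for _, c, _ in active)
    width = (max_c - min_c + 1) * cell_w
    height = (max_r - min_r + 1) * cell_h
    return (cx, cy), (width, height)
```

In a tracking loop under these assumptions, each frame would call `search_window` with the previous estimate, obtain the matrix for that window from the network, and pass it to `update_from_roi` to get the next centre and size. Because the matrix is read once per frame, no per-candidate matching step is needed, which is the property the abstract credits for the tracker's speed.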

updated: Sun Mar 28 2021 11:02:14 GMT+0000 (UTC)

published: Sun Mar 28 2021 11:02:14 GMT+0000 (UTC)

arXiv
