Tracking Without Re-recognition in Humans and Machines

Drew Linsley; Girik Malik; Junkyung Kim; Lakshmi N Govindarajan; Ennio Mingolla; Thomas Serre

Imagine trying to track one particular fruit fly in a swarm of hundreds. Higher biological visual systems have evolved to track moving objects by relying on both appearance and motion features. We investigate whether state-of-the-art deep neural networks for visual tracking are capable of the same. To this end, we introduce PathTracker, a synthetic visual challenge that asks human observers and machines to track a target object in the midst of identical-looking "distractor" objects. While humans effortlessly learn PathTracker and generalize to systematic variations in task design, state-of-the-art deep networks struggle. To address this limitation, we identify and model circuit mechanisms in biological brains that are implicated in tracking objects based on motion cues. When instantiated as a recurrent network, our circuit model learns to solve PathTracker with a robust visual strategy that rivals human performance and explains a significant proportion of human decision-making on the challenge. We also show that the success of this circuit model extends to object tracking in natural videos. Adding it to a transformer-based architecture for object tracking builds tolerance to visual nuisances that affect object appearance, resulting in new state-of-the-art performance on the large-scale TrackingNet object tracking challenge. Our work highlights the importance of building artificial vision models that can help us better understand human vision and improve computer vision.

updated: Thu May 27 2021 17:56:37 GMT+0000 (UTC)

published: Thu May 27 2021 17:56:37 GMT+0000 (UTC)

arXiv
