What am I Searching for: Zero-shot Target Identity Inference in Visual Search

Mengmi Zhang; Gabriel Kreiman

Can we infer intentions from a person's actions? As an example problem, here we consider how to decipher what a person is searching for by decoding their eye movement behavior. We conducted two psychophysics experiments where we monitored eye movements while subjects searched for a target object. We defined the fixations falling on non-target objects as "error fixations". Using those error fixations, we developed a model (InferNet) to infer what the target was. InferNet uses a pre-trained convolutional neural network to extract features from the error fixations and computes a similarity map between the error fixations and all locations across the search image. The model consolidates the similarity maps across layers and integrates these maps across all error fixations. InferNet successfully identifies the subject's goal and outperforms competitive null models, even without any object-specific training on the inference task.
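The pipeline described in the abstract (features at error fixations, similarity maps over the search image, consolidation across layers, integration across fixations) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes random arrays standing in for the pre-trained network's feature maps, a single feature vector per fixation rather than a patch, cosine similarity as the similarity measure, and plain averaging for both the layer and fixation steps. The names `cosine_similarity_map`, `upsample_nearest`, and `infer_target_map` are hypothetical.

```python
import numpy as np


def cosine_similarity_map(feature_map, row, col):
    """Cosine similarity between the feature vector at (row, col) and every location.

    feature_map has shape (channels, height, width).
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)
    unit = flat / (np.linalg.norm(flat, axis=0) + 1e-8)  # normalise each location
    query = unit[:, row * w + col]
    return (query @ unit).reshape(h, w)


def upsample_nearest(sim_map, factor):
    """Nearest-neighbour upsampling so maps from all layers share the image size."""
    return np.repeat(np.repeat(sim_map, factor, axis=0), factor, axis=1)


def infer_target_map(layer_features, strides, fixations):
    """Average similarity maps across layers, then across error fixations.

    layer_features: list of (C, H, W) arrays, one per layer (assumed inputs).
    strides: image pixels per feature cell for each layer.
    fixations: list of (row, col) error fixations in image pixels.
    """
    per_fixation = []
    for r, c in fixations:
        per_layer = [
            upsample_nearest(cosine_similarity_map(feats, r // s, c // s), s)
            for feats, s in zip(layer_features, strides)
        ]
        per_fixation.append(np.mean(per_layer, axis=0))  # consolidate layers
    return np.mean(per_fixation, axis=0)  # integrate fixations


# Toy data: a 32x32 "search image" seen through two stand-in feature layers.
rng = np.random.default_rng(0)
strides = [4, 8]
layer_features = [rng.normal(size=(16, 8, 8)), rng.normal(size=(32, 4, 4))]
fixations = [(4, 4), (20, 12)]
hidden_target = (24, 24)

# Plant the same feature vector at the fixated locations and at the hidden
# target, mimicking error fixations that land on target-like objects.
for feats, s in zip(layer_features, strides):
    shared = rng.normal(size=feats.shape[0])
    for r, c in fixations + [hidden_target]:
        feats[:, r // s, c // s] = shared

target_map = infer_target_map(layer_features, strides, fixations)
```

In this toy setup the hidden target location scores as high as the fixated locations themselves, which is the intuition behind using error fixations as evidence about the target. A real system would replace the random arrays with feature maps from a pre-trained convolutional network.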

updated: Tue Jun 02 2020 01:17:22 GMT+0000 (UTC)

published: Tue Jul 31 2018 17:15:11 GMT+0000 (UTC)

arXiv
