Visual Search Asymmetry: Deep Nets and Humans Share Similar Inherent Biases

Shashi Kant Gupta; Mengmi Zhang; Chia-Chien Wu; Jeremy M. Wolfe; Gabriel Kreiman

Visual search is a ubiquitous and often challenging daily task, exemplified by looking for car keys at home or for a friend in a crowd. An intriguing property of some classical search tasks is an asymmetry: finding a target A among distractors B can be easier than finding B among A. To elucidate the mechanisms responsible for asymmetry in visual search, we propose a computational model that takes a target and a search image as inputs and produces a sequence of eye movements until the target is found. The model integrates eccentricity-dependent visual recognition with target-dependent top-down cues. We compared the model against human behavior in six paradigmatic search tasks that show asymmetry in humans. Without prior exposure to the stimuli or task-specific training, the model provides a plausible mechanism for search asymmetry. We hypothesized that the polarity of search asymmetry arises from experience with the natural environment. We tested this hypothesis by training the model on an augmented version of ImageNet where the biases of natural images were either removed or reversed. The polarity of search asymmetry disappeared or was altered depending on the training protocol. This study highlights how classical perceptual properties can emerge in neural network models without task-specific training, as a consequence of the statistical properties of the developmental diet fed to the model. All source code and stimuli are publicly available at https://github.com/kreimanlab/VisualSearchAsymmetry
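To make the input/output contract described above concrete (a target and a search image go in, a sequence of fixations comes out), here is a minimal sketch in Python. It is not the authors' implementation: it assumes plain template correlation in place of learned network features, omits eccentricity dependence entirely, and assumes a winner-take-all choice of fixation with inhibition of return, none of which is specified in the abstract. The names `attention_map`, `search`, and `target_pos` are hypothetical.

```python
import numpy as np


def attention_map(search_img, target_img):
    """Top-down map: correlate the target template with every image window.

    Assumption: raw pixel correlation stands in for target-modulated features.
    """
    th, tw = target_img.shape
    sh, sw = search_img.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search_img[i:i + th, j:j + tw] * target_img)
    return out


def search(search_img, target_img, target_pos, max_fixations=10):
    """Fixate the current maximum of the map until the target is found.

    target_pos is the top-left corner of the true target (used only as an
    oracle to decide when to stop, an assumption of this sketch).
    """
    amap = attention_map(search_img, target_img)
    th, tw = target_img.shape
    fixations = []
    for _ in range(max_fixations):
        i, j = np.unravel_index(np.argmax(amap), amap.shape)
        i, j = int(i), int(j)
        fixations.append((i, j))
        if (i, j) == tuple(target_pos):
            return fixations, True
        # Inhibition of return: suppress every window overlapping this one.
        amap[max(0, i - th + 1):i + th, max(0, j - tw + 1):j + tw] = -np.inf
    return fixations, False


# Toy display: one bright distractor and one true target on a blank field.
target = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=float)
image = np.zeros((12, 12))
image[1:4, 1:4] = 2.0        # distractor, more salient than the target
image[6:9, 6:9] = target     # true target at top-left corner (6, 6)

fixations, found = search(image, target, target_pos=(6, 6))
print(fixations, found)
```

In this toy display the bright distractor draws the first fixation and the target is reached only afterwards, so the number of fixations serves as a crude proxy for search difficulty. Swapping the roles of target and distractor and comparing fixation counts is the kind of comparison an asymmetry experiment makes, though this sketch makes no claim to reproduce the reported results.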

updated: Sat Jun 05 2021 19:46:42 GMT+0000 (UTC)

published: Sat Jun 05 2021 19:46:42 GMT+0000 (UTC)

arXiv
