Capturing the objects of vision with neural networks

Benjamin Peters; Nikolaus Kriegeskorte

Human visual perception carves a scene at its physical joints, decomposing the world into objects, which are selectively attended, tracked, and predicted as we engage our surroundings. Object representations emancipate perception from the sensory input, enabling us to keep in mind that which is out of sight and to use perceptual content as a basis for action and symbolic cognition. Human behavioral studies have documented how object representations emerge through grouping, amodal completion, proto-objects, and object files. Deep neural network (DNN) models of visual object recognition, by contrast, remain largely tethered to the sensory input, despite achieving human-level performance at labeling objects. Here, we review related work in both fields and examine how these fields can help each other. The cognitive literature provides a starting point for the development of new experimental tasks that reveal mechanisms of human object perception and serve as benchmarks driving development of deep neural network models that will put the object into object recognition.

updated: Tue Sep 07 2021 21:49:53 GMT+0000 (UTC)

published: Tue Sep 07 2021 21:49:53 GMT+0000 (UTC)

arXiv
