CNNs and Transformers Perceive Hybrid Images Similar to Humans

Ali Borji

The hybrid image technique generates images with two interpretations that change as a function of viewing distance. It has been used to study multiscale processing of images by the human visual system. Using 63,000 hybrid images across 10 fruit categories, we show that the predictions of deep learning vision models qualitatively match human perception of these images. Our results provide further evidence for the hypothesis that Convolutional Neural Networks (CNNs) and Transformers are good at modeling the feedforward sweep of information in the ventral stream of the visual cortex. Code and data are available at https://github.com/aliborji/hybrid_images.git.
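
As a rough illustration of the construction the abstract refers to, a hybrid image is commonly built by adding the low spatial frequencies of one image (dominant from far away) to the high spatial frequencies of another (dominant up close). The sketch below is not taken from the linked repository. The function names, the Gaussian filter, and the two sigma values are assumptions chosen for illustration, and it expects two grayscale arrays of the same shape.

```python
import numpy as np


def gaussian_lowpass(image, sigma):
    """Blur a 2-D grayscale image with a Gaussian applied in the frequency domain."""
    rows, cols = image.shape
    fy = np.fft.fftfreq(rows)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(cols)[None, :]  # horizontal frequencies, cycles per pixel
    # Fourier transform of a spatial Gaussian with standard deviation sigma (pixels).
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


def make_hybrid(far_image, near_image, sigma_low=6.0, sigma_high=3.0):
    """Combine two same-sized grayscale images into one hybrid image.

    far_image supplies the low frequencies, near_image the high frequencies.
    sigma_low and sigma_high are illustrative defaults, not values from the paper.
    """
    if far_image.shape != near_image.shape:
        raise ValueError("both images must have the same shape")
    low = gaussian_lowpass(far_image, sigma_low)
    high = near_image - gaussian_lowpass(near_image, sigma_high)
    return low + high


# Usage with synthetic data standing in for two fruit photographs.
rng = np.random.default_rng(0)
far = rng.random((64, 64))
near = rng.random((64, 64))
hybrid = make_hybrid(far, near)
```

Because the high-pass component has zero mean, the overall brightness of the hybrid follows the image that supplies the low frequencies. In practice the result would usually be clipped or rescaled to the valid pixel range before display.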

updated: Sat Mar 19 2022 21:37:07 GMT+0000 (UTC)

published: Sat Mar 19 2022 21:37:07 GMT+0000 (UTC)

arXiv
