A Psychophysical Oriented Saliency Map Prediction Model

Qiang Li

Visual attention is one of the most important mechanisms for selecting and understanding information in a highly redundant visual world. Complex scenes contain enormous redundancy, and the human visual system cannot process all of this information simultaneously because of the visual information bottleneck. To reduce the redundancy of its visual input, the human visual system focuses mainly on the dominant parts of a scene. Modeling this behavior is commonly known as visual attention prediction, and its output as a visual saliency map. This paper proposes a new psychophysical saliency prediction architecture, WECSF, inspired by the function of the human low-level visual cortex. The model consists of opponent color channels, a wavelet transform, a wavelet energy map, and a contrast sensitivity function, which together extract low-level image features and approximate the human visual system as closely as possible. The proposed model is evaluated on several datasets, including MIT1003, MIT300, TORONTO, SID4VAM, and UCF Sports, to demonstrate its efficiency. We also compare its saliency prediction performance quantitatively and qualitatively with other state-of-the-art models, and our model achieves stable and good performance. We further confirm that Fourier- and spectral-inspired saliency prediction models outperform other state-of-the-art non-neural-network models, and even deep neural network models, on psychophysical synthetic images. Finally, the proposed model can also be applied to spatio-temporal saliency prediction, where it achieves better performance.
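The pipeline named in the abstract (opponent color channels, wavelet transform, wavelet energy map, frequency weighting) can be illustrated with a minimal NumPy sketch. This is not the WECSF implementation. The opponent transform, the Haar wavelet, the function names, and the `level_weights` parameter are all assumptions made for illustration. In particular, `level_weights` is only a hypothetical stand-in for a contrast sensitivity function, which would normally assign the per-scale weights.

```python
import numpy as np


def opponent_channels(rgb):
    """Split an RGB image (H, W, 3) into three simple opponent channels.

    Assumption: a basic achromatic / red-green / blue-yellow split,
    not necessarily the color transform used in the paper.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    achromatic = (r + g + b) / 3.0
    red_green = r - g
    blue_yellow = b - (r + g) / 2.0
    return achromatic, red_green, blue_yellow


def haar_level(x):
    """One level of an orthonormal 2-D Haar transform (H and W must be even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    horiz = (a + b - c - d) / 2.0
    vert = (a - b + c - d) / 2.0
    diag = (a - b - c + d) / 2.0
    return approx, (horiz, vert, diag)


def wavelet_energy_saliency(rgb, level_weights=(1.0, 1.0, 1.0)):
    """Sum weighted wavelet detail energy over scales and opponent channels.

    level_weights is a hypothetical placeholder for CSF-derived weights.
    H and W must be divisible by 2 ** len(level_weights).
    """
    rgb = np.asarray(rgb, dtype=float)
    height, width = rgb.shape[:2]
    saliency = np.zeros((height, width))
    for channel in opponent_channels(rgb):
        approx = channel
        for level, weight in enumerate(level_weights, start=1):
            approx, details = haar_level(approx)
            # Energy map: squared detail coefficients summed over orientations.
            energy = sum(band ** 2 for band in details)
            # Upsample back to full resolution by block replication.
            scale = 2 ** level
            saliency += weight * np.kron(energy, np.ones((scale, scale)))
    value_range = saliency.max() - saliency.min()
    if value_range > 0:
        saliency = (saliency - saliency.min()) / value_range
    return saliency


if __name__ == "__main__":
    image = np.zeros((16, 16, 3))
    image[4:8, 4:8, 0] = 1.0  # a red patch on a black background
    print(wavelet_energy_saliency(image).shape)
```

The map is normalized to [0, 1] when the image has any detail energy, and a uniform image yields an all-zero map. Block replication is the crudest possible upsampling; smoother interpolation would be a natural substitute.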

updated: Thu May 20 2021 20:17:41 GMT+0000 (UTC)

published: Sun Nov 08 2020 20:58:05 GMT+0000 (UTC)

arXiv
