Bio-Inspired Representation Learning for Visual Attention Prediction

Yuan Yuan; Hailong Ning; Xiaoqiang Lu

Visual attention prediction (VAP) is an important problem in computer vision. Most existing VAP methods are based on deep learning, but they do not take full advantage of low-level contrast features when generating the visual attention map. This paper proposes a novel VAP method that generates visual attention maps via bio-inspired representation learning. Bio-inspired representation learning combines low-level contrast features and high-level semantic features simultaneously, motivated by the fact that the human eye is sensitive to patches with high contrast and to objects with high semantics. The proposed method consists of three main steps: 1) feature extraction, 2) bio-inspired representation learning, and 3) visual attention map generation. First, the high-level semantic feature is extracted from a refined VGG16, while the low-level contrast feature is extracted by the proposed contrast feature extraction block in a deep network. Second, during bio-inspired representation learning, the extracted low-level contrast and high-level semantic features are combined by a purpose-designed densely connected block, which concatenates the various features scale by scale. Finally, a weighted-fusion layer generates the final visual attention map from the representations obtained after bio-inspired representation learning. Extensive experiments are performed to demonstrate the effectiveness of the proposed method.
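To make the idea of combining a contrast cue with a semantic cue concrete, the following is a minimal sketch in plain Python. It is not the authors' network: the abstract does not specify the contrast feature extraction block, the densely connected block, or the fusion weights. Here `local_contrast` is a hypothetical hand-crafted stand-in for the learned contrast block, `weighted_fusion` is a hypothetical stand-in for the weighted-fusion layer with fixed rather than learned weights, and the `semantic` map is invented toy data standing in for VGG16 features.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map stored as rows of floats


def local_contrast(img: Map) -> Map:
    """Absolute difference between each pixel and its 3x3 neighbourhood mean.

    Hand-crafted stand-in (assumption) for the learned contrast feature
    extraction block; border pixels use only the neighbours that exist.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = abs(img[y][x] - sum(window) / len(window))
    return out


def weighted_fusion(maps: List[Map], weights: List[float]) -> Map:
    """Per-pixel weighted sum of same-sized maps, weights normalised to sum to 1.

    In a trained network the weights would be learned; here they are fixed
    by hand (assumption).
    """
    total = sum(weights)
    norm = [wgt / total for wgt in weights]
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [sum(n * m[y][x] for n, m in zip(norm, maps)) for x in range(w)]
        for y in range(h)
    ]


# Toy 3x3 "image" with one bright centre pixel, and a made-up semantic map.
image = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
semantic = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 0.0]]

contrast = local_contrast(image)
attention = weighted_fusion([contrast, semantic], weights=[1.0, 1.0])
```

In this toy case the centre pixel scores highest in `attention` because it is both high-contrast and semantically active, which is the intuition the abstract appeals to. A real implementation would replace both helpers with learned convolutional layers and fuse features at several scales.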

updated: Tue Mar 09 2021 09:15:36 GMT+0000 (UTC)

published: Tue Mar 09 2021 09:15:36 GMT+0000 (UTC)

arXiv
