Deriving Explanation of Deep Visual Saliency Models

Sai Phani Kumar Malladi; Jayanta Mukhopadhyay; Chaker Larabi; Santanu Chaudhury

Deep neural networks have had a profound impact on visual saliency prediction, reaching human-level performance. However, it is still unclear how they learn the task and what this implies for our understanding of the human visual system. In this work, we develop a technique for deriving explainable saliency models from deep-neural-architecture-based saliency models by applying human perception theories and conventional concepts of saliency. This technique helps us understand what a deep network learns at its intermediate layers through their activation maps. We first consider two state-of-the-art deep saliency models, UNISAL and MSI-Net, for interpretation. Within our explainable saliency model, we use a set of biologically plausible log-Gabor filters to identify and reconstruct the activation maps of these networks. The final saliency map is generated from the reconstructed activation maps. We also build our own deep saliency model for saliency prediction, the cross-concatenated multi-scale residual block based network (CMRNet). We then evaluate the explainable models derived from UNISAL, MSI-Net, and CMRNet on three benchmark datasets and compare their performance with other state-of-the-art methods. Hence, we propose that this explainability approach is generic: it can be applied to any deep visual saliency model for interpretation.
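For readers unfamiliar with log-Gabor filters, the following is a minimal sketch of a generic frequency-domain log-Gabor filter bank in NumPy. It is not the authors' implementation: the abstract gives no filter parameters, so the function names (`log_gabor_filter`, `log_gabor_bank`, `apply_filter`) and every default value (bandwidth ratio, angular spread, number of scales and orientations, peak frequency) are assumptions chosen only for illustration.

```python
import numpy as np


def log_gabor_filter(size, f0, theta0, sigma_ratio=0.65, sigma_theta=np.pi / 8):
    """Build one log-Gabor filter in the frequency domain.

    The filter uses the unshifted FFT layout (DC term at index [0, 0]).
    All default parameter values are illustrative assumptions.
    """
    fy = np.fft.fftfreq(size)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(size)[None, :]  # horizontal frequencies, cycles/pixel
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                  # placeholder to avoid log(0) at DC
    theta = np.arctan2(fy, fx)

    # Radial part: a Gaussian on a logarithmic frequency axis, centred on f0.
    radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
    radial[0, 0] = 0.0                  # a log-Gabor filter has no DC response

    # Angular part: a Gaussian in orientation, using the wrapped angular distance.
    dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular


def log_gabor_bank(size, n_scales=3, n_orientations=4, f_max=0.25, scale_factor=2.0):
    """Return a list of filters over several scales and orientations."""
    return [
        log_gabor_filter(size, f_max / scale_factor ** s, o * np.pi / n_orientations)
        for s in range(n_scales)
        for o in range(n_orientations)
    ]


def apply_filter(image, filt):
    """Filter a square grayscale image; the result is complex-valued."""
    return np.fft.ifft2(np.fft.fft2(image) * filt)
```

Taking `np.abs` of the output of `apply_filter` gives a local energy map for one scale and orientation. How such responses are matched to, or used to reconstruct, a network's activation maps is specific to the paper and is not shown here.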

updated: Wed Sep 08 2021 12:22:32 GMT+0000 (UTC)

published: Wed Sep 08 2021 12:22:32 GMT+0000 (UTC)

arXiv
