A Simple and efficient deep Scanpath Prediction

Mohamed Amine Kerkouri; Aladine Chetouani

A visual scanpath is the sequence of fixation points that the human gaze traverses while observing an image, and predicting it helps in modeling visual attention over that image. To this end, several models have been proposed in the literature using complex deep learning architectures and frameworks. Here, we explore the efficiency of using common deep learning architectures in a simple, fully convolutional regressive manner. We evaluate how well these models predict scanpaths on two datasets. We compare them with other models using different metrics and show competitive results that sometimes surpass previous complex architectures. We also compare the different backbone architectures based on their performance in the experiments to deduce which are the most suitable for the task.

updated: Wed Dec 08 2021 22:43:45 GMT+0000 (UTC)

published: Wed Dec 08 2021 22:43:45 GMT+0000 (UTC)

arXiv
