Noise-Aware Saliency Prediction for Videos with Incomplete Gaze Data

Ekta Prashnani; Orazio Gallo; Joohwan Kim; Josef Spjut; Pradeep Sen; Iuri Frosio

Deep-learning-based algorithms have led to impressive results in visual-saliency prediction, but the impact of noise in training gaze data has been largely overlooked. This issue is especially relevant for videos, where the gaze data tends to be incomplete, and thus noisier, than for images. Therefore, we propose a noise-aware training (NAT) paradigm for visual-saliency prediction that quantifies the uncertainty arising from the incompleteness and inaccuracy of gaze data, and accounts for it in training. We demonstrate the advantage of NAT independently of the adopted model architecture, loss function, or training dataset. Given its robustness to the noise in incomplete training datasets, NAT ushers in the possibility of designing gaze datasets with fewer human subjects. We also introduce the first dataset that offers a video-game context for video-saliency research, with rich temporal semantics and multiple gaze attractors per frame.
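The premise that incomplete gaze data yields noisier saliency maps can be illustrated with a toy simulation. The sketch below is not the NAT method and is not taken from the paper; it assumes a hypothetical 1-D "frame" with two gaze attractors, draws a limited number of fixations from it, blurs them into a measured saliency map, and reports the average Pearson correlation with the underlying map. All names, bin counts, and kernel widths are illustrative assumptions.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(signal, kernel):
    """Convolve with the kernel, treating out-of-range samples as zero."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def true_saliency(n_bins=100):
    """Hypothetical ground truth: two gaze attractors (assumed for illustration)."""
    dens = [
        0.6 * math.exp(-0.5 * ((x - 30) / 5) ** 2)
        + 0.4 * math.exp(-0.5 * ((x - 70) / 5) ** 2)
        for x in range(n_bins)
    ]
    total = sum(dens)
    return [d / total for d in dens]


def measured_saliency(truth, n_fixations, kernel, rng):
    """Sample fixations from the true map and blur them into a measured map."""
    counts = [0.0] * len(truth)
    for x in rng.choices(range(len(truth)), weights=truth, k=n_fixations):
        counts[x] += 1.0
    return blur(counts, kernel)


def mean_agreement(n_fixations, trials=200, seed=0):
    """Average correlation between measured and true maps over repeated draws."""
    rng = random.Random(seed)
    truth = true_saliency()
    kernel = gaussian_kernel(sigma=3.0, radius=9)
    scores = [
        pearson(measured_saliency(truth, n_fixations, kernel, rng), truth)
        for _ in range(trials)
    ]
    return sum(scores) / trials


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, "fixations -> mean correlation", round(mean_agreement(n), 3))
```

In this toy setting, maps built from only a few fixations agree less with the underlying distribution than maps built from many, which is the kind of uncertainty a noise-aware training scheme would need to account for.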

updated: Fri Apr 16 2021 11:32:46 GMT+0000 (UTC)

published: Fri Apr 16 2021 11:32:46 GMT+0000 (UTC)

arXiv
