Toward Super-Resolution for Appearance-Based Gaze Estimation

Galen O'Shea; Majid Komeili

Gaze tracking is a valuable tool with a broad range of applications in fields including medicine, psychology, virtual reality, marketing, and safety. It is therefore essential to have gaze tracking software that is both cost-efficient and high-performing. Accurately predicting gaze remains a difficult task, particularly in real-world situations where images are affected by motion blur, video compression, and noise. Super-resolution has been shown to improve image quality from a visual perspective. This work examines the usefulness of super-resolution for improving appearance-based gaze tracking. We show that not all super-resolution (SR) models preserve the gaze direction. We propose a two-step framework based on the SwinIR super-resolution model. The proposed method consistently outperforms the state of the art, particularly in scenarios involving low-resolution or degraded images. Furthermore, we examine the use of super-resolution through the lens of self-supervised learning for gaze prediction. Self-supervised learning aims to learn from unlabeled data in order to reduce the amount of labeled data required for downstream tasks. We propose a novel architecture called SuperVision by fusing an SR backbone network to a ResNet18 (with some skip connections). The proposed SuperVision method uses 5x less labeled data and yet outperforms, by 15%, the state-of-the-art GazeTR method, which uses 100% of the training data.
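The two-step idea in the abstract (super-resolve the image first, then estimate gaze from the result) can be illustrated with a minimal sketch. The abstract does not describe the implementation, so everything below is an assumption made for illustration: `nearest_neighbor_upscale` is a trivial stand-in for an SR model such as SwinIR, `toy_gaze_estimator` is a hypothetical placeholder rather than a real gaze network, and `two_step_gaze` is simply a name chosen here for the composition of the two stages.

```python
from typing import Callable

import numpy as np

Image = np.ndarray  # H x W x C array with values in [0, 1]


def nearest_neighbor_upscale(image: Image, scale: int = 4) -> Image:
    """Stand-in for an SR model: repeats every pixel `scale` times per axis.

    A real pipeline would call a learned SR network here instead.
    """
    return np.repeat(np.repeat(image, scale, axis=0), scale, axis=1)


def toy_gaze_estimator(image: Image) -> np.ndarray:
    """Hypothetical placeholder for a gaze network.

    Returns a 2-vector built from brightness differences between image
    halves, only so that the pipeline yields a deterministic output.
    """
    h, w = image.shape[:2]
    vertical = image[: h // 2].mean() - image[h // 2 :].mean()
    horizontal = image[:, w // 2 :].mean() - image[:, : w // 2].mean()
    return np.array([vertical, horizontal])


def two_step_gaze(
    image: Image,
    super_resolve: Callable[[Image], Image],
    estimate_gaze: Callable[[Image], np.ndarray],
) -> np.ndarray:
    """Step 1: enhance the low-resolution input. Step 2: predict gaze from it."""
    upscaled = super_resolve(image)
    return estimate_gaze(upscaled)


# Usage: a synthetic 8 x 16 image whose right half is bright.
low_res = np.zeros((8, 16, 3))
low_res[:, 8:] = 1.0
gaze = two_step_gaze(low_res, nearest_neighbor_upscale, toy_gaze_estimator)
print(gaze.shape)
```

Passing the two stages as callables keeps them independent, so either placeholder could be swapped for a trained model without changing `two_step_gaze`.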

updated: Fri Mar 17 2023 17:40:32 GMT+0000 (UTC)

published: Fri Mar 17 2023 17:40:32 GMT+0000 (UTC)

arXiv
