Self-Supervised Face Presentation Attack Detection with Dynamic Grayscale Snippets

Usman Muhammad; Mourad Oussalah

Face presentation attack detection (PAD) plays an important role in defending face recognition systems against presentation attacks. The success of PAD largely relies on supervised learning, which requires a large amount of labeled data; collecting such labels is especially challenging for videos and often requires expert knowledge. To avoid the costly collection of labeled data, this paper presents a novel method for self-supervised video representation learning via motion prediction. To achieve this, we exploit temporal consistency across three RGB frames acquired at three different times in the video sequence. The frames are then converted into grayscale images, and each grayscale image is assigned to one of the three channels, R (red), G (green), and B (blue), to form a dynamic grayscale snippet (DGS). Labels are then generated automatically by building DGSs over different temporal lengths of the videos, which increases temporal diversity and proves to be very helpful for the downstream task. Benefiting from the self-supervised nature of our method, we report results that outperform existing methods on four public benchmark datasets, namely Replay-Attack, MSU-MFSD, CASIA-FASD, and OULU-NPU. Explainability analysis is carried out with LIME and Grad-CAM to visualize the most important features used in the DGS.
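The DGS construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`to_gray`, `make_dgs`, `make_pretext_samples`), the BT.601 grayscale weights, the fixed frame stride, and the choice to use the stride index as the automatically generated label are all assumptions made here for illustration, since the abstract does not specify them.

```python
import numpy as np


def to_gray(frame_rgb):
    """Convert an (H, W, 3) RGB frame to an (H, W) grayscale image.

    Assumption: ITU-R BT.601 luma weights; the abstract does not say
    which grayscale conversion is used.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb.astype(np.float64) @ weights


def make_dgs(video, start, stride):
    """Build one dynamic grayscale snippet from a (T, H, W, 3) video.

    Three frames are taken at start, start + stride, start + 2 * stride,
    converted to grayscale, and stacked as the R, G, B channels of one image.
    """
    indices = [start, start + stride, start + 2 * stride]
    if indices[-1] >= len(video):
        raise IndexError("video is too short for this start/stride")
    grays = [to_gray(video[i]) for i in indices]
    return np.stack(grays, axis=-1)  # shape (H, W, 3)


def make_pretext_samples(video, strides, start=0):
    """Return (snippet, label) pairs, one per temporal stride.

    Assumption: the pretext label is the index of the stride, so no
    manual annotation is needed.
    """
    return [(make_dgs(video, start, s), label)
            for label, s in enumerate(strides)]
```

For a video array `video` with at least seven frames, `make_pretext_samples(video, strides=[1, 2, 3])` would yield three snippets labeled 0, 1, and 2, which a network could then be trained to tell apart before fine-tuning on the PAD task.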

updated: Sat Aug 27 2022 18:34:13 GMT+0000 (UTC)

published: Sat Aug 27 2022 18:34:13 GMT+0000 (UTC)

arXiv
