Learnable Multi-level Frequency Decomposition and Hierarchical Attention Mechanism for Generalized Face Presentation Attack Detection

Meiling Fang; Naser Damer; Florian Kirchbuchner; Arjan Kuijper

With the increased deployment of face recognition systems in daily life, face presentation attack detection (PAD) is attracting much attention and plays a key role in securing these systems. Although hand-crafted and deep-learning-based methods achieve strong performance in intra-dataset evaluations, their performance drops on unseen scenarios. In this work, we propose a dual-stream convolutional neural network (CNN) framework. One stream adapts four learnable frequency filters to learn features in the frequency domain, which are less influenced by variations in sensors and illumination. The other stream leverages the RGB images to complement the frequency-domain features. Moreover, we propose a hierarchical attention module that joins the information from the two streams at different stages, taking into account the nature of deep features at different layers of the CNN. The proposed method is evaluated in intra-dataset and cross-dataset setups, and the results demonstrate that it enhances generalizability in most experimental setups compared with the state of the art, including methods designed explicitly for domain adaptation/shift problems. We validate the design of the proposed PAD solution in a step-wise ablation study covering the learnable frequency decomposition, the hierarchical attention module design, and the loss function used. Training code and pre-trained models are publicly released.
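
As a rough illustration of what a four-band learnable frequency decomposition could look like, here is a minimal NumPy sketch. The abstract does not specify the filter design, so everything here is an assumption: the names `radial_band_masks` and `decompose` are hypothetical, the fixed radial base masks and the `tanh`-bounded learnable offset are illustrative choices, and the released training code may differ.

```python
import numpy as np


def radial_band_masks(size, n_bands=4):
    """Fixed base masks that split the centered spectrum into radial bands.

    Assumption: equal-width radial bands; not taken from the paper.
    """
    freqs = np.fft.fftshift(np.fft.fftfreq(size))
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.sqrt(fx ** 2 + fy ** 2)
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    edges[-1] += 1e-9  # make sure the largest radius falls in the last band
    masks = [(radius >= lo) & (radius < hi)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(masks).astype(np.float64)


def decompose(image, base_masks, learnable):
    """Split a square grayscale image into one spatial map per frequency band.

    `learnable` has the same shape as `base_masks`; in a real model it would
    be a trainable parameter. Here it is a plain array (hypothetical setup).
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # Base mask plus a bounded learnable offset (illustrative choice).
    filters = base_masks + np.tanh(learnable)
    filtered = spectrum * filters  # broadcasts to (n_bands, H, W)
    bands = np.fft.ifft2(np.fft.ifftshift(filtered, axes=(-2, -1)))
    # Keep the real part; with asymmetric learned filters this is a simplification.
    return bands.real


rng = np.random.default_rng(0)
img = rng.random((16, 16))
masks = radial_band_masks(16)
bands = decompose(img, masks, np.zeros_like(masks))
```

With the learnable offsets at zero, the four base masks partition the spectrum, so the band images sum back to the input. In a dual-stream setup such as the one described above, these band maps would feed the frequency stream while the RGB image feeds the other.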

updated: Tue Nov 02 2021 08:53:17 GMT+0000 (UTC)

published: Thu Sep 16 2021 13:06:43 GMT+0000 (UTC)

arXiv
