High-Accuracy RGB-D Face Recognition via Segmentation-Aware Face Depth Estimation and Mask-Guided Attention Network

Meng-Tzu Chiu; Hsun-Ying Cheng; Chien-Yi Wang; Shang-Hong Lai

Deep learning approaches have achieved highly accurate face recognition by training models on very large face image datasets. Whereas large 2D face image datasets are readily available, large public 3D face datasets are lacking. Existing public 3D face datasets usually contain few subjects, which leads to over-fitting. This paper proposes two CNN models to improve RGB-D face recognition. The first is a segmentation-aware depth estimation network, called DepthNet, which estimates depth maps from RGB face images and incorporates semantic segmentation information to localize face regions more accurately. The second is a novel mask-guided RGB-D face recognition model that contains an RGB recognition branch, a depth map recognition branch, and an auxiliary segmentation mask branch with a spatial attention module. DepthNet is used to augment a large 2D face image dataset into a large RGB-D face dataset, which is then used to train an accurate RGB-D face recognition model. Furthermore, the proposed mask-guided RGB-D face recognition model can fully exploit the depth map and segmentation mask information and is more robust to pose variation than previous methods. Our experimental results show that DepthNet, guided by the segmentation mask, produces more reliable depth maps from face images. Our mask-guided face recognition model outperforms state-of-the-art methods on several public 3D face datasets.
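The abstract does not specify how the spatial attention module is built or how the RGB and depth branches are fused. The following is a minimal NumPy sketch of one plausible design, assuming a CBAM-style attention map computed from the segmentation-mask branch's features and applied to both the RGB and depth feature maps, followed by global average pooling and concatenation into a single L2-normalized embedding. The function names, the fixed mixing weights `w_avg`/`w_max`, and fusion by concatenation are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(mask_feat: np.ndarray, w_avg: float, w_max: float,
                      bias: float = 0.0) -> np.ndarray:
    """Hypothetical CBAM-style spatial attention from mask-branch features.

    mask_feat has shape (C, H, W); the result has shape (H, W) with values in (0, 1).
    A fixed weighted sum of the channel-average and channel-max maps stands in
    for the learned convolution a real module would use (assumption).
    """
    avg_map = mask_feat.mean(axis=0)  # (H, W) average over channels
    max_map = mask_feat.max(axis=0)   # (H, W) maximum over channels
    return sigmoid(w_avg * avg_map + w_max * max_map + bias)


def embed(rgb_feat: np.ndarray, depth_feat: np.ndarray, mask_feat: np.ndarray,
          w_avg: float = 1.0, w_max: float = 1.0) -> np.ndarray:
    """Fuse RGB and depth features under one shared mask-guided attention map.

    All inputs are (C, H, W) feature maps with matching H and W. The fusion by
    pooling and concatenation is a hypothetical choice for illustration.
    """
    attn = spatial_attention(mask_feat, w_avg, w_max)  # (H, W)
    rgb_att = rgb_feat * attn[None]      # broadcast the map over RGB channels
    depth_att = depth_feat * attn[None]  # same map reweights depth channels
    pooled = np.concatenate([rgb_att.mean(axis=(1, 2)),
                             depth_att.mean(axis=(1, 2))])
    return pooled / np.linalg.norm(pooled)  # unit-length face embedding


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings, as used for face matching."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Usage with random stand-in feature maps (no trained network involved).
rng = np.random.default_rng(0)
C, H, W = 8, 14, 14
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
mask = rng.standard_normal((4, H, W))
e1 = embed(rgb, depth, mask)
print(e1.shape)  # (16,): 8 pooled RGB channels + 8 pooled depth channels
```

Sharing one attention map across both branches is a choice made for this sketch: it lets the segmentation mask down-weight non-face positions in the RGB and depth features alike. A trained model would learn the attention parameters instead of using fixed weights.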

updated: Wed Dec 22 2021 07:46:23 GMT+0000 (UTC)

published: Wed Dec 22 2021 07:46:23 GMT+0000 (UTC)

arXiv