Deep Active Shape Model for Face Alignment and Pose Estimation in Mobile Environment

Ali Pourramezan Fard; Hojjat Abdollahi; Mohammad Mahoor

Active Shape Model (ASM) is a statistical model of object shapes that represents a target structure. ASM can guide machine learning algorithms to fit a set of points representing an object (e.g., a face) onto an image. This paper presents a lightweight Convolutional Neural Network (CNN) architecture, trained with an ASM-assisted loss function, for face alignment and head pose estimation in the wild. We first use ASM to guide the network towards learning a smoother distribution of the facial landmark points. Then, inspired by transfer learning, we gradually harden the regression problem during training and lead the network towards learning the original landmark point distribution. Our loss function defines multiple tasks, responsible for detecting facial landmark points as well as estimating face pose. Learning multiple correlated tasks simultaneously builds synergy and improves the performance of the individual tasks. We compare the performance of our proposed CNN, ASMNet, with that of MobileNetV2 (which is about twice the size of ASMNet) on both face alignment and pose estimation. Experimental results on challenging datasets show that, using the proposed ASM-assisted loss function, ASMNet performs comparably to MobileNetV2 on face alignment. For face pose estimation, ASMNet performs much better than MobileNetV2. Overall, ASMNet achieves acceptable performance for facial landmark detection and pose estimation while requiring significantly fewer parameters and floating-point operations than many previously proposed CNN-based models.
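The abstract describes three ideas: ASM as a statistical shape model, a loss that first targets a smoother, ASM-constrained landmark distribution and is then hardened toward the original landmarks, and a multi-task term for pose. The NumPy sketch below illustrates one possible reading of these ideas. It is not the authors' implementation.

It rests on several assumptions:
- The ASM is modeled as a PCA point-distribution model over flattened, pre-aligned landmark vectors.
- Keeping 95% of the variance and clipping coefficients to ±3√λ are conventional ASM choices, not values taken from this paper.
- The linear decay of the smoothed-target weight `alpha` and the pose weight `beta` are hypothetical.
- The function names `fit_asm`, `asm_project`, `alpha_schedule`, and `asm_multitask_loss` are illustrative only.

```python
import numpy as np


def fit_asm(shapes, variance_kept=0.95):
    """Fit a PCA point-distribution model (the statistical core of an ASM).

    shapes: (N, 2K) array, one row per training face with K landmarks
    flattened as x1, y1, ..., xK, yK (assumed already aligned/normalized).
    Returns (mean shape, retained eigenvectors as columns, their eigenvalues).
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = np.cumsum(eigvals) / eigvals.sum()
    # Smallest k whose components explain at least `variance_kept` of the variance.
    k = min(int(np.searchsorted(explained, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :k], eigvals[:k]


def asm_project(shape, asm, clip=3.0):
    """Project a shape (2K,) or a batch (B, 2K) onto the shape model.

    Coefficients are clipped to +/- clip * sqrt(eigenvalue), the usual ASM
    plausibility constraint, which yields a smoother, more typical shape.
    """
    mean, vecs, vals = asm
    b = (shape - mean) @ vecs
    limit = clip * np.sqrt(np.maximum(vals, 0.0))
    return mean + np.clip(b, -limit, limit) @ vecs.T


def alpha_schedule(epoch, total_epochs, alpha0=1.0):
    """Hypothetical schedule: linearly decay the smoothed-target weight to 0."""
    return alpha0 * max(0.0, 1.0 - epoch / max(1, total_epochs - 1))


def asm_multitask_loss(pred_pts, gt_pts, pred_pose, gt_pose, asm, alpha, beta=1.0):
    """Landmark MSE + alpha * MSE to the ASM-smoothed target + beta * pose MSE."""
    smooth_gt = asm_project(gt_pts, asm)
    landmark_term = np.mean((pred_pts - gt_pts) ** 2)
    asm_term = np.mean((pred_pts - smooth_gt) ** 2)
    pose_term = np.mean((pred_pose - gt_pose) ** 2)  # e.g. yaw, pitch, roll
    return landmark_term + alpha * asm_term + beta * pose_term


# Usage on synthetic data: 200 "faces" with 5 landmarks (10 coordinates)
# generated from 3 latent factors plus small noise.
rng = np.random.default_rng(0)
shapes = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))
shapes += 0.01 * rng.normal(size=shapes.shape)
asm = fit_asm(shapes)
print("retained components:", asm[1].shape[1])
gt, pred = shapes[:8], shapes[:8] + 0.1
gt_pose, pred_pose = np.zeros((8, 3)), np.full((8, 3), 0.05)
for epoch in (0, 5, 9):
    a = alpha_schedule(epoch, total_epochs=10)
    print(epoch, a, asm_multitask_loss(pred, gt, pred_pose, gt_pose, asm, a))
```

As `alpha` decays, the landmark target shifts from a mix that includes the ASM-smoothed shape toward the exact ground truth. This is one simple way to "gradually harden" the regression. The paper's actual schedule and weighting may differ.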

updated: Thu Mar 11 2021 18:40:12 GMT+0000 (UTC)

published: Sat Feb 27 2021 03:46:54 GMT+0000 (UTC)

arXiv
