ASMNet: a Lightweight Deep Neural Network for Face Alignment and Pose Estimation

Ali Pourramezan Fard; Hojjat Abdollahi; Mohammad Mahoor

Active Shape Model (ASM) is a statistical model of object shapes that represents a target structure. ASM can guide machine learning algorithms to fit a set of points representing an object (e.g., a face) onto an image. This paper presents a lightweight Convolutional Neural Network (CNN) architecture with an ASM-assisted loss function for face alignment and head pose estimation in the wild. We first use ASM to guide the network towards learning a smoother distribution of the facial landmark points. Inspired by transfer learning, we gradually harden the regression problem during training and guide the network towards learning the original landmark point distribution. Our loss function defines multiple tasks, responsible for detecting facial landmark points and for estimating the face pose. Learning multiple correlated tasks simultaneously builds synergy and improves the performance of the individual tasks. We compare the performance of our proposed model, called ASMNet, with MobileNetV2 (which is about 2 times bigger than ASMNet) on both the face alignment and pose estimation tasks. Experimental results on challenging datasets show that, with the proposed ASM-assisted loss function, ASMNet performs comparably to MobileNetV2 on the face alignment task. In addition, for face pose estimation, ASMNet performs much better than MobileNetV2. ASMNet achieves acceptable performance for facial landmark point detection and pose estimation while using significantly fewer parameters and floating-point operations than many CNN-based models.
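
The abstract describes the ASM-assisted loss only at a high level. The NumPy sketch below shows one way to compute an ASM-smoothed landmark target and combine it with a pose term into a multi-task loss. It is illustrative and relies on several assumptions:

- The PCA point-distribution model and the ±3√λ clamp on shape parameters follow the classic ASM formulation.
- The function names, the 10-epoch schedule of retained variance in `keep_for_epoch`, and the plain MSE terms weighted by `pose_weight` are hypothetical choices. They are not the authors' published loss.

```python
import numpy as np


def fit_asm(shapes):
    """Fit a PCA point-distribution model.

    shapes: (n_samples, 2 * n_points) array; each row is a flattened, already
    aligned landmark vector (x1, y1, x2, y2, ...).
    Returns (mean, eigenvalues, eigenvectors) sorted by decreasing variance.
    """
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # drop tiny negative round-off
    return mean, eigvals, eigvecs[:, order]


def num_modes(eigvals, keep):
    """Smallest k whose leading eigenvalues explain at least `keep` of the variance."""
    ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(ratio, keep)) + 1
    return min(k, len(eigvals))


def asm_project(shape, mean, eigvals, eigvecs, k):
    """Rebuild one shape vector from its top-k modes, clamping b_i to +/- 3*sqrt(lambda_i)."""
    P = eigvecs[:, :k]
    b = P.T @ (shape - mean)
    limit = 3.0 * np.sqrt(eigvals[:k])
    return mean + P @ np.clip(b, -limit, limit)


def keep_for_epoch(epoch):
    """HYPOTHETICAL hardening schedule: retained variance grows toward 1.0,
    so targets move from smooth ASM shapes toward the original landmarks."""
    schedule = [0.80, 0.90, 0.97, 1.0]
    return schedule[min(epoch // 10, len(schedule) - 1)]


def multitask_loss(pred_pts, true_pts, pred_pose, true_pose, asm, epoch,
                   pose_weight=1.0):
    """Landmark MSE against the ASM-smoothed target plus a weighted pose MSE.

    `pose_weight` and the use of plain MSE for both tasks are assumptions.
    """
    mean, eigvals, eigvecs = asm
    k = num_modes(eigvals, keep_for_epoch(epoch))
    target = asm_project(true_pts, mean, eigvals, eigvecs, k)
    landmark_loss = np.mean((pred_pts - target) ** 2)
    pose_loss = np.mean((pred_pose - true_pose) ** 2)  # e.g. yaw, pitch, roll
    return landmark_loss + pose_weight * pose_loss
```

The sketch handles one sample at a time. In an actual training loop, the smoothed targets for each schedule stage could be precomputed offline, and the loss would be written in the training framework's own tensor operations. Even at `keep = 1.0`, the ±3√λ clamp can still pull outlier shapes toward the mean. The target therefore equals the original landmarks only for shapes inside the model's plausible range.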

updated: Fri May 07 2021 17:44:58 GMT+0000 (UTC)

published: Sat Feb 27 2021 03:46:54 GMT+0000 (UTC)

arXiv