MOS: A Low Latency and Lightweight Framework for Face Detection, Landmark Localization, and Head Pose Estimation

Yepeng Liu; Zaiwang Gu; Shenghua Gao; Dong Wang; Yusheng Zeng; Jun Cheng

With the emergence of service robots and surveillance cameras, dynamic face recognition (DFR) in the wild has received much attention in recent years. Face detection and head pose estimation are two important steps for DFR. Very often, the pose is estimated after face detection. However, such sequential computation leads to higher latency. In this paper, we propose a low-latency and lightweight network for simultaneous face detection, landmark localization, and head pose estimation. Inspired by the observation that it is more challenging to locate facial landmarks on faces at large angles, a pose loss is proposed to constrain the learning. Moreover, we propose an uncertainty multi-task loss to learn the weights of the individual tasks automatically. Another challenge is that robots often use low-power computing units such as ARM-based cores, so lightweight networks must be used instead of heavy ones, which leads to a performance drop, especially for small and hard faces. To address this, we propose online feedback sampling to augment the training samples across different scales, which automatically increases the diversity of the training data. Validation on the commonly used WIDER FACE, AFLW, and AFLW2000 datasets shows that the proposed method achieves state-of-the-art performance with low computational resources.
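
The abstract does not spell out the form of the uncertainty multi-task loss. As a rough illustration only, the sketch below uses one widely used formulation of uncertainty-based task weighting, in which each task i has a learnable log-variance s_i and the combined loss is the sum of exp(-s_i) * L_i + s_i. This formulation, the function name `uncertainty_weighted_loss`, and the sample loss values are assumptions made for illustration, not details taken from the paper.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses with per-task log-variances (hypothetical helper).

    Assumed formulation: total = sum_i exp(-s_i) * L_i + s_i,
    where s_i = log(sigma_i^2) would be a learnable parameter during training.
    A larger s_i down-weights task i, while the additive s_i term penalizes
    letting the variance grow without bound.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("one log-variance per task is required")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total


# Illustrative values for the three tasks named in the abstract:
# face detection, landmark localization, head pose estimation.
losses = [0.8, 0.3, 0.5]
equal_weights = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0])
# Raising the log-variance of the third task halves its weight (exp(-log 2) = 0.5).
pose_downweighted = uncertainty_weighted_loss(losses, [0.0, 0.0, math.log(2.0)])
```

With all log-variances at zero, every task has weight 1 and the result is the plain sum of the losses. In an actual training setup the log-variances would be optimized jointly with the network weights, which is what would allow the task weights to be learned automatically rather than tuned by hand.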

updated: Thu Oct 21 2021 08:05:53 GMT+0000 (UTC)

published: Thu Oct 21 2021 08:05:53 GMT+0000 (UTC)

arXiv
