Towards Metrical Reconstruction of Human Faces

Wojciech Zielonka; Timo Bolkart; Justus Thies

Face reconstruction and tracking are building blocks of numerous applications in AR/VR, human-machine interaction, and medicine. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size). A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, due to the nature of perspective projection, they cannot reconstruct the actual face dimensions, and even predicting the average human face outperforms some of these methods in a metrical sense. To learn the actual shape of a face, we argue for a supervised training scheme. Since no large-scale 3D dataset exists for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset is still only medium-scale, with more than 2k identities, and training purely on it would lead to overfitting. To counter this, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network. Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks and on our metric benchmarks (15% and 24% lower average error on NoW, respectively).
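The claim that perspective projection hides the actual face dimensions can be illustrated with a minimal sketch. It assumes an idealized pinhole camera, and the landmark coordinates, the scale factor, and the helper names `project` and `scale_about_camera` are hypothetical values chosen for illustration, not taken from the paper. A face that is 20% larger and 20% farther away projects to the same image points, so 2D supervision alone cannot tell the two apart.

```python
import math

def project(points, focal_length=1.0):
    """Pinhole projection of 3D points (X, Y, Z), given in camera coordinates."""
    return [(focal_length * x / z, focal_length * y / z) for x, y, z in points]

def scale_about_camera(points, s):
    """Scale a shape and its distance to the camera by the same factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]

# Hypothetical landmarks in metres: two eye corners, nose tip, chin,
# roughly half a metre in front of the camera.
face = [(-0.03, 0.03, 0.50), (0.03, 0.03, 0.50),
        (0.00, 0.00, 0.47), (0.00, -0.04, 0.49)]
bigger_face = scale_about_camera(face, 1.2)

# Largest difference between the two sets of 2D projections.
max_diff = max(
    abs(a - b)
    for p, q in zip(project(face), project(bigger_face))
    for a, b in zip(p, q)
)

# The metric distance between the first two landmarks does differ in 3D.
eye_dist = math.dist(face[0], face[1])
bigger_eye_dist = math.dist(bigger_face[0], bigger_face[1])

print(f"max 2D difference: {max_diff:.2e}")
print(f"eye distance: {eye_dist:.3f} m vs {bigger_eye_dist:.3f} m")
```

The two shapes are indistinguishable in the image even though their metric dimensions differ, which is consistent with the abstract's argument for supervising on 3D data with known scale.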

updated: Wed Oct 19 2022 17:29:53 GMT+0000 (UTC)

published: Wed Apr 13 2022 18:57:33 GMT+0000 (UTC)

arXiv

