From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion

Weiguang Zhao; Chaolong Yang; Jianan Ye; Yuyao Yan; Xi Yang; Kaizhu Huang

We consider the problem of multi-view 3D face reconstruction (MVR) with weakly supervised learning, which leverages a limited number of 2D face images (e.g., three) to generate a high-quality 3D face model with very light annotation. Despite their encouraging performance, existing MVR methods simply concatenate multi-view image features and pay little attention to critical areas (e.g., the eyes, brows, nose, and mouth). To address this, we propose a novel model called Deep Fusion MVR (DF-MVR), designed as a multi-view encoding to single decoding framework with skip connections, which can extract, integrate, and compensate deep features with attention from multi-view images. In addition, we develop a multi-view face parse network to learn, identify, and emphasize the critical common face areas. Finally, although our model is trained with a few 2D images, it can reconstruct an accurate 3D model even when only a single 2D image is input. We conduct extensive experiments to evaluate various multi-view 3D face reconstruction methods. Experiments on the Pixel-Face and Bosphorus datasets indicate the superiority of our model. Without 3D landmark annotation, DF-MVR achieves 5.2% and 3.0% RMSE improvements over the best existing weakly supervised MVR methods on the Pixel-Face and Bosphorus datasets, respectively; with 3D landmark annotation, DF-MVR attains superior performance, particularly on the Pixel-Face dataset, with a 13.4% RMSE improvement over the best weakly supervised MVR model.
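As a reading aid for the reported figures, the following minimal sketch shows one common way an RMSE between a reconstructed and a ground-truth 3D point set can be computed, and how a percentage improvement over a baseline could be derived from two RMSE values. The abstract does not spell out the evaluation protocol, so the point-wise correspondence, the absence of any alignment step, and the interpretation of "improvement" as a relative reduction in RMSE are all assumptions; the function names and the numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(pred, gt):
    """Root-mean-square of per-point Euclidean distances.

    Assumes `pred` and `gt` are equally long sequences of (x, y, z)
    tuples that are already aligned and in point-wise correspondence
    (an assumption; the abstract does not describe the protocol).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("point sets must be non-empty and of equal length")
    squared = [
        sum((p - g) ** 2 for p, g in zip(a, b))
        for a, b in zip(pred, gt)
    ]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical toy data, not values from the paper.
ground_truth = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
error = rmse(prediction, ground_truth)

# A hypothetical baseline RMSE of 2.0 against a model RMSE of 1.9.
gain = relative_improvement(2.0, 1.9)
```

Under this assumed definition, a lower RMSE for the new model yields a positive percentage, which is how the improvements quoted above would be read.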

updated: Wed Jul 06 2022 11:55:06 GMT+0000 (UTC)

published: Fri Apr 08 2022 05:11:04 GMT+0000 (UTC)

arXiv
