A Deep Learning Framework to Reconstruct Face under Mask

Gourango Modak; Shuvra Smaran Das; Md. Ajharul Islam Miraj; Md. Kishor Morol

While deep learning-based image reconstruction methods have shown significant success in removing objects from pictures, they have yet to achieve acceptable results in keeping attributes such as gender, ethnicity, expression, and the topological structure of the face consistent. The purpose of this work is to extract the mask region from an image of a masked face and reconstruct the detected area. This problem is complex because (i) it is difficult to determine the gender of a face hidden behind a mask, which can confuse the network into reconstructing a male face as female or vice versa; (ii) images may be taken from multiple angles, making it extremely difficult to preserve the actual shape and topological structure of the face and to produce a natural-looking image; and (iii) mask shapes vary, so in some cases the mask area cannot be predicted precisely and parts of the mask remain on the face after completion. To solve this task, we split the problem into three phases: landmark detection, object detection for the targeted mask area, and inpainting of the detected mask region. First, to address problem (i), we use gender classification to detect the actual gender behind the mask, and then detect the landmarks of the masked facial image. Second, we identify the non-face item, i.e., the mask, and use a Mask R-CNN network to produce a binary segmentation map of the detected mask area. Third, we develop a GAN-based inpainting network that uses the predicted landmarks as structural guidance to generate realistic images. The experiments presented in this paper use the FFHQ and CelebA datasets.
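The order of the stages described above can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the four stage functions (`classify_gender`, `detect_landmarks`, `segment_mask`, `inpaint`) are hypothetical placeholders passed in by the caller, and it is an assumption that the predicted gender is handed to the inpainting stage and that the generated pixels are pasted back only inside the binary mask. Images are plain nested lists of grayscale values to keep the sketch dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]          # grayscale image as nested lists (illustration only)
Mask = List[List[int]]             # 1 where the face mask covers the pixel, else 0
Landmarks = Sequence[Tuple[int, int]]


def composite(original: Image, generated: Image, mask: Mask) -> Image:
    """Keep original pixels outside the mask; take generated pixels inside it."""
    return [
        [g if m else o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def reconstruct(
    image: Image,
    classify_gender: Callable[[Image], str],            # hypothetical stage
    detect_landmarks: Callable[[Image], Landmarks],     # hypothetical stage
    segment_mask: Callable[[Image], Mask],              # e.g. a Mask R-CNN wrapper
    inpaint: Callable[[Image, Mask, Landmarks, str], Image],  # e.g. a GAN generator
) -> Image:
    """Run the stages in the order given in the abstract."""
    gender = classify_gender(image)        # step 1a: gender behind the mask
    landmarks = detect_landmarks(image)    # step 1b: facial landmarks
    mask = segment_mask(image)             # step 2: binary map of the mask area
    # Step 3: landmark-guided inpainting; passing `gender` here is an assumption.
    generated = inpaint(image, mask, landmarks, gender)
    return composite(image, generated, mask)
```

With stub stages, `reconstruct` changes only the pixels flagged by the binary map, which is the role the segmentation step plays in the pipeline.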

updated: Wed Mar 23 2022 15:23:24 GMT+0000 (UTC)

published: Wed Mar 23 2022 15:23:24 GMT+0000 (UTC)

arXiv
