Alpha Matte Generation from Single Input for Portrait Matting

Dogucan Yaman; Hazım Kemal Ekenel; Alexander Waibel

Portrait matting is an important research problem with a wide range of applications, such as video conferencing apps, image/video editing, and post-production. The goal is to predict an alpha matte that specifies each pixel's contribution to the foreground subject. Traditional approaches and most existing works rely on an additional input, e.g., a trimap or a background image, to predict the alpha matte. However, providing an additional input is not always practical. Moreover, models are highly sensitive to these additional inputs. In this paper, we introduce an approach that performs portrait matting without any additional input, using Generative Adversarial Nets (GANs). We divide the main task into two subtasks and propose a segmentation network for person segmentation and an alpha generation network for alpha matte prediction. The segmentation network takes an input image and produces a coarse segmentation map; the alpha generation network then uses the same input image together with this coarse segmentation map to predict the alpha matte. In addition, we present a segmentation encoding block that downsamples the coarse segmentation map and provides a feature representation to the residual block. Furthermore, we propose a border loss that separately penalizes only the borders of the subject, which are likely to be the most challenging regions, and we adapt a perceptual loss for portrait matting. To train the proposed system, we combine two popular training datasets to increase the amount and diversity of the data, which helps address domain shift at inference time. We tested our model on three benchmark datasets, namely the Adobe Image Matting dataset, the Portrait Matting dataset, and the Distinctions dataset. The proposed method outperformed MODNet, which also takes a single input.
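
To make two ideas in the abstract concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the border loss formula, so the border band below (pixels whose neighbourhood contains both foreground and background in the ground-truth matte), the L1 penalty, and the names `composite`, `border_mask`, `border_l1`, and `width` are all hypothetical. The compositing function uses the standard matting equation I = alpha * F + (1 - alpha) * B.

```python
import numpy as np


def composite(alpha, fg, bg):
    """Standard matting equation: I = alpha * F + (1 - alpha) * B."""
    a = alpha[..., None]  # broadcast the matte over the colour channels
    return a * fg + (1.0 - a) * bg


def border_mask(alpha, width=1):
    """Assumed border band: pixels whose (2*width+1)^2 neighbourhood
    contains both foreground and background pixels."""
    fg = alpha > 0.5
    h, w = fg.shape
    padded = np.pad(fg, width, mode="edge")
    any_fg = np.zeros_like(fg)  # becomes the dilation of the foreground
    all_fg = np.ones_like(fg)   # becomes the erosion of the foreground
    for dy in range(2 * width + 1):
        for dx in range(2 * width + 1):
            window = padded[dy:dy + h, dx:dx + w]
            any_fg |= window
            all_fg &= window
    return any_fg & ~all_fg


def border_l1(pred, target, width=1):
    """Hypothetical border loss: mean absolute error restricted to the
    border band of the ground-truth matte."""
    mask = border_mask(target, width)
    if not mask.any():
        return 0.0
    return float(np.abs(pred - target)[mask].mean())


# Toy ground truth: a 2x2 foreground square inside a 6x6 image.
target = np.zeros((6, 6))
target[2:4, 2:4] = 1.0

far_error = target.copy()
far_error[0, 0] = 1.0   # mistake far away from the subject border
edge_error = target.copy()
edge_error[2, 2] = 0.0  # mistake inside the border band
```

In this toy case the mistake far from the subject contributes nothing to `border_l1`, while the mistake inside the band does. A training setup would normally combine such a term with a loss over the whole matte; how the paper weights its losses is not stated in the abstract.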

updated: Mon Jun 14 2021 09:36:52 GMT+0000 (UTC)

published: Sun Jun 06 2021 18:53:42 GMT+0000 (UTC)

arXiv
