Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

Wen Liu; Zhixin Piao; Zhi Tu; Wenhan Luo; Lin Ma; Shenghua Gao

We tackle human image synthesis, including human motion imitation, appearance transfer, and novel view synthesis, within a unified framework: once trained, a single model can handle all of these tasks. Existing task-specific methods mainly use 2D keypoints to estimate the human body structure. However, keypoints express only position information; they can neither characterize a person's individual body shape nor model limb rotations. In this paper, we propose using a 3D body mesh recovery module to disentangle pose and shape. It models joint locations and rotations and also characterizes the personalized body shape. To preserve source information such as texture, style, color, and face identity, we propose an Attentional Liquid Warping GAN with an Attentional Liquid Warping Block (AttLWB) that propagates the source information, in both image and feature spaces, to the synthesized reference. Specifically, the source features are extracted by a denoising convolutional auto-encoder so that they characterize the source identity well. Furthermore, the proposed method supports more flexible warping from multiple sources. To further improve generalization to unseen source images, one/few-shot adversarial learning is applied: a model is first trained on an extensive training set and then fine-tuned on one or a few unseen images in a self-supervised way to generate high-resolution (512 x 512 and 1024 x 1024) results. We also build a new dataset, the iPER dataset, for evaluating human motion imitation, appearance transfer, and novel view synthesis. Extensive experiments demonstrate the effectiveness of our method in preserving face identity, shape consistency, and clothing details. All code and the dataset are available at https://impersonator.org/work/impersonator-plus-plus.html.
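
To illustrate the idea of attention-based fusion from multiple sources mentioned in the abstract, here is a toy sketch in plain Python. It is not the authors' AttLWB: the abstract does not specify how attention is computed, so scaled dot-product attention is used here purely as an assumption, and the names `softmax` and `attentional_fuse` are hypothetical. The sketch assumes the source features have already been warped into the reference layout, and it fuses them position by position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentional_fuse(reference_feat, warped_source_feats):
    """Fuse several already-warped source feature maps into one.

    reference_feat: list of positions, each a list of channel values.
    warped_source_feats: list over sources, each shaped like reference_feat.
    Assumption (not from the abstract): at each position, the weight of a
    source is a softmax over scaled dot products between the reference
    feature (query) and that source's warped feature (key).
    """
    fused = []
    for pos, query in enumerate(reference_feat):
        keys = [src[pos] for src in warped_source_feats]
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # Weighted sum of the source features, channel by channel.
        fused.append([
            sum(w * key[c] for w, key in zip(weights, keys))
            for c in range(len(query))
        ])
    return fused


# Two hypothetical sources, one position, two channels.
reference = [[1.0, 0.0]]
sources = [[[10.0, 0.0]], [[0.0, 10.0]]]
fused = attentional_fuse(reference, sources)
```

In this example the first source matches the reference feature more closely, so it dominates the fused result at that position. In a trained network the features and the attention weights would be learned rather than computed from raw dot products.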

updated: Wed Nov 18 2020 02:57:47 GMT+0000 (UTC)

published: Wed Nov 18 2020 02:57:47 GMT+0000 (UTC)

arXiv