To synthesize high-quality person images with arbitrary poses is challenging. In this paper, we propose a novel Multi-scale Conditional Generative Adversarial Networks (MsCGAN), aiming to convert the input conditional person image to a synthetic image of any given target pose, whose appearance and the texture are consistent with the input image. MsCGAN is a multi-scale adversarial network consisting of two generators and two discriminators. One generator transforms the conditional person image into a coarse image of the target pose globally, and the other is to enhance the detailed quality of the synthetic person image through a local reinforcement network. The outputs of the two generators are then merged into a synthetic, discriminant and high-resolution image. On the other hand, the synthetic image is downsampled to multiple resolutions as the input to multi-scale discriminator networks. The proposed multi-scale generators and discriminators handling different levels of visual features can benefit to synthesizing high-resolution person images with realistic appearance and texture. Experiments are conducted on the Market-1501 and DeepFashion datasets to evaluate the proposed model, and both qualitative and quantitative results demonstrate the superior performance of the proposed MsCGAN.
updated: Thu Mar 05 2020 16:19:24 GMT+0000 (UTC)
published: Fri Oct 19 2018 15:04:13 GMT+0000 (UTC)