In this paper, we focus on devising a versatile framework for dense pixelwise prediction, whose goal is to assign a discrete or continuous label to each pixel of an image. It is well known that the reduced feature resolution caused by repeated subsampling operations poses a serious challenge to Fully Convolutional Network (FCN) based models. In contrast to commonly used strategies such as dilated convolution and encoder-decoder structures, we introduce the Flattening Module, which produces high-resolution predictions without removing any subsampling operations or building a complicated decoder module. In addition, the Flattening Module is lightweight and can be easily combined with any existing FCN, allowing the model builder to trade off model size, computational cost, and accuracy simply by choosing a different backbone network. We empirically demonstrate the effectiveness of the proposed Flattening Module through competitive results in human pose estimation on MPII, semantic segmentation on PASCAL-Context, and object detection on PASCAL VOC. We hope that the proposed approach can serve as a simple and strong alternative to the currently dominant dense pixelwise prediction frameworks.
updated: Fri Nov 08 2019 02:47:21 GMT+0000 (UTC)
published: Sun Sep 22 2019 08:05:04 GMT+0000 (UTC)