3D shape reconstruction from a single image has been a long-standing problem in computer vision. Recent advances in 3D representation learning have produced pixel-aligned 3D reconstruction methods that show impressive performance. However, when occlusions, views, and appearances vary widely, it is often hard to extract meaningful local image features from the aligned pixels to describe sampled 3D points. In this paper, we study a general kernel that encodes local image features while accounting for the geometric relationships among points sampled from the underlying surfaces. The kernel is derived from the proposed spatial pattern: the kernel points are obtained as the 2D projections of a number of 3D pattern points around a sampled point. Supported by the spatial pattern, the 2D kernel encodes geometric information that is essential for 3D reconstruction tasks, whereas traditional 2D kernels mainly consider appearance information. Furthermore, the spatial pattern is designed to be deformable, so that the network can discover more adaptive spatial patterns and capture non-local contextual information. Experimental results on both synthetic and real datasets demonstrate the superiority of the proposed method.
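The kernel construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole camera model, the function names `spatial_pattern_kernel_points` and `sample_features`, the optional `deltas` argument standing in for learned deformations, and the nearest-neighbour feature lookup are all assumptions made for illustration.

```python
import numpy as np


def spatial_pattern_kernel_points(point, offsets, K, deltas=None):
    """Project 3D pattern points around `point` to 2D kernel points.

    point:   (3,) sampled 3D point in camera coordinates (z > 0).
    offsets: (N, 3) pattern offsets around the sampled point (hypothetical layout).
    K:       (3, 3) pinhole intrinsics (assumed camera model).
    deltas:  optional (N, 3) deformations; in a deformable pattern these
             would be predicted by a network (assumption).
    """
    pattern = point[None, :] + offsets           # (N, 3) pattern points
    if deltas is not None:
        pattern = pattern + deltas               # deform the pattern
    homog = pattern @ K.T                        # (N, 3) homogeneous pixels
    return homog[:, :2] / homog[:, 2:3]          # (N, 2) as (x, y) pixels


def sample_features(feature_map, pixels):
    """Nearest-neighbour lookup of (H, W, C) features at (N, 2) pixel coords."""
    h, w, _ = feature_map.shape
    cols = np.clip(np.rint(pixels[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pixels[:, 1]).astype(int), 0, h - 1)
    return feature_map[rows, cols]               # (N, C)


if __name__ == "__main__":
    K = np.array([[100.0, 0.0, 32.0],
                  [0.0, 100.0, 32.0],
                  [0.0, 0.0, 1.0]])
    point = np.array([0.0, 0.0, 2.0])
    # Centre point plus one offset along each axis direction (7 points).
    offsets = 0.1 * np.vstack([np.zeros((1, 3)), np.eye(3), -np.eye(3)])
    kernel_px = spatial_pattern_kernel_points(point, offsets, K)
    features = sample_features(np.random.rand(64, 64, 8), kernel_px)
    print(kernel_px.shape, features.shape)
```

Because the kernel points come from projecting a 3D neighbourhood, their 2D layout changes with the depth and position of the sampled point, which is how the kernel can carry geometric rather than purely appearance-based information. A practical version would likely use bilinear rather than nearest-neighbour sampling.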
updated: Sat Oct 02 2021 09:23:17 GMT+0000 (UTC)
published: Sun Jun 06 2021 10:35:31 GMT+0000 (UTC)