Visual objects are composed of a recursive hierarchy of perceptual wholes and parts, whose properties, such as shape, reflectance, and color, constitute a hierarchy of intrinsic causal factors of object appearance. However, object appearance is the compositional consequence of both an object's intrinsic and extrinsic causal factors, where the extrinsic causal factors are related to illumination and imaging conditions. Therefore, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object image formation and subsumes multilinear block tensor decomposition as a special case. The resulting object representation is an interpretable combinatorial choice of wholes' and parts' representations that renders object recognition robust to occlusion and reduces training data requirements. We demonstrate our approach in the context of face recognition by training on an extremely reduced dataset of synthetic images, and report encouraging face verification results on two datasets, the Freiburg dataset and the Labeled Faces in the Wild (LFW) dataset consisting of real-world images, thus substantiating the suitability of our approach for data-starved domains.
updated: Wed Jan 01 2020 06:23:19 GMT+0000 (UTC)
published: Mon Nov 11 2019 11:03:53 GMT+0000 (UTC)