Modelling long-range dependencies is critical for complex scene understanding tasks such as semantic segmentation and object detection. Although CNNs have excelled in many computer vision tasks, they are still limited in capturing long-range structured relationships as they typically consist of layers of local kernels. A fully-connected graph is beneficial for such modelling, however, its computational overhead is prohibitive. We propose a dynamic graph message passing network, based on the message passing neural network framework, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we then dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on three different tasks and backbone architectures. Our approach also outperforms fully-connected graphs while using substantially fewer floating point operations and parameters.
updated: Sun Jun 14 2020 10:35:58 GMT+0000 (UTC)
published: Mon Aug 19 2019 17:46:34 GMT+0000 (UTC)