This paper tackles the problem of video object segmentation. We are specifically concerned with the task of segmenting all pixels of a target object in all frames, given the annotation mask in the first frame. Even when such an annotation is available, this remains a challenging problem because the appearance and shape of the object change over time. We tackle this task by formulating it as a meta-learning problem, in which the base learner captures semantic scene understanding for a general class of objects, and the meta learner quickly adapts to the appearance of the target object from a few examples. Our proposed meta-learning method uses a closed-form optimizer, ridge regression, which has been shown to be conducive to fast and better training convergence. Moreover, we propose a mechanism, named "block splitting", to further speed up the training process and to reduce the number of learnable parameters. In comparison with state-of-the-art methods, our proposed framework achieves a significant boost in processing speed while remaining very competitive with the best-performing methods on widely used datasets.
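To illustrate what a closed-form optimizer means here, the following is a minimal sketch of textbook ridge regression, which solves w = (XᵀX + λI)⁻¹Xᵀy directly instead of iterating with gradient steps. It is not the paper's implementation: the function name `ridge_fit`, the toy features, and the regularization values are all assumptions made for illustration, and the abstract does not specify how the solver is wired into the network or how block splitting partitions the problem.

```python
def ridge_fit(X, y, lam):
    """Return w minimizing ||Xw - y||^2 + lam * ||w||^2 (hypothetical helper)."""
    n, d = len(X), len(X[0])
    # Build the regularized normal equations: A = X^T X + lam * I, b = X^T y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, d):
            factor = A[row][col] / A[col][col]
            for j in range(col, d):
                A[row][j] -= factor * A[col][j]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


# Toy data (assumed): 2-D features whose targets follow w = [2, -1] exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]

w_small = ridge_fit(X, y, lam=1e-6)  # weak regularization: close to [2, -1]
w_large = ridge_fit(X, y, lam=1e6)   # strong regularization: shrinks toward 0
```

Because the solution is obtained in a single linear solve, adapting to a new target needs no inner optimization loop, which is consistent with the speed motivation stated above. The cost of the solve grows with the feature dimension, so reducing the size of each system is one plausible reason to split the problem into blocks, though the abstract does not confirm that this is how block splitting works.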
updated: Sat Sep 28 2019 08:20:04 GMT+0000 (UTC)
published: Sat Sep 28 2019 08:20:04 GMT+0000 (UTC)