In this paper, we explore techniques centered around periodic sampling of model weights that provide convergence improvements on gradient update methods (vanilla SGD, Momentum, Adam) for a variety of vision problems (classification, detection, segmentation). Importantly, our algorithms provide better, faster and more robust convergence and training performance with only a slight increase in computation time. Our techniques are independent of the neural network model, gradient optimization methods or existing optimal training policies and converge in a less volatile fashion with performance improvements that are approximately monotonic. We conduct a variety of experiments to quantify these improvements and identify scenarios where these techniques could be more useful.
updated: Thu Mar 19 2020 21:37:12 GMT+0000 (UTC)
published: Tue May 14 2019 18:00:23 GMT+0000 (UTC)