Branch-and-Pruning Optimization Towards Global Optimality in Deep Learning

Yuanwei Wu; Ziming Zhang; Guanghui Wang

Understanding global optimality in deep learning (DL) has recently attracted growing attention. However, conventional DL solvers were not developed with the explicit aim of seeking such global optimality. In this paper, we propose a novel approximation algorithm, BPGrad, towards optimizing deep models globally via branch and pruning. The proposed BPGrad algorithm is based on the assumption of Lipschitz continuity in DL; as a result, it can adaptively determine the step size for the current gradient given the history of previous updates, such that theoretically no smaller step can achieve global optimality. We prove that, by repeating this branch-and-pruning procedure, the global optimum can be located within a finite number of iterations. Empirically, we also propose an efficient adaptive solver for DL based on BPGrad, which outperforms conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam on object recognition, detection, and segmentation tasks. The code is available at https://github.com/RyanCV/BPGrad.
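The abstract does not spell out the step-size rule, so the following is only a minimal sketch of the Lipschitz argument it alludes to, not the authors' BPGrad update. It assumes a 1-D objective with a known Lipschitz constant and a fixed target value; the names `safe_step`, `descend`, `lipschitz`, and `target` are hypothetical and chosen purely for illustration.

```python
def safe_step(f_x: float, target: float, lipschitz: float) -> float:
    """Radius around x inside which f cannot fall below `target`.

    If |f(x) - f(y)| <= lipschitz * |x - y|, then every y with
    |x - y| < (f(x) - target) / lipschitz satisfies f(y) > target,
    so a smaller step cannot reach the target value.
    """
    return max(f_x - target, 0.0) / lipschitz


def descend(f, grad, x, lipschitz, target, iters=50):
    """Hypothetical 1-D loop: move against the gradient by the safe radius."""
    for _ in range(iters):
        g = grad(x)
        eta = safe_step(f(x), target, lipschitz)
        if g == 0 or eta == 0.0:
            break  # stationary point, or target value already reached
        x -= eta * (1.0 if g > 0 else -1.0)  # unit-length direction in 1-D
    return x


# Toy objective f(x) = |x| is 1-Lipschitz; 2.0 is a deliberate overestimate
# of the constant (an assumption made only for this example).
x_final = descend(abs, lambda x: (x > 0) - (x < 0),
                  x=3.0, lipschitz=2.0, target=0.0)
```

With the overestimated constant, each step covers half the remaining distance to the minimizer, so `x_final` ends up very close to 0. A tighter constant permits larger steps, while an underestimate could step past regions that still contain the target value.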

updated: Mon Apr 05 2021 00:43:03 GMT+0000 (UTC)

published: Mon Apr 05 2021 00:43:03 GMT+0000 (UTC)

arXiv
