Accelerated Zeroth-Order and First-Order Momentum Methods from Mini to Minimax Optimization

Feihu Huang; Shangqian Gao; Jian Pei; Heng Huang

In this paper, we propose a class of accelerated zeroth-order and first-order momentum methods for both nonconvex mini-optimization and minimax-optimization. Specifically, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method for solving stochastic mini-optimization problems. We prove that the Acc-ZOM method achieves a lower query complexity of O(d^{3/4}ϵ^{-3}) for finding an ϵ-stationary point, which improves the best known result by a factor of O(d^{1/4}), where d denotes the parameter dimension. In particular, Acc-ZOM does not require the large batches needed by existing zeroth-order stochastic algorithms. We also propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method for black-box minimax-optimization. We prove that the Acc-ZOMDA method reaches the best known query complexity of O((d_1+d_2)κ_y^3ϵ^{-3}) for finding an ϵ-stationary point without large batches, where d_1 and d_2 denote the dimensions of the optimization parameters and κ_y is the condition number. Moreover, we propose an accelerated first-order momentum descent ascent (Acc-MDA) method for solving white-box minimax problems, and prove that it achieves a lower gradient complexity of O(κ_y^{2.5}ϵ^{-3}) for finding an ϵ-stationary point given batch size b=κ_y^4, which improves the best known result by a factor of O(κ_y^{1/2}). Extensive experimental results on black-box adversarial attacks on deep neural networks (DNNs) and on poisoning attacks demonstrate the efficiency of our algorithms.
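
For readers unfamiliar with the setting, the following is a minimal, generic sketch of what "zeroth-order momentum" means: the gradient is estimated from function queries only and then smoothed with a momentum average. It is not the Acc-ZOM method itself. The abstract does not specify the estimator or the momentum update, so the two-point Gaussian-direction estimator, the exponential moving average, the toy objective, and every name and parameter here (`zo_gradient`, `zo_momentum_descent`, `mu`, `beta`, `lr`) are assumptions chosen for illustration.

```python
import random


def zo_gradient(f, x, mu, rng):
    """Two-point estimate of the gradient of f at x using only function queries.

    Assumption: a Gaussian random direction u and a central difference;
    the paper's actual estimator may differ.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    # Directional derivative along u, estimated from two queries of f.
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def zo_momentum_descent(f, x0, steps=2000, lr=0.05, beta=0.9, mu=1e-4, seed=0):
    """Descent driven by a moving average of zeroth-order gradient estimates.

    Assumption: a plain exponential moving average stands in for the
    momentum term; this is not the authors' accelerated update.
    """
    rng = random.Random(seed)
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = zo_gradient(f, x, mu, rng)
        v = [beta * vi + (1.0 - beta) * gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x


def quadratic(x):
    """Hypothetical black-box objective with its minimum at (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


x_final = zo_momentum_descent(quadratic, [0.0, 0.0, 0.0])
```

Each iteration issues two function queries and uses no batch, which is the regime the query-complexity bounds above refer to; the sketch makes no claim about matching those bounds.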

updated: Mon Sep 13 2021 18:55:33 GMT+0000 (UTC)

published: Tue Aug 18 2020 22:19:29 GMT+0000 (UTC)

arXiv
