An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives

Qi Qi; Zhishuai Guo; Yi Xu; Rong Jin; Tianbao Yang

In this paper, we propose a practical online method for solving a class of distributionally robust optimization (DRO) problems with non-convex objectives, which have important applications in machine learning for improving the robustness of neural networks. In the literature, most methods for solving DRO are based on stochastic primal-dual methods. However, primal-dual methods for DRO suffer from several drawbacks: (1) manipulating a high-dimensional dual variable whose dimension equals the number of training examples is time-consuming; (2) they are not friendly to online learning, where data arrive sequentially. To address these issues, we consider a class of DRO with a KL divergence regularization on the dual variables, transform the min-max problem into a compositional minimization problem, and propose practical duality-free online stochastic methods that do not require a large mini-batch size. We establish state-of-the-art complexities for the proposed methods with and without a Polyak-Łojasiewicz (PL) condition on the objective. Empirical studies on large-scale deep learning tasks (i) demonstrate that our method speeds up training by more than 2 times compared with baseline methods and saves days of training time on a large-scale dataset with ∼265K images, and (ii) verify the superior performance of DRO over Empirical Risk Minimization (ERM) on imbalanced datasets. Of independent interest, the proposed method can also be used to solve a family of stochastic compositional problems with state-of-the-art complexities.
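The min-max to compositional transformation rests on a standard identity. Assuming the KL term is measured against the uniform distribution over n examples, max over p in the simplex of sum_i p_i * l_i - lam * KL(p || 1/n) equals lam * log((1/n) * sum_i exp(l_i / lam)), with maximizer p_i proportional to exp(l_i / lam). An outer function lam * log(.) applied to an inner expectation E[exp(l(w; z) / lam)] is what makes the problem compositional, and it removes the n-dimensional dual variable. The sketch below (Python with NumPy) checks the identity and runs a simplified online update that tracks the inner expectation with a moving average u. It is an illustrative assumption, not the authors' algorithm: the function names, the squared-loss toy problem, and the hyperparameters (lam, lr, gamma, batch, steps) are hypothetical, and the paper's methods may use different estimators.

```python
import numpy as np


def kl_dro_objective(losses, lam):
    """Closed form of max_{p in simplex} sum_i p_i * l_i - lam * KL(p || uniform).

    Equals lam * log(mean_i exp(l_i / lam)); evaluated with a max-shift
    (log-sum-exp trick) so large losses do not overflow.
    """
    z = np.asarray(losses, dtype=float) / lam
    m = z.max()
    return lam * (m + np.log(np.mean(np.exp(z - m))))


def optimal_weights(losses, lam):
    """The maximizing distribution: p_i proportional to exp(l_i / lam)."""
    z = np.asarray(losses, dtype=float) / lam
    e = np.exp(z - z.max())
    return e / e.sum()


def squared_losses_and_grads(w, X, y):
    """Hypothetical toy loss: per-sample 0.5 * (x.w - y)^2 and its gradient (x.w - y) * x."""
    r = X @ w - y
    return 0.5 * r**2, r[:, None] * X


def online_kl_dro(X, y, lam=2.0, lr=0.02, gamma=0.1, batch=32, steps=2000, seed=0):
    """Hypothetical moving-average compositional SGD for F(w) = lam * log E[exp(l/lam)].

    u tracks the inner expectation g(w) = E[exp(l(w; z) / lam)] online, so no
    n-dimensional dual variable is stored. The gradient of F is
    E[exp(l/lam) * grad l] / g(w); the estimate below uses u in the denominator.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    u = None
    for _ in range(steps):
        idx = rng.integers(0, len(y), size=batch)  # a mini-batch "arriving" online
        losses, grads = squared_losses_and_grads(w, X[idx], y[idx])
        e = np.exp(losses / lam)  # raw exp: may overflow for large l / lam
        u = e.mean() if u is None else (1 - gamma) * u + gamma * e.mean()
        w -= lr * (e[:, None] * grads).mean(axis=0) / u
    return w


# Usage on synthetic linear-regression data (assumed, not from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = np.ones(5) / np.sqrt(5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

lam = 2.0
w = online_kl_dro(X, y, lam=lam)
before = kl_dro_objective(squared_losses_and_grads(np.zeros(5), X, y)[0], lam)
after = kl_dro_objective(squared_losses_and_grads(w, X, y)[0], lam)
print(f"KL-DRO objective: {before:.4f} -> {after:.4f}")
```

Because the maximizing weights are a softmax of the losses scaled by 1/lam, smaller lam concentrates weight on the hardest examples (more robust, closer to a worst-case loss), while larger lam approaches the plain average used by ERM. In this sketch the online update exponentiates losses directly, so for large losses or small lam a shifted or log-domain estimate of u would be needed in practice.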

updated: Fri Nov 12 2021 15:52:00 GMT+0000 (UTC)

published: Wed Jun 17 2020 20:19:25 GMT+0000 (UTC)

arXiv