A General Descent Aggregation Framework for Gradient-based Bi-level Optimization

Risheng Liu; Pan Mu; Xiaoming Yuan; Shangzhi Zeng; Jin Zhang

In recent years, a variety of gradient-based methods have been developed to solve Bi-Level Optimization (BLO) problems in machine learning and computer vision. However, the theoretical correctness and practical effectiveness of these existing approaches always rely on restrictive conditions (e.g., Lower-Level Singleton, LLS), which can hardly be satisfied in real-world applications. Moreover, previous literature only proves theoretical results for specific iteration strategies and thus lacks a general recipe for uniformly analyzing the convergence behavior of different gradient-based BLO methods. In this work, we formulate BLOs from an optimistic bi-level viewpoint and establish a new gradient-based algorithmic framework, named Bi-level Descent Aggregation (BDA), to partially address the above issues. Specifically, BDA provides a modularized structure that hierarchically aggregates the upper- and lower-level subproblems to generate our bi-level iterative dynamics. Theoretically, we establish a general convergence analysis template and derive a new proof recipe to investigate the essential theoretical properties of gradient-based BLO methods. Furthermore, this work systematically explores the convergence behavior of BDA in different optimization scenarios, i.e., for the various solution qualities (global, local, or stationary) returned from solving the approximation subproblems. Extensive experiments justify our theoretical results and demonstrate the superiority of the proposed algorithm for hyper-parameter optimization and meta-learning tasks.
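To make the idea of aggregating upper- and lower-level descent information more concrete, here is a minimal Python sketch on a toy problem whose lower-level solution set is not a singleton. The abstract does not state the update rule, so everything below is an assumption made for illustration: the convex-combination update, the weight schedule alpha_k = 1/(k+1), the step size, the toy objectives, and all function names are hypothetical and should not be read as the authors' implementation.

```python
# Toy setting (assumed for illustration only):
#   upper level: F(x, y) = 0.5*(y1 - 1)^2 + 0.5*(y2 - 2)^2
#   lower level: f(x, y) = 0.5*(y1 - x)^2   (y2 is free, so the
#   lower-level solution set {(x, y2)} is NOT a singleton)

def grad_y_upper(x, y):
    """Gradient of the toy upper-level objective F with respect to y."""
    return [y[0] - 1.0, y[1] - 2.0]

def grad_y_lower(x, y):
    """Gradient of the toy lower-level objective f with respect to y."""
    return [y[0] - x, 0.0]

def zero_grad(x, y):
    """Stand-in that switches off the upper-level information."""
    return [0.0, 0.0]

def aggregated_inner_loop(x, y0, grad_up, grad_low, num_steps=1000, step=1.0):
    """Run an inner loop whose direction mixes both levels' gradients.

    The weight alpha on the upper-level gradient decays as 1/(k+1)
    (an assumed schedule), so the lower-level objective dominates
    in the long run while the upper level selects among its minimizers.
    """
    y = list(y0)
    for k in range(1, num_steps + 1):
        alpha = 1.0 / (k + 1)
        g_up = grad_up(x, y)
        g_low = grad_low(x, y)
        y = [
            yi - step * (alpha * gu + (1.0 - alpha) * gl)
            for yi, gu, gl in zip(y, g_up, g_low)
        ]
    return y

x = 0.5
y_aggregated = aggregated_inner_loop(x, [0.0, 0.0], grad_y_upper, grad_y_lower)
y_lower_only = aggregated_inner_loop(x, [0.0, 0.0], zero_grad, grad_y_lower)
print("aggregated:", y_aggregated)
print("lower-level only:", y_lower_only)
```

In this toy case both runs drive y1 toward x, but only the aggregated run moves the free coordinate y2 toward the value the upper-level objective prefers; the lower-level-only run leaves y2 at its initial value. This is the kind of situation the optimistic viewpoint is meant to handle when the LLS condition fails.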

updated: Mon Oct 11 2021 06:40:46 GMT+0000 (UTC)

published: Tue Feb 16 2021 06:58:12 GMT+0000 (UTC)

arXiv
