Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains

Sathya N. Ravi; Abhay Venkatesh; Glenn Moo Fung; Vikas Singh

Data-dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_\beta$ measure, Area under the ROC curve (AUCROC), and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium- to large-sized datasets, scalability issues severely limit our ability to leverage the benefits of such regularizers. Importantly, despite some recent progress, the key technical impediment is that such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem remains elusive, in this paper we show that for many data-dependent nondecomposable regularizers that are relevant in applications, sizable gains in efficiency are possible with minimal code-level changes; in other words, no specialized tools or numerical schemes are needed. Our procedure involves a reparameterization followed by a partial dualization, which leads to a formulation that has provably cheap projection operators. We present a detailed analysis of the runtime and convergence properties of our algorithm. On the experimental side, we show that a direct use of our scheme significantly improves the state-of-the-art IOU measures reported for the MSCOCO Stuff segmentation dataset.
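To make "nondecomposable" concrete, the following minimal sketch computes the standard $F_\beta$ measure on a toy binary problem and compares the value over the full dataset with the average of per-mini-batch values. The labels, predictions, and the helper name `f_beta` are illustrative assumptions and do not come from the paper; the sketch only illustrates why such measures do not reduce to a sum of per-example (or per-batch) terms, not the reparameterization scheme itself.

```python
def f_beta(y_true, y_pred, beta=1.0):
    """Standard F_beta from binary labels and binary predictions.

    F_beta = (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP)
    Returns 0.0 when the denominator is zero (a common convention, assumed here).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


# Toy data, made up for illustration only; split into two mini-batches of 4.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

full = f_beta(y_true, y_pred)             # metric over the whole dataset
batch_a = f_beta(y_true[:4], y_pred[:4])  # metric on the first mini-batch
batch_b = f_beta(y_true[4:], y_pred[4:])  # metric on the second mini-batch
batch_mean = (batch_a + batch_b) / 2      # naive per-batch average

print(f"full-dataset F1: {full:.4f}")
print(f"mean of per-batch F1: {batch_mean:.4f}")
```

Because the counts TP, FP, and FN enter through a ratio, the full-dataset value and the per-batch average generally disagree, which is one reason plain mini-batch stochastic gradient methods do not apply directly to such objectives.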

updated: Thu Sep 26 2019 21:19:30 GMT+0000 (UTC)

published: Thu Sep 26 2019 21:19:30 GMT+0000 (UTC)

arXiv
