Attentional-Biased Stochastic Gradient Descent

Qi Qi; Yi Xu; Rong Jin; Wotao Yin; Tianbao Yang

In this paper, we present a simple yet effective method with provable guarantees (named ABSGD) for addressing data imbalance or label noise in deep learning. Our method is a simple modification of momentum SGD in which we assign an individual importance weight to each sample in the mini-batch. The individual-level weight of a sampled data point is systematically proportional to the exponential of its scaled loss value, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods based on meta-learning, which require three backward propagations to compute a mini-batch stochastic gradient, our method is more efficient: it needs only one backward propagation per iteration, as in standard deep learning methods. ABSGD is flexible enough to be combined with other robust losses without any additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/qiqi-helloworld/ABSGD/
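The weighting rule described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that the "scaled loss" is the per-sample loss divided by a nonzero scaling factor, here called `lam`, and that the weights are normalized within the mini-batch; the function names and the toy loss values are hypothetical.

```python
import math


def attentional_weights(losses, lam):
    """Per-sample weights proportional to exp(loss / lam).

    Assumptions (not taken from the abstract): the scaled loss is
    loss / lam, and weights are normalized to sum to 1 in the batch.
    """
    if lam == 0:
        raise ValueError("lam must be nonzero")
    scaled = [loss / lam for loss in losses]
    shift = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_gradient(per_sample_grads, weights):
    """Combine per-sample gradients (lists of floats) using the weights."""
    dim = len(per_sample_grads[0])
    return [
        sum(w * g[j] for w, g in zip(weights, per_sample_grads))
        for j in range(dim)
    ]


# Hypothetical per-sample losses for a mini-batch of three examples.
losses = [0.1, 0.5, 2.0]

# Positive lam: high-loss samples get more weight (min-max flavor).
w_pos = attentional_weights(losses, lam=1.0)

# Negative lam: high-loss samples get less weight (min-min flavor).
w_neg = attentional_weights(losses, lam=-1.0)

# Toy two-dimensional gradients, one per sample.
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
g_pos = weighted_gradient(grads, w_pos)
```

In a deep learning framework, the weights would presumably be treated as constants when differentiating, so the weighted loss still needs a single backward pass, in line with the efficiency claim above. A positive `lam` emphasizes hard examples, which is the natural choice for imbalanced data, while a negative `lam` de-emphasizes them, which suits noisy labels.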

updated: Thu Jun 08 2023 05:58:43 GMT+0000 (UTC)

published: Sun Dec 13 2020 03:41:52 GMT+0000 (UTC)

arXiv
