Attentional Biased Stochastic Gradient for Imbalanced Classification

Qi Qi; Yi Xu; Rong Jin; Wotao Yin; Tianbao Yang


In this paper, we present a simple yet effective method (ABSGD) for addressing the data imbalance issue in deep learning. (The original title is "Momentum SGD with Robust Weighting For Imbalanced Classification".) Our method is a simple modification to momentum SGD in which we leverage an attentional mechanism to assign an individual importance weight to each gradient in the mini-batch. Unlike existing individual weighting methods, which learn the individual weights by meta-learning on a separate balanced validation set, our weighting scheme is self-adaptive and is grounded in distributionally robust optimization. The weight of a sampled data point is systematically proportional to the exponential of its scaled loss value, where the scaling factor is interpreted as the regularization parameter in the framework of information-regularized distributionally robust optimization. We employ a step damping strategy for the scaling factor to balance the learning of the feature extraction layers against the learning of the classifier layer. Compared with existing meta-learning methods, which require three backward propagations to compute mini-batch stochastic gradients at three different points in each iteration, our method is more efficient: it needs only one backward propagation per iteration, as in standard deep learning methods. Compared with existing class-level weighting schemes, our method can be applied to online learning without any knowledge of the class prior, and in offline learning it can be combined with existing class-level weighting schemes for a further performance boost. Our empirical studies on several benchmark datasets also demonstrate the effectiveness of the proposed method.
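The weighting idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`attentional_weights`, `weighted_gradient`, `momentum_step`, `step_schedule`) are hypothetical, the weights are normalized within the mini-batch as a simplifying assumption (the abstract only states proportionality), and the schedule milestones and values for the scaling factor are placeholders rather than settings from the paper.

```python
import math


def attentional_weights(losses, lam):
    """Per-example weights proportional to exp(loss / lam).

    Normalizing over the mini-batch is an assumption of this sketch;
    the abstract only says the weight is proportional to the exponential
    of the scaled loss.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    scaled = [loss / lam for loss in losses]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_gradient(per_example_grads, weights):
    """Combine per-example gradients into one weighted mini-batch gradient."""
    dim = len(per_example_grads[0])
    return [
        sum(w * g[j] for w, g in zip(weights, per_example_grads))
        for j in range(dim)
    ]


def momentum_step(params, velocity, grad, lr=0.1, beta=0.9):
    """One heavy-ball momentum SGD update (a common textbook form)."""
    new_velocity = [beta * v + g for v, g in zip(velocity, grad)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def step_schedule(epoch, milestones, values):
    """Piecewise-constant schedule for the scaling factor (hypothetical).

    values[k] is used once `epoch` has passed k of the milestones.
    """
    stage = sum(1 for m in milestones if epoch >= m)
    return values[stage]


if __name__ == "__main__":
    # Toy mini-batch: the third example has a much larger loss.
    losses = [0.1, 0.2, 2.0]
    grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    lam = step_schedule(epoch=0, milestones=[10, 20], values=[100.0, 10.0, 1.0])
    w = attentional_weights(losses, lam)
    g = weighted_gradient(grads, w)
    params, velocity = momentum_step([0.0, 0.0], [0.0, 0.0], g)
    print(w, params)
```

In this sketch a very large scaling factor makes the weights nearly uniform, which recovers ordinary mini-batch averaging, while a small one concentrates the weight on high-loss examples. Only the losses from the usual forward pass are needed to form the weights, which is consistent with the single backward propagation per iteration mentioned in the abstract.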

updated: Sun Dec 13 2020 03:41:52 GMT+0000 (UTC)

published: Sun Dec 13 2020 03:41:52 GMT+0000 (UTC)

arXiv

