Batch Group Normalization

Xiao-Yun Zhou; Jiacheng Sun; Nanyang Ye; Xu Lan; Qijun Luo; Bo-Lin Lai; Pedro Esperanca; Guang-Zhong Yang; Zhenguo Li

Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to train, and normalization is one of the effective remedies. Among previous normalization methods, Batch Normalization (BN) performs well at medium and large batch sizes and generalizes well to multiple vision tasks, but its performance degrades significantly at small batch sizes. In this paper, we find that BN also saturates at extremely large batch sizes, i.e., 128 images per worker (GPU), and we propose that the degradation of BN at small batch sizes and its saturation at extremely large batch sizes are caused by noisy and confused statistic calculation, respectively. We therefore propose Batch Group Normalization (BGN), which addresses both problems by bringing the channel, height and width dimensions into the statistic calculation to compensate, without adding new trainable parameters, using multi-layer or multi-iteration information, or introducing extra computation. BGN adopts the group technique of Group Normalization (GN) and uses a hyper-parameter G to control the number of feature instances used for statistic calculation, so that the statistics are neither noisy nor confused across different batch sizes. We empirically demonstrate that BGN consistently outperforms BN, Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization (PN) across a wide spectrum of vision tasks, including image classification, Neural Architecture Search (NAS), adversarial learning, Few Shot Learning (FSL) and Unsupervised Domain Adaptation (UDA), indicating its good performance, robustness to batch size and wide generalizability. For example, when training ResNet-50 on ImageNet with a batch size of 2, BN achieves a Top1 accuracy of 66.512% while BGN achieves 76.096%, a notable improvement.
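The statistic calculation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how features are assigned to groups, so splitting the flattened channel/height/width axis into G contiguous groups is an assumption, as are the function name `batch_group_norm`, the parameter `num_groups`, the `eps` value, and the omission of any affine scale and shift.

```python
import numpy as np


def batch_group_norm(x, num_groups, eps=1e-5):
    """Hypothetical sketch: normalize x of shape (N, C, H, W) with one
    mean/variance per group, pooled over the batch and the group's features."""
    n, c, h, w = x.shape
    m = c * h * w  # channel, height and width merged into one feature axis
    if m % num_groups != 0:
        raise ValueError("C*H*W must be divisible by num_groups")

    # Assumed grouping: G contiguous chunks of S = M / G features per sample.
    grouped = x.reshape(n, num_groups, m // num_groups)

    # Each statistic pools N * S values: the whole batch axis plus one group.
    mean = grouped.mean(axis=(0, 2), keepdims=True)
    var = grouped.var(axis=(0, 2), keepdims=True)

    normalized = (grouped - mean) / np.sqrt(var + eps)
    return normalized.reshape(n, c, h, w)


# Small batch (N = 2): each statistic still pools 2 * (8*4*4 / 4) = 64 values.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 8, 4, 4))
y = batch_group_norm(x, num_groups=4)
```

In this sketch each mean and variance is computed from N·C·H·W/G values, so a smaller G pools more values when the batch is small, and a larger G pools fewer when the batch is very large. This is one way to read the abstract's statement that G controls the number of feature instances used for statistic calculation.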

updated: Wed Dec 09 2020 01:26:51 GMT+0000 (UTC)

published: Fri Dec 04 2020 18:57:52 GMT+0000 (UTC)

arXiv

