The Unreasonable Effectiveness of the Final Batch Normalization Layer

Veysel Kocaman; Ofer M. Shir; Thomas Baeck

Early-stage disease indications are rarely recorded in real-world domains such as agriculture and healthcare, and yet their accurate identification is critical at that point in time. In this type of highly imbalanced classification problem, which involves complex features, deep learning (DL) is much needed because of its strong detection capabilities. At the same time, DL is observed in practice to favor majority over minority classes and consequently to suffer from inaccurate detection of the targeted early-stage indications. In this work, we extend the study done by Kocaman et al., 2020, showing that the final BN layer, when placed before the softmax output layer, has a considerable impact on highly imbalanced image classification problems and undermines the role of the softmax outputs as an uncertainty measure. This current study addresses additional hypotheses and reports the following findings: (i) the performance gain obtained by adding the final BN layer in highly imbalanced settings can still be achieved after removing this additional BN layer at inference; (ii) there is a certain imbalance-ratio threshold at which the improvement gained by the final BN layer reaches its peak; (iii) the batch size also plays a role and affects the outcome of the final BN application; (iv) the impact of the BN application is also reproducible on other datasets and with much simpler neural architectures; (v) the reported BN effect occurs only when there is a single majority class and multiple minority classes, i.e., no improvements are evident when there are two majority classes; and finally, (vi) using this BN layer with sigmoid activation has almost no impact on strongly imbalanced image classification tasks.
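To make the setup concrete, here is a minimal pure-Python sketch of a classification head with a final batch normalization (BN) layer inserted between the class logits and the softmax. It assumes the BN layer normalizes each class logit across the batch using batch statistics (training-mode behavior, biased variance) with a learnable per-class scale `gamma` and shift `beta`. The function names `batch_norm`, `head`, and `predict` and the toy logits are hypothetical illustrations, not taken from the paper. The `use_final_bn=False` path mimics removing the layer at inference, as in finding (i), but the weights here are untrained, so the sketch only shows the mechanics, not the paper's result.

```python
import math
from typing import List

Matrix = List[List[float]]


def batch_norm(logits: Matrix, gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> Matrix:
    """Training-mode BN: normalize each class column over the batch axis."""
    n, c = len(logits), len(logits[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [row[j] for row in logits]
        mean = sum(col) / n
        # Biased (population) variance, as BN uses for batch statistics in training.
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            x_hat = (logits[i][j] - mean) / math.sqrt(var + eps)
            out[i][j] = gamma[j] * x_hat + beta[j]  # learnable scale and shift
    return out


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def head(logits: Matrix, gamma: List[float], beta: List[float],
         use_final_bn: bool = True) -> Matrix:
    """Optional final BN layer followed by a per-sample softmax."""
    z = batch_norm(logits, gamma, beta) if use_final_bn else logits
    return [softmax(row) for row in z]


def predict(logits: Matrix, gamma: List[float], beta: List[float],
            use_final_bn: bool = True) -> List[int]:
    """Argmax class index for each sample in the batch."""
    probs = head(logits, gamma, beta, use_final_bn)
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Hypothetical toy batch of raw logits for 3 classes; class 0 dominates every row.
logits = [[4.0, 1.0, 0.0], [5.0, 0.5, 0.2], [3.0, 2.5, 0.1], [4.5, 0.8, 1.5]]
gamma, beta = [1.0] * 3, [0.0] * 3
print(predict(logits, gamma, beta, use_final_bn=False))  # → [0, 0, 0, 0]
print(predict(logits, gamma, beta, use_final_bn=True))   # → [0, 0, 1, 2]
```

In this toy batch every raw logit vector favors class 0, but normalizing each class across the batch removes that class's shared offset, so two samples flip to classes 1 and 2. This is offered as one possible intuition for how a final BN layer can counteract majority-class bias, not as the authors' explanation. In PyTorch, the corresponding layer would roughly be `nn.BatchNorm1d(num_classes)` placed after the final `nn.Linear` and before the softmax.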

updated: Sat Sep 18 2021 21:19:31 GMT+0000 (UTC)

published: Sat Sep 18 2021 21:19:31 GMT+0000 (UTC)

arXiv
