NoMorelization: Building Normalizer-Free Models from a Sample's Perspective

Chang Liu; Yuwen Yang; Yue Ding; Hongtao Lu

Normalization layers have become a basic building block of deep learning models, but they still suffer from computational inefficiency, poor interpretability, and limited generality. By examining recent normalization and normalizer-free research from a sample's perspective, we find that the problem lies in sampling noise and an inappropriate prior assumption. In this paper, we propose a simple and effective alternative to normalization, called "NoMorelization". NoMorelization is composed of two trainable scalars and a zero-centered noise injector. Experimental results show that NoMorelization is a general-purpose component for deep learning, suitable for different model paradigms (e.g., convolution-based and attention-based models) and different tasks (e.g., discriminative and generative tasks). Compared with existing mainstream normalizers (e.g., BN, LN, and IN) and state-of-the-art normalizer-free methods, NoMorelization shows the best speed-accuracy trade-off.
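
The abstract names the ingredients (two trainable scalars and a zero-centered noise injector) but not the exact formulation. As a rough illustration only, the sketch below shows one possible reading: it assumes the scalars act as a gain and an offset, that the noise is Gaussian, and that it is injected only during training. The class name, the `noise_std` parameter, and the `training` flag are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class NoMorelizationSketch:
    """Illustrative stand-in for a normalization layer (not the authors' code)."""

    scale: float = 1.0      # first trainable scalar (assumed here to be a gain)
    shift: float = 0.0      # second trainable scalar (assumed here to be an offset)
    noise_std: float = 0.1  # hypothetical hyperparameter for the injected noise

    def __call__(self, xs, training=False, rng=None):
        # Fall back to the module-level generator when no RNG is supplied.
        rng = rng or random
        out = []
        for x in xs:
            # Zero-centered Gaussian noise, injected only in training mode
            # (both the distribution and the gating are assumptions).
            eps = rng.gauss(0.0, self.noise_std) if training else 0.0
            out.append(self.scale * x + self.shift + eps)
        return out


layer = NoMorelizationSketch(scale=2.0, shift=1.0)
clean = layer([1.0, 2.0])                                        # no noise at inference
noisy = layer([1.0, 2.0], training=True, rng=random.Random(0))   # perturbed copy
```

Under these assumptions the layer needs no batch, layer, or instance statistics, so each sample is processed independently, which is where a speed advantage over BN, LN, or IN would plausibly come from.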

updated: Thu Oct 13 2022 12:04:24 GMT+0000 (UTC)

published: Thu Oct 13 2022 12:04:24 GMT+0000 (UTC)

arXiv
