Universal Adder Neural Networks

Hanting Chen; Yunhe Wang; Chang Xu; Chao Xu; Chunjing Xu; Tong Zhang

Compared with the cheap addition operation, multiplication has much higher computational complexity. The widely used convolutions in deep neural networks are in fact cross-correlations that measure the similarity between the input features and the convolution filters, which involves a massive number of multiplications between floating-point values. In this paper, we present adder networks (AdderNets), which trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the ℓ_1-norm distance between the filters and the input features as the output response. The influence of this new similarity measure on the optimization of neural networks is thoroughly analyzed. To achieve better performance, we develop a special training approach for AdderNets by investigating the ℓ_p-norm. We then propose an adaptive learning rate strategy that enhances the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers. Moreover, we develop a theoretical foundation for AdderNets by showing that both the single-hidden-layer AdderNet and the width-bounded deep AdderNet with ReLU activation functions are universal function approximators. These results match those of traditional neural networks that use the more complex multiplication units. An approximation bound for AdderNets with a single hidden layer is also presented.
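To make the contrast between the two similarity measures concrete, the following minimal sketch compares a conventional cross-correlation response with an ℓ_1-distance response on a toy 1-D signal. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 1-D setting, and the example values are hypothetical, and the negation of the distance (so that a closer match yields a larger response) is a convention assumed here, since the abstract does not spell it out.

```python
def cross_correlation(patch, filt):
    """Conventional response: sum of elementwise products (needs multiplications)."""
    return sum(x * f for x, f in zip(patch, filt))


def adder_response(patch, filt):
    """Adder-style response: negated l1 distance between patch and filter.

    Uses only subtraction, absolute value, and addition. The negation is an
    assumed convention so that a perfect match gives the largest value (zero).
    """
    return -sum(abs(x - f) for x, f in zip(patch, filt))


def adder_layer_1d(signal, filt, stride=1):
    """Slide `filt` over a 1-D `signal`, collecting one adder response per window."""
    k = len(filt)
    return [adder_response(signal[i:i + k], filt)
            for i in range(0, len(signal) - k + 1, stride)]


# Hypothetical toy data: the middle window of `signal` equals `filt` exactly.
signal = [1.0, 2.0, 3.0, 2.0, 1.0]
filt = [2.0, 3.0, 2.0]

responses = adder_layer_1d(signal, filt)
best_window = max(range(len(responses)), key=lambda i: responses[i])
```

In this toy case the adder response peaks at the window that matches the filter exactly, which is the sense in which the ℓ_1 distance acts as a similarity measure. Note that adder responses are never positive under this convention, unlike cross-correlation outputs, which can take either sign.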

updated: Thu Jun 03 2021 02:00:56 GMT+0000 (UTC)

published: Sat May 29 2021 04:02:51 GMT+0000 (UTC)

arXiv
