AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets

Zhijun Tu; Xinghao Chen; Pengju Ren; Yunhe Wang

This paper studies Binary Neural Networks (BNNs), in which both weights and activations are binarized into 1-bit values, greatly reducing memory usage and computational complexity. Because modern deep neural networks rely on sophisticated designs with complex architectures to reach high accuracy, the distributions of their weights and activations are highly diverse. As a result, the conventional sign function cannot effectively binarize full-precision values in BNNs. To this end, we present a simple yet effective approach called AdaBin, which adaptively obtains the optimal binary set {b_1, b_2} (b_1, b_2 ∈ R) of weights and activations for each layer instead of a fixed set (i.e., {-1, +1}). In this way, the proposed method can better fit different distributions and increase the representation ability of binarized features. In practice, we use the center position and distance of the 1-bit values to define a new binary quantization function. For the weights, we propose an equalization method that aligns the symmetrical center of the binary distribution with that of the real-valued distribution and minimizes the Kullback-Leibler divergence between them. For the activations, we introduce a gradient-based optimization method to obtain these two parameters, which are jointly trained in an end-to-end manner. Experimental results on benchmark models and datasets demonstrate that the proposed AdaBin achieves state-of-the-art performance. For instance, we obtain a 66.4% Top-1 accuracy on ImageNet using the ResNet-18 architecture, and a 69.4 mAP on PASCAL VOC using SSD300.
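
The center-and-distance formulation in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `adaptive_binarize` and `binarize_weights` are hypothetical, and using the mean of the weights as the center and their standard deviation as the distance is an assumption made here for illustration. The abstract only states that the center is aligned with the real-valued distribution and that a Kullback-Leibler divergence is minimized. The gradient-based learning of the activation parameters is not shown.

```python
from statistics import fmean, pstdev


def adaptive_binarize(values, center, distance):
    """Map each value to one of two levels: center - distance or center + distance.

    With center=0 and distance=1 this reduces to the usual sign function
    onto the fixed set {-1, +1}.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    low, high = center - distance, center + distance
    return [high if v >= center else low for v in values]


def binarize_weights(weights):
    """Binarize weights using statistics of the weights themselves.

    Assumption (not stated in the abstract): center = mean of the weights,
    distance = population standard deviation of the weights.
    """
    center = fmean(weights)
    distance = pstdev(weights)
    binary = adaptive_binarize(weights, center, distance)
    return binary, (center - distance, center + distance)


# Weights that are not centered on zero: a fixed {-1, +1} set would map
# all of them to +1, while the adaptive set keeps two distinct levels.
weights = [0.1, 0.3, 0.5, 0.7]
binary, (b1, b2) = binarize_weights(weights)
print(binary, b1, b2)
```

In this sketch the two levels b_1 and b_2 are symmetric around the center of the weights, so values below the center map to b_1 and the rest to b_2, regardless of where the distribution sits on the real line.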

updated: Wed Aug 17 2022 05:43:33 GMT+0000 (UTC)

published: Wed Aug 17 2022 05:43:33 GMT+0000 (UTC)

arXiv
