Understanding weight-magnitude hyperparameters in training binary networks

Joris Quist; Yunqiang Li; Jan van Gemert

Binary Neural Networks (BNNs) are compact and efficient because they use binary weights instead of real-valued weights. Current BNNs use latent real-valued weights during training, and several training hyperparameters are inherited from real-valued networks. The interpretation of several of these hyperparameters is based on the magnitude of the real-valued weights. For BNNs, however, the magnitude of binary weights is not meaningful, so it is unclear what these hyperparameters actually do. One example is weight decay, which aims to keep the magnitude of real-valued weights small. Other examples are latent weight initialization, the learning rate, and learning rate decay, all of which influence the magnitude of the real-valued weights. The magnitude is interpretable for real-valued weights but loses its meaning for binary weights. In this paper, we offer a new interpretation of these magnitude-based hyperparameters based on higher-order gradient filtering during network optimization. Our analysis makes it possible to understand how magnitude-based hyperparameters influence the training of binary networks. This in turn allows for new optimization filters that are designed specifically for binary neural networks and are independent of their real-valued interpretation. Moreover, our improved understanding reduces the number of hyperparameters, which eases the hyperparameter tuning effort and may lead to better hyperparameter values for improved accuracy. Code is available at https://github.com/jorisquist/Understanding-WM-HP-in-BNNs
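To make the filtering view concrete, the following is a minimal sketch of the simplest case only: plain SGD with weight decay on a single latent weight, without momentum or learning rate decay. It is not taken from the paper or its repository. The function names, the synthetic Gaussian gradients, and the values of `lr`, `wd`, and `w0` are all assumptions chosen for illustration. The sketch relies on one piece of algebra: multiplying the update w ← w − lr·(g + wd·w) by wd gives m ← (1 − α)·m + α·(−g) with m = wd·w and α = lr·wd, which is an exponential moving average (a first-order low-pass filter) of the negated gradients. Because binarization only takes the sign, and wd > 0, the latent weight and the filter state produce the same binary weight.

```python
import random


def binarize(x):
    # Map a real value to a binary weight; 0 maps to +1 by convention here.
    return 1 if x >= 0 else -1


def sgd_with_weight_decay(w0, grads, lr, wd):
    """Plain SGD with weight decay on one latent weight (no momentum)."""
    w = w0
    trajectory = []
    for g in grads:
        w = w - lr * (g + wd * w)
        trajectory.append(w)
    return trajectory


def ema_of_negated_gradients(m0, grads, alpha):
    """First-order filter: exponential moving average of -g with rate alpha."""
    m = m0
    trajectory = []
    for g in grads:
        m = (1 - alpha) * m + alpha * (-g)
        trajectory.append(m)
    return trajectory


# Synthetic gradients and hyperparameter values, assumed for illustration.
random.seed(0)
grads = [random.gauss(0.0, 1.0) for _ in range(200)]
lr, wd, w0 = 0.1, 0.05, 0.3

latent = sgd_with_weight_decay(w0, grads, lr, wd)
filtered = ema_of_negated_gradients(wd * w0, grads, alpha=lr * wd)

# The filter state equals the latent weight rescaled by wd, so the signs agree.
same_signs = all(binarize(w) == binarize(m) for w, m in zip(latent, filtered))
max_gap = max(abs(wd * w - m) for w, m in zip(latent, filtered))
print(same_signs, max_gap)
```

In this restricted setting the learning rate and weight decay enter the binary weights only through their product α, and the initialization only through wd·w0, which is one way to see why magnitude-based hyperparameters can be merged or reinterpreted for binary networks.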

updated: Sat Mar 04 2023 16:42:04 GMT+0000 (UTC)

published: Sat Mar 04 2023 16:42:04 GMT+0000 (UTC)

arXiv
