Smart Ternary Quantization

Grégoire Morin; Ryan Razani; Vahid Partovi Nia; Eyyüb Sari

Neural network models are resource hungry. Low-bit quantization, such as binary and ternary quantization, is a common approach to alleviating these resource requirements. Ternary quantization provides a more flexible model and often beats binary quantization in terms of accuracy, but it doubles the memory footprint and increases computation cost. Mixed quantization depth models, on the other hand, allow a trade-off between accuracy and memory footprint. In such models, quantization depth is often chosen manually (which is a tiring task) or tuned using a separate optimization routine (which requires training a quantized network multiple times). Here, we propose Smart Ternary Quantization (STQ), in which we modify the quantization depth directly through an adaptive regularization function, so that we train a model only once. This method jumps between binary and ternary quantization while training. We show its application on image classification.
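To make the binary versus ternary trade-off concrete, the following is a minimal sketch of the two quantizers and their storage cost. It is not the STQ method: the abstract does not specify the adaptive regularization function, so none is shown here. The function names, the threshold parameter `delta`, and the fixed-width encoding are assumptions chosen for illustration.

```python
# Illustrative sketch only: generic binary and ternary weight quantizers.
# The names binarize, ternarize, storage_bits and the threshold `delta`
# are hypothetical; they do not come from the STQ paper.

def binarize(weights):
    """Map each weight to -1 or +1 by sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def ternarize(weights, delta):
    """Map each weight to -1, 0, or +1; weights with |w| <= delta become 0."""
    return [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]


def storage_bits(num_weights, levels):
    """Bits needed if every weight uses a fixed-width code for `levels` values."""
    # 2 levels need 1 bit per weight; 3 levels need 2 bits per weight.
    bits_per_weight = (levels - 1).bit_length()
    return num_weights * bits_per_weight


weights = [0.8, -0.05, 0.3, -0.6]
binary = binarize(weights)
ternary = ternarize(weights, delta=0.1)
binary_bits = storage_bits(len(weights), levels=2)
ternary_bits = storage_bits(len(weights), levels=3)
```

Under this fixed-width assumption, the ternary layer takes twice the bits of the binary one, which is the memory doubling the abstract refers to. The extra zero level is what lets small weights such as -0.05 be dropped rather than forced to -1.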

updated: Thu Sep 26 2019 15:49:08 GMT+0000 (UTC)

published: Thu Sep 26 2019 15:49:08 GMT+0000 (UTC)

arXiv
