MBQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

Yunshan Zhong; Yuyao Zhou; Fei Chao; Rongrong Ji

Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements during runtime. In this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, which limits performance. To address this issue, we propose MBQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2 bits and the activations remain at the input bit-width. The computation for a desired bit-width is completed by selecting an appropriate number of branches that satisfies the original computational constraint. By fixing the weight bit-width, this approach substantially reduces quantization errors caused by switching weight bit-widths. Additionally, we introduce an amortization branch selection strategy that distributes quantization errors caused by switching activation bit-widths among branches to improve performance. Finally, we adopt an in-place distillation strategy that facilitates guidance between branches to further enhance MBQuant's performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is at https://github.com/zysxmu/MultiQuant.
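
The multi-branch idea in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the class `MultiBranchLinear`, the helper `quantize_uniform`, the min-max uniform quantizer, the rule that a target of b bits uses b/2 branches, and the summation of branch outputs are all assumptions made for illustration. The abstract only states that each branch holds fixed 2-bit weights, that activations stay at the input bit-width, and that the number of branches is chosen to meet the computational constraint.

```python
import numpy as np


def quantize_uniform(x, bits):
    """Min-max uniform quantizer (an illustrative choice, not the paper's)."""
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    scale = (hi - lo) / levels
    # Snap every value to one of 2**bits evenly spaced points in [lo, hi].
    return np.round((x - lo) / scale) * scale + lo


class MultiBranchLinear:
    """Hypothetical linear layer duplicated into independent 2-bit branches."""

    def __init__(self, in_features, out_features, max_branches=4, seed=0):
        rng = np.random.default_rng(seed)
        # Each branch keeps its own full-precision weights, quantized on use.
        self.weights = [
            rng.standard_normal((in_features, out_features))
            for _ in range(max_branches)
        ]

    def forward(self, x, target_bits):
        # Assumption: a b-bit target is served by b / 2 branches of 2-bit weights.
        if target_bits % 2 != 0 or not 2 <= target_bits <= 2 * len(self.weights):
            raise ValueError("target_bits must be even and within branch capacity")
        num_branches = target_bits // 2
        # Activations are quantized to the requested bit-width.
        xq = quantize_uniform(x, target_bits)
        # Weights are always 2-bit, regardless of the requested bit-width.
        # Assumption: branch outputs are combined by summation.
        return sum(xq @ quantize_uniform(w, 2) for w in self.weights[:num_branches])
```

In this sketch, switching from 4 bits to 8 bits changes only how many branches are evaluated and how the activations are quantized; the stored weight bit-width never changes, which is the property the abstract credits with reducing weight-switching error.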

updated: Sun Jun 02 2024 08:30:21 GMT+0000 (UTC)

published: Sun May 14 2023 10:17:09 GMT+0000 (UTC)

arXiv

