Fixed-point quantization and binarization are two reduction methods adopted to deploy Convolutional Neural Networks (CNN) on end-nodes powered by low-power micro-controller units (MCUs). While most of the existing works use them as stand-alone optimizations, this work aims at demonstrating there is margin for a joint cooperation that leads to inferential engines with lower latency and higher accuracy. Called CoopNet, the proposed heterogeneous model is conceived, implemented and tested on off-the-shelf MCUs with small on-chip memory and few computational resources. Experimental results conducted on three different CNNs using as test-bench the low-power RISC core of the Cortex-M family by ARM validate the CoopNet proposal by showing substantial improvements w.r.t. designs where quantization and binarization are applied separately.
updated: Sat Jan 25 2020 12:10:51 GMT+0000 (UTC)
published: Tue Nov 19 2019 21:47:23 GMT+0000 (UTC)