Pruning vs XNOR-Net: A Comprehensive Study of Deep Learning for Audio Classification on Edge-devices

Md Mohaimenuzzaman; Christoph Bergmeir; Bernd Meyer

Deep learning has achieved resounding successes in many application areas relevant to the Internet of Things (IoT), for example, computer vision and machine listening. To fully harness the power of deep learning for the IoT, these technologies must ultimately be brought directly to the edge. The obvious challenge is that deep learning techniques can only be implemented on strictly resource-constrained edge devices if the models are radically downsized. This task relies on different model compression techniques, such as network pruning, quantization, and the recent advancement of XNOR-Net. This paper examines the suitability of these techniques for audio classification on microcontrollers. We present an XNOR-Net for end-to-end raw audio classification and a comprehensive empirical study comparing this approach with pruning-and-quantization methods. We show that raw audio classification with XNOR yields performance comparable to regular full-precision networks for small numbers of classes while reducing memory requirements 32-fold and computation requirements 58-fold. However, as the number of classes increases significantly, performance degrades, and pruning-and-quantization-based compression becomes the preferred technique: it satisfies the same space constraints but requires about 8x more computation. We show that these insights are consistent between raw audio classification and image classification using standard benchmark sets. To the best of our knowledge, this is the first study applying XNOR to end-to-end audio classification and evaluating it in the context of alternative techniques. All code is publicly available on GitHub.
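
To illustrate where the 32-fold memory figure plausibly comes from, the following minimal Python sketch shows the core idea behind binary networks: once weights and inputs are reduced to signs in {-1, +1}, a dot product can be computed with a bitwise XNOR followed by a population count. This is not the authors' implementation. The function names (`binarize`, `pack_bits`, `xnor_dot`, `demo`) are hypothetical, the sketch assumes that "full precision" means 32-bit floats, and it omits the per-filter scaling factors that XNOR-Net uses to approximate the real-valued result.

```python
import random


def binarize(values):
    """Map real values to signs in {-1, +1} (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an int: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed sign vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # (#matches) - (#mismatches)


def demo(n=64, seed=0):
    """Compare the plain sign dot product with the XNOR/popcount version."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(n)]
    x = [rng.uniform(-1, 1) for _ in range(n)]
    ws, xs = binarize(w), binarize(x)
    reference = sum(a * b for a, b in zip(ws, xs))
    fast = xnor_dot(pack_bits(ws), pack_bits(xs), n)
    return reference, fast


# Assumption: a full-precision weight is a 32-bit float; a binary weight is 1 bit.
FLOAT_BITS = 32
BINARY_BITS = 1
memory_ratio = FLOAT_BITS // BINARY_BITS

if __name__ == "__main__":
    reference, fast = demo()
    print("sign dot product:", reference, "| XNOR/popcount:", fast)
    print("weight storage ratio:", memory_ratio)
```

Under these assumptions the two dot products agree exactly, and storing one bit per weight instead of a 32-bit float accounts for the 32-fold reduction in weight memory. The 58-fold computation figure depends on hardware and layer shapes and is not reproduced by this sketch.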

updated: Sun Aug 29 2021 11:10:52 GMT+0000 (UTC)

published: Fri Aug 13 2021 09:07:45 GMT+0000 (UTC)

arXiv
