Pruning vs XNOR-Net: A Comprehensive Study of Deep Learning for Audio Classification on Edge-devices

Md Mohaimenuzzaman; Christoph Bergmeir; Bernd Meyer

Deep learning has achieved resounding success in many application areas relevant to the Internet of Things (IoT), such as computer vision and machine listening. To fully harness the power of deep learning for the IoT, these technologies must ultimately be brought directly to the edge. The obvious challenge is that deep learning techniques can only be implemented on strictly resource-constrained edge devices if the models are radically downsized. This task relies on model compression techniques such as network pruning, quantization, and the more recent XNOR-Net. This study examines the suitability of these techniques for audio classification on microcontrollers. We present an application of XNOR-Net to end-to-end raw audio classification and a comprehensive empirical study comparing this approach with pruning-and-quantization methods. We show that raw audio classification with XNOR yields performance comparable to regular full-precision networks for small numbers of classes while reducing memory requirements 32-fold and computation requirements 58-fold. However, as the number of classes increases significantly, performance degrades, and pruning-and-quantization based compression becomes the preferred technique: it satisfies the same space constraints but requires approximately 8x more computation. We show that these insights are consistent between raw audio classification and image classification using standard benchmark sets. To the best of our knowledge, this is the first study to apply XNOR to end-to-end audio classification and evaluate it in the context of alternative techniques. All code is publicly available on GitHub.
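As a rough illustration of where savings of this kind come from, the sketch below shows the general XNOR-and-popcount trick for binary dot products. It is not taken from the paper's code: the helper names (`binarize`, `pack_bits`, `xnor_dot`) are hypothetical, the per-filter scaling factors used by XNOR-Net are omitted, and the 32-fold memory figure is read here as the ratio of a 32-bit float to a 1-bit weight, which is an assumption about how the abstract's number arises.

```python
import numpy as np

def binarize(x):
    """Map real values to {-1, +1} by sign (zero maps to +1)."""
    return np.where(np.asarray(x) >= 0, 1, -1).astype(np.int8)

def pack_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0, packed into one Python int."""
    value = 0
    for s in signs:
        value = (value << 1) | (1 if s > 0 else 0)
    return value

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit is 1 where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches contribute +1, mismatches -1

# Illustrative data only: random "weights" and "inputs" of length 64.
n = 64
rng = np.random.default_rng(0)
w_bin = binarize(rng.standard_normal(n))
x_bin = binarize(rng.standard_normal(n))

# Reference result using ordinary integer arithmetic on the {-1, +1} vectors.
reference = int(np.dot(w_bin.astype(np.int64), x_bin.astype(np.int64)))
fast = xnor_dot(pack_bits(w_bin), pack_bits(x_bin), n)

# Storage ratio under the stated assumption: float32 weights vs 1-bit weights.
memory_ratio = (n * 32) // (n * 1)
print(reference, fast, memory_ratio)
```

The two dot products agree because each matching sign pair contributes +1 and each mismatch contributes -1, so a single XNOR plus a bit count replaces `n` multiply-accumulate operations.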

updated: Mon Jan 17 2022 05:30:27 GMT+0000 (UTC)

published: Fri Aug 13 2021 09:07:45 GMT+0000 (UTC)

arXiv
