End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

Avi Gazneli; Gadi Zimerman; Tal Ridnik; Gilad Sharir; Asaf Noy

While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been proposed and heavily investigated, state-of-the-art techniques for audio classification still rely on numerous representations of the audio signal together with large architectures fine-tuned from large datasets. By exploiting the inherently lightweight nature of audio and novel audio augmentations, we present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification sets demonstrate the effectiveness and robustness of our approach, which achieves state-of-the-art results in various settings. Public code is available at: https://github.com/Alibaba-MIIL/AudioClassfication

updated: Tue Jul 05 2022 06:30:51 GMT+0000 (UTC)

published: Mon Apr 25 2022 07:50:45 GMT+0000 (UTC)

arXiv

