SparseVSR: Lightweight and Noise Robust Visual Speech Recognition

Adriana Fernandez-Lopez; Honglie Chen; Pingchuan Ma; Alexandros Haliassos; Stavros Petridis; Maja Pantic

Recent advances in deep neural networks have achieved unprecedented success in visual speech recognition. However, there remains a substantial gap between current methods and their deployment on resource-constrained devices. In this work, we explore different magnitude-based pruning techniques to generate a lightweight model that achieves higher performance than its dense equivalent, especially in the presence of visual noise. Our sparse models achieve state-of-the-art results at 10% sparsity on the LRS3 dataset and outperform the dense equivalent up to 70% sparsity. We evaluate our 50% sparse model on 7 different visual noise types and achieve an overall absolute improvement of more than 2% WER compared to the dense equivalent. Our results confirm that sparse networks are more resistant to noise than dense networks.
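
The abstract refers to magnitude-based pruning without spelling out the variants compared. As a rough illustration of the general idea only, the sketch below zeroes the smallest-magnitude entries of a flat weight list until a target sparsity is reached. It is a generic one-shot, unstructured variant and is not claimed to be the authors' procedure; the function name `magnitude_prune` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero the smallest-magnitude weights (hypothetical helper, not from the paper).

    weights:  flat list of floats
    sparsity: fraction of weights to set to zero, in [0, 1)
    Returns (pruned_weights, mask), where mask[i] is True for kept weights.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    # Number of weights to remove for the requested sparsity level.
    k = int(round(sparsity * len(weights)))
    # Rank indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    mask = [i not in dropped for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


# Made-up weights for illustration; 50% sparsity removes the 4 smallest magnitudes.
weights = [0.9, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02]
pruned, mask = magnitude_prune(weights, 0.5)
print(pruned)
```

In practice the returned mask would usually be kept and reapplied after each training update so that pruned weights stay at zero; whether and how the paper does this is not stated in the abstract.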

updated: Mon Jul 10 2023 13:34:13 GMT+0000 (UTC)

published: Mon Jul 10 2023 13:34:13 GMT+0000 (UTC)

arXiv
