Minimal Feature Analysis for Isolated Digit Recognition for varying encoding rates in noisy environments

Muskan Garg; Naveen Aggarwal

This research work concerns recent developments in speech recognition. It analyses isolated digit recognition at different bit rates and at different noise levels. The experiments were carried out using Audacity and the HTK toolkit, with a Hidden Markov Model (HMM) as the recognition model. The feature extraction techniques used are Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC), Perceptual Linear Prediction (PLP), mel spectrum (MELSPEC), and filter bank (FBANK). Three noise conditions were considered when testing the data: random noise, fan noise, and random noise in a real-time environment. This was done to determine which environment is best suited to real-time applications. In addition, five commonly used bit rates at different sampling rates were considered in order to find the optimal bit rate.
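As a rough illustration of what testing "at different noise levels" can involve, the sketch below mixes random noise into a clean signal at a chosen signal-to-noise ratio. It is not the authors' procedure (the abstract says only that Audacity was used); the Gaussian noise model, the SNR-based scaling, and the names `add_white_noise` and `measure_snr_db` are all assumptions made for this example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return `signal` plus Gaussian white noise scaled to the requested SNR in dB.

    Assumption: the noise level is expressed as an SNR; the source does not say so.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the scale so that signal_power / scaled_noise_power == 10 ** (snr_db / 10).
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR in dB of `noisy` relative to the known `clean` signal."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: 0.1 s of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
noisy = add_white_noise(clean, snr_db=10.0)
measured = measure_snr_db(clean, noisy)
```

Repeating the call with several `snr_db` values would give a set of test conditions of increasing difficulty; recorded fan noise or real-environment noise would need actual recordings in place of the synthetic noise used here.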

updated: Sat Aug 27 2022 23:05:06 GMT+0000 (UTC)

published: Sat Aug 27 2022 23:05:06 GMT+0000 (UTC)

arXiv
