Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey

Praneeth Nemani; G. Sai Krishna; Supriya Kundrapu

Speaker-independent visual speech recognition (VSR) is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, a considerable amount of research in VSR has explored different algorithms and datasets for evaluating system performance. These efforts have resulted in significant progress in developing effective VSR models, creating new opportunities for further research in this area. This survey provides a detailed examination of the progression of VSR over the past three decades, with a particular emphasis on the transition from speaker-dependent to speaker-independent systems. We also provide a comprehensive overview of the various datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence. The survey covers works published from 1990 to 2023, thoroughly analyzing each work and comparing them on various parameters. It outlines the development of VSR systems over time and highlights the need to develop end-to-end pipelines for speaker-independent VSR. A pictorial representation offers a clear and concise overview of the techniques used in speaker-independent VSR, thereby aiding in the comprehension and analysis of the various methodologies. The survey also highlights the strengths and limitations of each technique and provides insights into developing novel approaches for analyzing visual speech cues. Overall, this comprehensive review provides insights into the current state of the art in speaker-independent VSR and highlights potential areas for future research.

updated: Wed Jun 14 2023 07:33:43 GMT+0000 (UTC)

published: Wed Jun 14 2023 07:33:43 GMT+0000 (UTC)

arXiv
