Word-level Persian Lipreading Dataset

Javad Peymanfard; Ali Lashini; Samin Heydarian; Hossein Zeinali; Nasser Mozayani

Lipreading has made impressive progress in recent years, driven by advances in deep learning. Nonetheless, a prerequisite for such advances is a suitable dataset. This paper provides a new in-the-wild dataset for Persian word-level lipreading, containing 244,000 videos from approximately 1,800 speakers. We evaluated the state-of-the-art method in this field and used a novel approach for word-level lipreading. In this method, we used the AV-HuBERT model for feature extraction and obtained significantly better performance on our dataset.

updated: Sat Apr 08 2023 17:00:35 GMT+0000 (UTC)

published: Sat Apr 08 2023 17:00:35 GMT+0000 (UTC)

arXiv
