Egyptian Sign Language Recognition Using CNN and LSTM

Ahmed Elhagry; Rawan Gla

Sign language is a set of gestures that deaf people use to communicate. Unfortunately, most hearing people do not understand it, which creates a communication gap that needs to be bridged. Because Egyptian Sign Language (ESL) varies from one region to another, it poses a challenging research problem. In this work, we present applied research on a video-based Egyptian sign language recognition system that serves the local deaf community in Egypt with moderate, reasonable accuracy. We present a computer vision system with two different neural network architectures. The first is a Convolutional Neural Network (CNN) for extracting spatial features; this CNN was retrained from the Inception model. The second architecture is a CNN followed by a Long Short-Term Memory (LSTM) network for extracting both spatial and temporal features. The two models achieved accuracies of 90% and 72%, respectively. We examined the power of these two architectures to distinguish between 9 common words (with similar signs) used in some deaf communities in Egypt.
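
The second architecture described in the abstract (per-frame CNN features passed through an LSTM, then a 9-way classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature size, hidden size, clip length, and the helper names (`init_params`, `lstm_step`, `classify_clip`) are hypothetical. The weights are random and untrained, biases are omitted for brevity, and random vectors stand in for the CNN's per-frame outputs. Only the figure of 9 classes comes from the abstract.

```python
import math
import random

NUM_CLASSES = 9   # the abstract mentions 9 common words
FEATURE_DIM = 8   # hypothetical size of one per-frame CNN feature vector
HIDDEN_DIM = 4    # hypothetical LSTM hidden-state size


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng):
    """Random, untrained weights; biases are omitted to keep the sketch short."""
    in_dim = FEATURE_DIM + HIDDEN_DIM
    # One matrix per gate, each acting on [frame features ; previous hidden state].
    params = {gate: random_matrix(HIDDEN_DIM, in_dim, rng) for gate in ("i", "f", "o", "g")}
    params["out"] = random_matrix(NUM_CLASSES, HIDDEN_DIM, rng)
    return params


def lstm_step(params, x, h, c):
    """One standard LSTM update for a single frame's feature vector x."""
    z = x + h  # list concatenation: [features ; hidden state]
    i = [sigmoid(v) for v in matvec(params["i"], z)]    # input gate
    f = [sigmoid(v) for v in matvec(params["f"], z)]    # forget gate
    o = [sigmoid(v) for v in matvec(params["o"], z)]    # output gate
    g = [math.tanh(v) for v in matvec(params["g"], z)]  # candidate cell values
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(params, frame_features):
    """Run the LSTM over all frames, then map the last hidden state to class probabilities."""
    h = [0.0] * HIDDEN_DIM
    c = [0.0] * HIDDEN_DIM
    for x in frame_features:
        h, c = lstm_step(params, x, h, c)
    return softmax(matvec(params["out"], h))


rng = random.Random(0)
params = init_params(rng)
# Stand-in for CNN outputs: 5 frames, each a FEATURE_DIM-long feature vector.
clip = [[rng.uniform(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(5)]
probs = classify_clip(params, clip)
predicted_class = max(range(NUM_CLASSES), key=lambda k: probs[k])
```

Because the weights are untrained, `probs` is close to uniform and `predicted_class` is meaningless; the sketch only shows how spatial features from each frame are accumulated over time before a single prediction is made for the whole clip.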

updated: Wed Jul 28 2021 21:33:35 GMT+0000 (UTC)

published: Wed Jul 28 2021 21:33:35 GMT+0000 (UTC)

arXiv
