Evaluation Of Hidden Markov Models Using Deep CNN Features In Isolated Sign Recognition

Anil Osman Tur; Hacer Yalim Keles


Isolated sign recognition from video streams is a challenging problem due to the multi-modal nature of signs, where both local and global hand features and facial gestures need to be attended to simultaneously. This problem has recently been studied widely using deep Convolutional Neural Network (CNN) based features and Long Short-Term Memory (LSTM) based deep sequence models. However, the current literature lacks an empirical analysis of Hidden Markov Models (HMMs) with deep features. In this study, we provide a framework composed of three modules to solve the isolated sign recognition problem using different sequence models. The dimensions of deep features are usually too large for HMMs to work with. To solve this problem, we propose two alternative CNN-based architectures as the second module in our framework, which reduce the deep feature dimensions effectively. After extensive experiments, we show that using pretrained ResNet50 features and one of our CNN-based dimension reduction models, HMMs can classify isolated signs with 90.15% accuracy on the Montalbano dataset using RGB and skeletal data. This performance is comparable to that of current LSTM-based models. HMMs have fewer parameters and can be trained and run quickly on commodity computers, without requiring GPUs. Therefore, our analysis with deep features shows that HMMs can be used as an alternative to deep sequence models in the challenging isolated sign recognition problem.
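As a rough illustration of the final module, the sketch below classifies a sequence of already-reduced feature vectors by scoring it against one HMM per sign class and picking the class with the highest log-likelihood. It is not the authors' implementation: the left-to-right topology, the diagonal Gaussian emissions, the hand-set parameters, the 2-dimensional toy features, and all names (`left_to_right_model`, `hmm_log_likelihood`, `classify`, the labels `rising` and `falling`) are assumptions made for this example. In practice the parameters would be estimated from training sequences, for example with Baum-Welch.

```python
import math

NEG_INF = -math.inf

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
               for xi, mu, v in zip(x, mean, var))

def left_to_right_model(means, var=1.0, stay=0.6):
    """Build a hypothetical left-to-right HMM with hand-set parameters."""
    n = len(means)
    log_pi = [0.0] + [NEG_INF] * (n - 1)          # always start in state 0
    log_a = [[NEG_INF] * n for _ in range(n)]
    for s in range(n):
        if s == n - 1:
            log_a[s][s] = 0.0                      # last state is absorbing
        else:
            log_a[s][s] = math.log(stay)           # stay in the same state
            log_a[s][s + 1] = math.log(1 - stay)   # advance to the next state
    variances = [[var] * len(m) for m in means]
    return {"log_pi": log_pi, "log_a": log_a, "means": means, "vars": variances}

def hmm_log_likelihood(seq, model):
    """Forward algorithm in log space: log p(seq | model)."""
    n = len(model["log_pi"])
    emit = lambda x, s: log_gauss_diag(x, model["means"][s], model["vars"][s])
    alpha = [model["log_pi"][s] + emit(seq[0], s) for s in range(n)]
    for x in seq[1:]:
        alpha = [log_sum_exp([alpha[p] + model["log_a"][p][s] for p in range(n)])
                 + emit(x, s) for s in range(n)]
    return log_sum_exp(alpha)

def classify(seq, models):
    """Return the label whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: hmm_log_likelihood(seq, models[label]))

# Two toy "sign" classes over 2-D reduced features (purely illustrative).
models = {
    "rising": left_to_right_model([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    "falling": left_to_right_model([[2.0, 2.0], [1.0, 1.0], [0.0, 0.0]]),
}

sequence = [[0.1, -0.1], [0.9, 1.1], [1.2, 0.8], [2.1, 1.9]]
print(classify(sequence, models))
```

Because each class model is scored independently with a few small matrix-style recursions, this kind of classifier runs comfortably on a CPU, which matches the abstract's point about commodity hardware. The cost of the forward pass grows with the feature dimension and the number of states, which is one reason a dimension reduction step before the HMM is useful.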

updated: Mon May 10 2021 13:24:33 GMT+0000 (UTC)

published: Fri Jun 19 2020 15:18:03 GMT+0000 (UTC)

arXiv
