A deep-learning-based multimodal depth-aware dynamic hand gesture recognition system

Hasan Mahmud; Mashrur Mahmud Morshed; Md. Kamrul Hasan

Any spatio-temporal movement or reorientation of the hand performed with the intention of conveying a specific meaning can be considered a hand gesture. Inputs to hand gesture recognition systems can take several forms, such as depth images, monocular RGB images, or skeleton joint points. We observe that raw depth images have low contrast in the hand regions of interest (ROI), so they do not highlight important details to learn, such as finger bending information (whether a finger overlaps the palm or another finger). Recently, in deep-learning-based dynamic hand gesture recognition, researchers have been trying to fuse different input modalities (e.g., RGB or depth images and hand skeleton joint points) to improve recognition accuracy. In this paper, we focus on dynamic hand gesture (DHG) recognition using depth-quantized image features and hand skeleton joint points. In particular, we explore the effect of using depth-quantized features in multimodal fusion networks based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We find that our method improves on existing results on the SHREC-DHG-14 dataset. Furthermore, we show that our method makes it possible to reduce the resolution of the input images by more than four times while still obtaining accuracy comparable to or better than that achieved at the resolutions used in previous methods.
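
The abstract does not spell out how depth quantization is performed, so the following is only a minimal sketch of one plausible reading: the depth values inside a cropped hand region are binned into a small number of levels, and those levels are spread over the full 8-bit intensity range to raise contrast within the ROI. The function name `quantize_depth`, the parameters `num_levels` and `background`, and the convention that background pixels have depth 0 are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def quantize_depth(depth, num_levels=8, background=0):
    """Bin the valid depths of a hand crop into `num_levels` gray levels.

    Hypothetical helper: the paper's actual quantization scheme may differ.
    Pixels equal to `background` are treated as invalid and left at 0.
    """
    depth = np.asarray(depth, dtype=np.float64)
    valid = depth != background
    out = np.zeros(depth.shape, dtype=np.uint8)
    if not valid.any():
        return out

    lo = depth[valid].min()
    hi = depth[valid].max()
    if hi == lo:
        # A flat hand region carries no depth contrast to stretch.
        out[valid] = 255
        return out

    # Normalise using only the hand's own depth range, not the sensor range.
    norm = (depth[valid] - lo) / (hi - lo)
    # Assign each pixel a bin index in 0 .. num_levels - 1.
    bins = np.minimum((norm * num_levels).astype(np.int64), num_levels - 1)
    # Spread the bins over 1 .. 255 so every level differs from background.
    out[valid] = np.round((bins + 1) * 255.0 / num_levels).astype(np.uint8)
    return out


if __name__ == "__main__":
    # Toy 2x3 crop in millimetres; 0 marks background.
    crop = np.array([[0, 500, 510], [520, 530, 540]])
    print(quantize_depth(crop, num_levels=4))
```

Because the normalisation uses only the depth range found inside the crop, small depth differences between fingers and palm occupy the whole intensity range instead of a narrow band of it, which is the low-contrast problem the abstract describes.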

updated: Tue Jul 06 2021 11:18:53 GMT+0000 (UTC)

published: Tue Jul 06 2021 11:18:53 GMT+0000 (UTC)

arXiv
