Snapture -- A Novel Neural Architecture for Combined Static and Dynamic Hand Gesture Recognition

Hassan Ali; Doreen Jirak; Stefan Wermter

As robots are expected to become more involved in people's everyday lives, frameworks that enable intuitive user interfaces are in demand. Hand gesture recognition systems provide a natural way of communicating and are thus an integral part of seamless Human-Robot Interaction (HRI). Recent years have witnessed an immense evolution of computational models powered by deep learning. However, state-of-the-art models fall short in generalizing across different gesture domains, such as emblems and co-speech gestures. In this paper, we propose a novel hybrid hand gesture recognition system. Our architecture enables learning both static and dynamic gestures: by capturing a so-called "snapshot" of the gesture performance at its peak, we integrate the hand pose with the dynamic movement. Moreover, we present a method for analyzing the motion profile of a gesture to uncover its dynamic characteristics, which allows a static channel to be regulated based on the amount of motion. Our evaluation demonstrates the superiority of our approach over a CNNLSTM baseline on two gesture benchmarks. We also provide a per-class analysis of the gestures that unveils the potential of our Snapture architecture for performance improvements. Thanks to its modular implementation, our framework allows other multimodal data, such as facial expressions and head tracking, which are important cues in HRI scenarios, to be integrated into one architecture. Thus, our work contributes both to gesture recognition research and to machine learning applications for non-verbal communication with robots.
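The abstract does not spell out how the motion profile, the peak "snapshot", or the regulation of the static channel are computed. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the motion profile is the mean absolute difference between consecutive frames, that the peak is the frame with the largest inter-frame change, and that the static channel is opened when the total motion exceeds a threshold. The function names, the `motion_threshold` parameter, and the direction of the gating are all hypothetical.

```python
import numpy as np


def motion_profile(frames):
    """Mean absolute pixel difference between consecutive frames.

    Assumption: frame differencing stands in for the paper's motion
    analysis, which the abstract does not specify.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if frames.shape[0] < 2:
        raise ValueError("need at least two frames to measure motion")
    diffs = np.abs(frames[1:] - frames[:-1])
    # One motion value per transition between frame t and frame t + 1.
    return diffs.reshape(diffs.shape[0], -1).mean(axis=1)


def select_snapshot(frames, motion_threshold=1.0):
    """Pick a peak frame and decide whether to open the static channel.

    Hypothetical rule: the peak is the frame that follows the largest
    inter-frame change, and the static channel is used only when the
    summed motion reaches `motion_threshold`.
    """
    profile = motion_profile(frames)
    peak_index = int(np.argmax(profile)) + 1
    total_motion = float(profile.sum())
    use_static_channel = total_motion >= motion_threshold
    return peak_index, use_static_channel, profile


if __name__ == "__main__":
    # Synthetic clip: five 4x4 frames with a single bright frame.
    clip = np.zeros((5, 4, 4), dtype=np.float32)
    clip[3] = 10.0
    peak, use_static, profile = select_snapshot(clip)
    print(peak, use_static, profile)
```

In a two-channel setup of the kind the abstract describes, the frame at `peak_index` would be passed to the static (hand pose) channel while the full sequence goes to the dynamic channel. How the two channels are actually fused is not covered by this sketch.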

updated: Tue Feb 27 2024 10:59:33 GMT+0000 (UTC)

published: Sat May 28 2022 11:12:38 GMT+0000 (UTC)

arXiv
