Prosody Based Co-analysis for Continuous Recognition of Coverbal Gestures

Sanshzar Kettebekov; Mohammed Yeasin; Rajeev Sharma

Although speech and gesture recognition have been studied extensively, all successful attempts to combine them in a unified framework have been semantically motivated, e.g., by keyword-gesture co-occurrence. Such formulations inherit the complexity of natural language processing. This paper presents a Bayesian formulation that uses the phenomenon of gesture and speech articulation to improve the accuracy of automatic recognition of continuous coverbal gestures. Prosodic features from the speech signal were co-analyzed with the visual signal to learn the prior probability that prominent spoken segments co-occur with particular kinematic phases of gestures. This co-analysis was found to help detect and disambiguate visually small gestures, which in turn improves the rate of continuous gesture recognition. The efficacy of the proposed approach was demonstrated on a large database collected from Weather Channel broadcasts. This formulation opens new avenues for bottom-up frameworks of multimodal integration.
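To make the idea of a learned co-occurrence prior concrete, here is a minimal sketch of one way prosodic prominence could be fused with visual evidence. It is not the authors' formulation. Assumptions (none taken from the paper): the hypothetical phase labels `preparation`, `stroke`, and `retraction`; a binary "prominent / not prominent" flag per speech segment; visual and prosodic evidence that are conditionally independent given the phase (a naive-Bayes-style fusion); and Laplace smoothing with parameter `alpha`. The helper names `learn_cooccurrence` and `posterior` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical kinematic phase inventory (assumption, not from the paper).
PHASES = ("preparation", "stroke", "retraction")


def learn_cooccurrence(samples: List[Tuple[str, bool]], alpha: float = 1.0):
    """Estimate P(phase) and P(prominent | phase) from labeled (phase, prominent) pairs.

    Both estimates use Laplace (add-alpha) smoothing.
    """
    phase_counts = Counter(phase for phase, _ in samples)
    prominent_counts = Counter(phase for phase, prom in samples if prom)
    total = len(samples)
    prior = {p: (phase_counts[p] + alpha) / (total + alpha * len(PHASES)) for p in PHASES}
    # Binary outcome (prominent or not), so the denominator adds 2 * alpha.
    p_prom = {p: (prominent_counts[p] + alpha) / (phase_counts[p] + 2 * alpha) for p in PHASES}
    return prior, p_prom


def posterior(visual_lik: Dict[str, float], prominent: bool, prior, p_prom) -> Dict[str, float]:
    """Combine visual likelihoods with the prosodic co-occurrence model via Bayes' rule.

    Assumes visual and prosodic evidence are conditionally independent given the phase.
    """
    scores = {}
    for p in PHASES:
        lik_prosody = p_prom[p] if prominent else 1.0 - p_prom[p]
        scores[p] = visual_lik[p] * lik_prosody * prior[p]
    z = sum(scores.values())
    return {p: s / z for p, s in scores.items()}


# Toy training data (fabricated for illustration only): strokes often co-occur with prominence.
samples = (
    [("stroke", True)] * 8 + [("stroke", False)] * 2
    + [("preparation", True)] * 2 + [("preparation", False)] * 8
    + [("retraction", True)] * 1 + [("retraction", False)] * 9
)
prior, p_prom = learn_cooccurrence(samples)

# A visually ambiguous (small) gesture: vision cannot separate preparation from stroke.
visual_lik = {"preparation": 0.45, "stroke": 0.45, "retraction": 0.10}
post = posterior(visual_lik, prominent=True, prior=prior, p_prom=p_prom)
print(max(post, key=post.get), {p: round(v, 3) for p, v in post.items()})
```

In this toy setup the visual likelihoods tie between `preparation` and `stroke`, and the co-occurring prominent speech segment shifts the posterior toward `stroke`; with `prominent=False` the same segment would lean toward `preparation`. This mirrors, in a very simplified form, the abstract's claim that prosody helps disambiguate visually small gestures; a real system would work with continuous features and temporal models rather than a single binary flag.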

updated: Tue Nov 05 2002 19:27:32 GMT+0000 (UTC)

published: Tue Nov 05 2002 19:27:32 GMT+0000 (UTC)

arXiv