Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar; Mo Han; Mohammadreza Sharif; Sezen Yagmur Gunay; Mariusz P. Furmanek; Mathew Yarossi; Paolo Bonato; Cagdas Onal; Taskin Padir; Deniz Erdogmus; Gunar Schirner

Objective: For lower-arm amputees, robotic prosthetic hands promise to restore the capability to perform activities of daily living. Current control methods based on physiological signals such as electromyography (EMG) are prone to poor inference outcomes due to motion artifacts, muscle fatigue, and other factors. Vision sensors are a major source of information about the state of the environment and can play a vital role in inferring feasible and intended gestures. However, visual evidence is susceptible to its own artifacts, most often due to object occlusion and lighting changes. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach because the strengths of these modalities are complementary. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye gaze, and forearm EMG, each processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train the neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous classification accuracy of the upcoming grasp type during the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence alone, respectively, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths and that, as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.
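
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, not the authors' implementation. It assumes that the EMG and vision classifiers each output a posterior over grasp types, that the two modalities are conditionally independent given the grasp type, and that both classifiers share the same class prior (uniform by default). The function name `fuse_posteriors` and the example probabilities are hypothetical.

```python
from math import prod


def fuse_posteriors(posteriors, prior=None):
    """Fuse per-modality class posteriors (hypothetical helper).

    posteriors: one list per modality, each a probability distribution
                over the same grasp types.
    prior:      class prior assumed to be shared by all modality
                classifiers; uniform if None.

    Assumes the modalities are conditionally independent given the
    grasp type, so that
        p(c | z_1..z_M)  is proportional to  prod_m p(c | z_m) / p(c)^(M-1).
    """
    n_classes = len(posteriors[0])
    n_modalities = len(posteriors)
    if prior is None:
        prior = [1.0 / n_classes] * n_classes

    # Unnormalized fused score for each grasp type.
    unnormalized = [
        prod(p[c] for p in posteriors) / prior[c] ** (n_modalities - 1)
        for c in range(n_classes)
    ]

    # Normalize so the fused scores form a probability distribution.
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Made-up posteriors over three grasp types, for illustration only.
emg_posterior = [0.5, 0.3, 0.2]
vision_posterior = [0.2, 0.6, 0.2]
fused = fuse_posteriors([emg_posterior, vision_posterior])
predicted_grasp = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the EMG classifier prefers the first grasp type and the vision classifier prefers the second; the fused distribution favors the second because the vision evidence is more confident. Applying the same rule at each time step of the reach would give a fused estimate as a function of time, in the spirit of the analysis the abstract describes.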

updated: Thu Oct 05 2023 21:26:48 GMT+0000 (UTC)

published: Thu Apr 08 2021 17:01:19 GMT+0000 (UTC)

arXiv
