Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar; Mo Han; Mohammadreza Sharif; Sezen Yagmur Gunay; Mariusz P. Furmanek; Mathew Yarossi; Paolo Bonato; Cagdas Onal; Taskin Padir; Deniz Erdogmus; Gunar Schirner

For lower arm amputees, robotic prosthetic hands promise to restore the capability to perform fine object manipulation in activities of daily living. Current control methods based on physiological signals such as EEG and EMG are prone to poor inference outcomes due to motion artifacts, variability of skin-electrode junction impedance over time, muscle fatigue, and other factors. Visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, and variable shapes of objects depending on view angle, among other factors. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time. Specifically, results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence alone, respectively, resulting in an overall fusion accuracy of 95.3%.
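The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, not the authors' implementation. It assumes that the EMG and vision classifiers each output a posterior over the same set of grasp types, and that the two evidence sources are conditionally independent given the grasp type, so that the fused posterior is proportional to p(c | EMG) p(c | vision) / p(c). The function name `fuse_posteriors`, the three-class example, and the uniform prior are all hypothetical.

```python
import numpy as np


def fuse_posteriors(p_emg, p_vision, prior=None):
    """Fuse two per-class posteriors (hypothetical helper, not from the paper).

    Assumes conditional independence of the two evidence sources given the
    class, so fused(c) is proportional to p_emg(c) * p_vision(c) / prior(c).
    """
    p_emg = np.asarray(p_emg, dtype=float)
    p_vision = np.asarray(p_vision, dtype=float)
    if prior is None:
        # Assumption: uniform prior over grasp types.
        prior = np.full(p_emg.shape, 1.0 / p_emg.size)
    prior = np.asarray(prior, dtype=float)

    eps = 1e-12  # guards against log(0)
    # Work in log space for numerical stability.
    log_post = np.log(p_emg + eps) + np.log(p_vision + eps) - np.log(prior + eps)
    log_post -= log_post.max()
    fused = np.exp(log_post)
    return fused / fused.sum()


# Hypothetical posteriors over three illustrative grasp types.
p_emg = [0.5, 0.3, 0.2]     # EMG classifier favours class 0
p_vision = [0.2, 0.6, 0.2]  # vision classifier favours class 1
fused = fuse_posteriors(p_emg, p_vision)
print(fused, int(np.argmax(fused)))
```

In this toy case the stronger vision evidence outweighs the EMG evidence and the fused estimate selects class 1. Applying the same rule at every time step during the reach would give a fused accuracy curve that can be compared against each modality on its own.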

updated: Sat Apr 23 2022 14:52:27 GMT+0000 (UTC)

published: Thu Apr 08 2021 17:01:19 GMT+0000 (UTC)

arXiv
