Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar; Mo Han; Mohammadreza Sharif; Sezen Yagmur Gunay; Mariusz P. Furmanek; Mathew Yarossi; Paolo Bonato; Cagdas Onal; Taskin Padir; Deniz Erdogmus; Gunar Schirner

For lower-arm amputees, robotic prosthetic hands promise to restore the ability to perform fine object manipulation in activities of daily living. Current control methods based on physiological signals such as EEG and EMG are prone to poor inference outcomes due to motion artifacts, variability of skin-electrode junction impedance over time, muscle fatigue, and other factors. Visual evidence is susceptible to its own artifacts, most often object occlusion, lighting changes, and variation in the apparent shape of objects with view angle. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach because the strengths of these modalities are complementary. In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, gaze, and EMG from the forearm, processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train the neural network components. Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths and, as a consequence, that fusion of multimodal evidence can outperform each individual evidence modality at any given time. Specifically, results indicate that, on average, fusion improves the instantaneous classification accuracy for the upcoming grasp type during the reaching phase by 13.66% and 14.8% relative to EMG and visual evidence alone, respectively. An overall fusion accuracy of 95.3% among 13 labels (compared to a chance level of 7.7%) is achieved, and a more detailed analysis indicates that the correct grasp is inferred sufficiently early, and with high confidence compared to the top contender, to allow successful robot actuation to close the loop.
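The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian evidence fusion, not the authors' implementation. It assumes that each modality's network outputs a posterior over the 13 grasp labels, that EMG and visual evidence are conditionally independent given the grasp type, and that the class prior is uniform (1/13, which is about 7.7% and matches the stated chance level). The function name `fuse_posteriors` and the example probabilities are hypothetical.

```python
import numpy as np


def fuse_posteriors(p_emg, p_vision, prior=None):
    """Fuse two per-class posteriors under a conditional-independence assumption.

    With evidence e (EMG) and v (vision) assumed independent given class c:
        p(c | e, v)  is proportional to  p(c | e) * p(c | v) / p(c)
    All names and the independence assumption are illustrative only.
    """
    p_emg = np.asarray(p_emg, dtype=float)
    p_vision = np.asarray(p_vision, dtype=float)
    n_classes = p_emg.shape[0]
    if prior is None:
        # Assumed uniform prior over grasp types.
        prior = np.full(n_classes, 1.0 / n_classes)

    eps = 1e-12  # guards against log(0)
    log_post = np.log(p_emg + eps) + np.log(p_vision + eps) - np.log(prior + eps)
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Hypothetical example: EMG weakly favours label 3,
# vision is undecided between labels 3 and 7.
n = 13
p_emg = np.full(n, 0.05)
p_emg[3] = 0.40
p_vision = np.full(n, 0.30 / 11)
p_vision[3] = 0.35
p_vision[7] = 0.35

fused = fuse_posteriors(p_emg, p_vision)
print("fused argmax:", int(np.argmax(fused)))
print("fused confidence:", float(fused[3]))
```

In this toy case neither modality is confident on its own, yet the fused posterior concentrates on the label both support, which illustrates the kind of complementarity the abstract describes.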

updated: Thu Apr 08 2021 17:01:19 GMT+0000 (UTC)

published: Thu Apr 08 2021 17:01:19 GMT+0000 (UTC)

arXiv
