Towards Predicting Fine Finger Motions from Ultrasound Images via Kinematic Representation

Dean Zadok; Oren Salzman; Alon Wolf; Alex M. Bronstein

A central challenge in building robotic prostheses is creating a sensor-based system that can read physiological signals from the forearm and instruct a robotic hand to perform various tasks. Existing systems typically perform discrete gestures, such as pointing or grasping, by using electromyography (EMG) or ultrasound (US) to analyze the state of the muscles. In this work, we study the inference problem of identifying the activation of specific fingers from a sequence of US images during dexterous tasks such as keyboard typing or piano playing. Whereas past work has estimated finger gestures by detecting prominent gestures, we are interested in classifying fine motions that evolve over time. We consider this task an important step towards higher adoption rates of robotic prostheses among arm amputees, as it has the potential to dramatically increase their functionality in daily tasks. Our key observation, motivating this work, is that modeling the hand as a robotic manipulator allows us to encode an intermediate representation in which US images are mapped to the manipulator's configurations. Given a sequence of such learned configurations, coupled with a neural-network architecture that exploits temporal coherence, we are able to infer fine finger motions. We evaluated our method by collecting data from a group of subjects and demonstrating how our framework can be used to replay played music or typed text. To the best of our knowledge, this is the first study to demonstrate these downstream tasks within an end-to-end system.
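
The key idea of treating the hand as a robotic manipulator, so that each US frame corresponds to a configuration (a set of joint angles) and finger activations are inferred from a sequence of such configurations, can be illustrated with a minimal sketch. It is not the authors' method. It assumes each finger is a planar three-link serial chain with hypothetical phalanx lengths, takes the per-frame joint angles as already given (the image-to-configuration regression is not shown), and swaps the learned temporal neural network for a simple threshold heuristic over a window of frames. The names `LINK_LENGTHS`, `fingertip_height` and `active_finger` are illustrative only.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical phalanx lengths (cm): proximal, middle, distal. Not from the paper.
LINK_LENGTHS = (4.5, 2.5, 2.0)

# A configuration maps each finger name to its joint angles (radians).
Configuration = Dict[str, Sequence[float]]


def fingertip_height(joint_angles: Sequence[float],
                     links: Sequence[float] = LINK_LENGTHS) -> float:
    """Planar forward kinematics of one finger modeled as a serial chain.

    Each joint flexes the finger downward by its angle relative to the previous
    link, so link i points at the cumulative sum of the first i+1 joint angles.
    Returns the vertical fingertip offset from the knuckle (negative = lower).
    """
    height = 0.0
    cumulative = 0.0
    for angle, length in zip(joint_angles, links):
        cumulative += angle
        height -= length * math.sin(cumulative)
    return height


def active_finger(window: List[Configuration],
                  threshold: float = 1.0) -> Optional[str]:
    """Return the finger whose fingertip descends most within the window.

    Hypothetical stand-in for a learned temporal model: a finger counts as
    activated only if its fingertip drops by more than `threshold` (cm)
    relative to the first frame of the window.
    """
    best_finger: Optional[str] = None
    best_drop = threshold
    for finger in window[0]:
        heights = [fingertip_height(frame[finger]) for frame in window]
        drop = heights[0] - min(heights)
        if drop > best_drop:
            best_finger, best_drop = finger, drop
    return best_finger


if __name__ == "__main__":
    rest = (0.0, 0.0, 0.0)
    # Five frames in which only the index finger flexes progressively.
    window = [
        {"thumb": rest, "index": (0.075 * t, 0.075 * t, 0.05 * t), "middle": rest}
        for t in range(5)
    ]
    print(active_finger(window))            # index
    print(active_finger([window[0]] * 5))   # None
```

Working in configuration space rather than raw pixels is what makes a temporal criterion like "the fingertip descended over the last few frames" easy to express; in the paper this role is played by a neural network that exploits temporal coherence, which a fixed threshold only approximates.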

updated: Thu Feb 10 2022 18:05:09 GMT+0000 (UTC)

published: Thu Feb 10 2022 18:05:09 GMT+0000 (UTC)

arXiv
