Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment

Xueying Shi; Yueming Jin; Qi Dou; Jing Qin; Pheng-Ann Heng

Automated surgical gesture recognition is of great importance in robot-assisted minimally invasive surgery. However, existing methods assume that training and testing data come from the same domain, so they suffer severe performance degradation when a domain gap exists, such as that between a simulator and a real robot. In this paper, we propose a novel unsupervised domain adaptation framework that simultaneously transfers multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot. It remedies the domain gap with enhanced transferable features by exploiting temporal cues in videos and the inherent correlations between modalities for gesture recognition. Specifically, we first propose MDO-K to align kinematics: it exploits temporal continuity to transfer motion directions, which have a smaller domain gap than position values, thereby relieving the adaptation burden. Moreover, we propose KV-Relation-ATT to transfer the co-occurrence signals of kinematics and vision. Features attended by correlation similarity are more informative for enhancing the domain invariance of the model. The two feature alignment strategies benefit the model mutually during end-to-end learning. We extensively evaluate our method for gesture recognition on the DESK dataset with the peg transfer procedure. Results show that our approach recovers performance with substantial gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations from the real robot.
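The abstract's argument that motion directions carry a smaller domain gap than position values can be illustrated with a small numerical sketch. This is not the authors' MDO-K implementation; it only assumes that a "motion direction" can be approximated by the normalized difference between consecutive tool-tip positions. The function name `motion_directions` and the toy trajectories are hypothetical.

```python
import numpy as np

def motion_directions(positions, eps=1e-8):
    """Convert a (T, 3) position sequence into (T-1, 3) unit direction vectors.

    Assumption for illustration only: direction = normalized difference
    between consecutive positions. Stationary steps map to the zero vector.
    """
    positions = np.asarray(positions, dtype=float)
    deltas = np.diff(positions, axis=0)                     # frame-to-frame displacement
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)   # step lengths
    return deltas / np.maximum(norms, eps)                  # guard against division by zero

# Hypothetical trajectories: the "real robot" follows the same path as the
# "simulator" but in a workspace with a different origin and scale.
sim = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
real = 0.01 * sim + np.array([5.0, -3.0, 2.0])

position_gap = np.abs(sim - real).max()
direction_gap = np.abs(motion_directions(sim) - motion_directions(real)).max()
print(f"max position gap:  {position_gap:.3f}")
print(f"max direction gap: {direction_gap:.3e}")
```

In this toy case the raw positions differ by a large offset and scale, while the direction vectors agree, which is the intuition behind aligning directions instead of positions. Real simulator and robot data would also differ in rotation, noise, and sampling rate, which this sketch ignores.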

updated: Sat Jul 17 2021 06:57:12 GMT+0000 (UTC)

published: Sat Mar 06 2021 09:10:03 GMT+0000 (UTC)

arXiv
