FastHand: Fast Hand Pose Estimation From A Monocular Camera

Shan An; Xiajie Zhang; Dong Wei; Haogang Zhu; Jianyu Yang; Konstantinos A. Tsintotas

Hand gesture recognition is the initial step in most methods for human-robot interaction. The task poses two key challenges. The first is achieving stable and accurate hand landmark predictions in real-world scenarios; the second is reducing forward-inference time. In this paper, we propose a fast and accurate framework for hand pose estimation, dubbed "FastHand". Using a lightweight encoder-decoder network architecture, we meet the requirements of practical applications running on embedded devices. The encoder consists of deep layers with a small number of parameters, while the decoder makes use of spatial location information to obtain more accurate results. We evaluate the pipeline on two publicly available datasets, where it shows improved performance compared with other state-of-the-art approaches. FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
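To put the reported speed in concrete terms, 25 frames per second corresponds to a budget of 1000 / 25 = 40 ms per frame for the whole forward pass. The sketch below is illustrative only and is not taken from the paper: `frame_budget_ms` and `measure_fps` are hypothetical helper names, and `infer` stands in for whatever callable runs one forward pass of a model.

```python
import time


def frame_budget_ms(fps: float) -> float:
    """Return the per-frame time budget in milliseconds for a target frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def measure_fps(infer, n_frames: int = 50, warmup: int = 5) -> float:
    """Estimate average frames per second by calling `infer()` repeatedly.

    `infer` is a placeholder (an assumption, not the authors' code) for a
    zero-argument callable that performs one forward pass on one frame.
    """
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(n_frames):
        infer()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed


if __name__ == "__main__":
    # Budget implied by the 25 frames-per-second figure in the abstract.
    print(frame_budget_ms(25))  # 40.0
```

A measured rate from `measure_fps` can then be compared against the target: a model meets the 25 frames-per-second mark only if its average forward pass stays within the 40 ms budget on the target device.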

updated: Sun Feb 14 2021 04:12:41 GMT+0000 (UTC)

published: Sun Feb 14 2021 04:12:41 GMT+0000 (UTC)

arXiv
