Fast Learning of Dynamic Hand Gesture Recognition with Few-Shot Learning Models

Niels Schlüsener; Michael Bücker

We develop Few-Shot Learning models trained to recognize five or ten different dynamic hand gestures. The gestures to be recognized are arbitrarily interchangeable: the model only needs one, two, or five examples per hand gesture. All models were built on the Few-Shot Learning architecture of the Relation Network (RN), with Long Short-Term Memory (LSTM) cells forming the backbone. The models use hand reference points extracted from RGB video sequences of the Jester dataset, which was modified to contain 190 different types of hand gestures. Results show an accuracy of up to 88.8% for recognizing five and up to 81.2% for recognizing ten dynamic hand gestures. The research also sheds light on the potential effort savings of using a Few-Shot Learning approach instead of a traditional Deep Learning approach to detect dynamic hand gestures. Savings were defined as the number of additional observations required when a Deep Learning model, rather than a Few-Shot Learning model, is trained on new hand gestures. The difference in the total number of observations required to achieve approximately the same accuracy indicates potential savings of up to 630 observations for five and up to 1260 observations for ten hand gestures to be recognized. Since labeling video recordings of hand gestures requires significant effort, these savings can be considered substantial.
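The abstract names the building blocks but not their layer sizes or training details. An LSTM backbone embeds each gesture's sequence of hand reference points, and a Relation Network scores a query clip against one, two, or five support examples per gesture. The following is a minimal, untrained sketch in plain Python of one N-way K-shot inference step. It is not the authors' implementation and rests on several assumptions:

- Each frame has 21 hand landmarks with (x, y) coordinates, giving 42 features. This is a MediaPipe-style layout that the abstract does not specify.
- Hidden sizes are chosen arbitrarily.
- Per-gesture embeddings are summed over the K shots, as in the original Relation Network formulation.
- A small MLP relation module stands in for whatever module the authors used.

All names (`TinyLSTM`, `RelationModule`, `classify_query`, `fake_clip`) are hypothetical, and the random clips stand in for Jester-derived data.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rand_matrix(rows, cols, rng, scale=0.1):
    # Small random weights; a trained model would learn these.
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTM:
    """Single-layer LSTM; the last hidden state serves as the clip embedding."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate) acting on [x_t, h_{t-1}].
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_dim for g in "ifoc"}

    def embed(self, sequence):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for frame in sequence:
            z = list(frame) + h  # concatenate current frame features with previous hidden state
            pre = {g: [s + b for s, b in zip(matvec(self.W[g], z), self.b[g])] for g in "ifoc"}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h


class RelationModule:
    """Hypothetical 2-layer MLP mapping a (gesture, query) embedding pair to a score in (0, 1)."""

    def __init__(self, feat_dim, hidden_dim, rng):
        self.W1 = rand_matrix(hidden_dim, 2 * feat_dim, rng)
        self.w2 = rand_matrix(1, hidden_dim, rng)[0]

    def score(self, class_emb, query_emb):
        # List concatenation of the two embeddings, then ReLU hidden layer.
        hidden = [max(0.0, v) for v in matvec(self.W1, class_emb + query_emb)]
        return sigmoid(sum(w * v for w, v in zip(self.w2, hidden)))


def classify_query(support, query, lstm, relation):
    """support maps each gesture label to its K example clips (K = 1, 2, or 5)."""
    query_emb = lstm.embed(query)
    scores = {}
    for label, shots in support.items():
        shot_embs = [lstm.embed(clip) for clip in shots]
        class_emb = [sum(vals) for vals in zip(*shot_embs)]  # element-wise sum over the K shots
        scores[label] = relation.score(class_emb, query_emb)
    return max(scores, key=scores.get), scores


# Demo: one 5-way 1-shot episode on random stand-in data.
rng = random.Random(0)
N_WAY, K_SHOT, FRAMES, FEATURES = 5, 1, 8, 42  # 42 = 21 landmarks x (x, y), an assumption


def fake_clip():
    return [[rng.random() for _ in range(FEATURES)] for _ in range(FRAMES)]


support = {f"gesture_{n}": [fake_clip() for _ in range(K_SHOT)] for n in range(N_WAY)}
lstm = TinyLSTM(FEATURES, 16, rng)
relation = RelationModule(16, 8, rng)
predicted, scores = classify_query(support, fake_clip(), lstm, relation)
print(predicted, {label: round(s, 3) for label, s in scores.items()})
```

With random weights the predicted label is arbitrary. In a trained Relation Network, the embedding and relation modules are optimized jointly over many sampled episodes so that the score approaches 1 for the matching gesture and 0 otherwise. Swapping in new gestures then only means replacing the entries in `support`, which is the interchangeability the abstract describes.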

updated: Fri Dec 16 2022 09:31:15 GMT+0000 (UTC)

published: Fri Dec 16 2022 09:31:15 GMT+0000 (UTC)

arXiv