Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries

Matyáš Boháček; Marek Hrúz

Today's sign language recognition models require large training corpora of laboratory-like videos, whose collection demands an extensive workforce and substantial financial resources. As a result, only a handful of such systems are publicly available, and their localization to less widely used sign languages is limited. Utilizing online text-to-video dictionaries, which inherently hold annotated data across various attributes and sign languages, and training models in a few-shot fashion therefore offers a promising path toward democratizing this technology. In this work, we collect and open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind consisting of dictionary-scraped videos. This dataset reflects the actual distribution and characteristics of sign language data available online. We select glosses that directly overlap with the existing datasets WLASL100 and ASLLVD and share their class mappings to allow for transfer learning experiments. Apart from providing baseline results on a pose-based architecture, we introduce a novel approach to training sign language recognition models in a few-shot scenario, achieving state-of-the-art results on the ASLLVD-Skeleton and ASLLVD-Skeleton-20 datasets with top-1 accuracies of 30.97% and 95.45%, respectively.
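The abstract does not say how the shared class mappings are built or stored. As a rough illustration only, assuming gloss labels are plain strings compared case-insensitively, the sketch below intersects two gloss vocabularies and assigns shared class indices in the order of the existing benchmark. The function name `shared_class_mapping` and the toy glosses are hypothetical and are not taken from UWB-SL-Wild, WLASL100, or ASLLVD.

```python
def shared_class_mapping(source_glosses, target_glosses):
    """Map each gloss present in both vocabularies to one shared class index.

    Hypothetical helper: glosses are lowercased before comparison, and the
    shared indices follow the order of the target vocabulary (e.g. an existing
    benchmark), so labels from both datasets can refer to the same classes.
    """
    source = {g.lower() for g in source_glosses}
    mapping = {}
    for gloss in target_glosses:
        key = gloss.lower()
        # Keep only overlapping glosses; skip duplicates in the target list.
        if key in source and key not in mapping:
            mapping[key] = len(mapping)
    return mapping


# Hypothetical toy vocabularies, not the real dataset glosses.
wild_glosses = ["BOOK", "drink", "Computer", "apple"]
benchmark_glosses = ["book", "chair", "drink", "computer"]

mapping = shared_class_mapping(wild_glosses, benchmark_glosses)
print(mapping)  # {'book': 0, 'drink': 1, 'computer': 2}
```

Ordering the shared indices by the benchmark vocabulary is one simple choice; any fixed, published ordering would serve the same purpose of keeping labels consistent across datasets.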

updated: Tue Jan 10 2023 03:21:01 GMT+0000 (UTC)

published: Tue Jan 10 2023 03:21:01 GMT+0000 (UTC)

arXiv
