YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus

David Uthus; Garrett Tanzer; Manfred Georg

Machine learning for sign languages is bottlenecked by data. In this paper, we present YouTube-ASL, a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset. We train baseline models for ASL to English translation on YouTube-ASL and evaluate them on How2Sign, where we achieve a new finetuned state of the art of 12.39 BLEU and, for the first time, report zero-shot results.

updated: Tue Jun 27 2023 02:44:07 GMT+0000 (UTC)

published: Tue Jun 27 2023 02:44:07 GMT+0000 (UTC)

arXiv
