BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

Samuel Albanie; Gül Varol; Liliane Momeni; Triantafyllos Afouras; Joon Son Chung; Neil Fox; Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language (BSL) signs of unprecedented scale; (2) We show that we can use BSL-1K to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks - we exceed the state of the art on both the MSASL and WLASL benchmarks. Finally, (3) we propose new large-scale evaluation sets for the tasks of sign recognition and sign spotting and provide baselines which we hope will serve to stimulate research in this area.
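
The localisation idea described in the abstract (search for a vocabulary word near the subtitle that contains it, and keep the detection only if a keyword spotter is confident) can be sketched roughly as follows. This is an illustrative sketch, not the authors' pipeline: the `Spotter` interface, the `padding` of 4 seconds, and the `threshold` of 0.5 are all assumptions made for the example, and the spotter here is a stand-in lookup table rather than a real mouthing-based model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Subtitle:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass(frozen=True)
class Annotation:
    keyword: str
    time: float        # estimated time of the sign, in seconds
    confidence: float  # spotter score in [0, 1]


# Hypothetical spotter interface (an assumption for this sketch):
# given a keyword and a search window, return (time, confidence) candidates.
Spotter = Callable[[str, float, float], Sequence[Tuple[float, float]]]


def localise_signs(
    subtitles: Iterable[Subtitle],
    vocabulary: Set[str],
    spotter: Spotter,
    padding: float = 4.0,    # assumed slack for weak subtitle alignment
    threshold: float = 0.5,  # assumed minimum confidence to keep a detection
) -> List[Annotation]:
    """Return one annotation per (subtitle, keyword) whose best score passes the threshold."""
    annotations: List[Annotation] = []
    for sub in subtitles:
        # Normalise subtitle words: strip simple punctuation and lowercase.
        words = {w.strip(".,!?;:\"'").lower() for w in sub.text.split()}
        for keyword in sorted(words & vocabulary):
            # Widen the search window because subtitles are only weakly aligned.
            window_start = max(0.0, sub.start - padding)
            window_end = sub.end + padding
            candidates = spotter(keyword, window_start, window_end)
            if not candidates:
                continue
            time, confidence = max(candidates, key=lambda c: c[1])
            if confidence >= threshold:
                annotations.append(Annotation(keyword, time, confidence))
    return annotations


def fake_spotter(keyword: str, start: float, end: float) -> List[Tuple[float, float]]:
    """Stand-in for a real keyword spotter: a fixed table of detections."""
    table = {"happy": [(12.0, 0.9)], "tree": [(30.5, 0.3)]}
    return [(t, c) for t, c in table.get(keyword, []) if start <= t <= end]


demo_subtitles = [
    Subtitle(10.0, 13.0, "I am so happy today."),
    Subtitle(28.0, 31.0, "Look at that tree!"),
]
demo_result = localise_signs(demo_subtitles, {"happy", "tree"}, fake_spotter)
print(demo_result)
```

In this toy run only "happy" survives, because the stand-in score for "tree" falls below the assumed threshold. Raising the threshold trades fewer annotations for cleaner ones, which is the kind of trade-off any automatic annotation scheme of this sort has to make.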

updated: Wed Oct 13 2021 17:13:42 GMT+0000 (UTC)

published: Thu Jul 23 2020 16:59:01 GMT+0000 (UTC)

arXiv
