WLASL-LEX: a Dataset for Recognising Phonological Properties in American Sign Language

Federico Tavella; Viktor Schlegel; Marta Romeo; Aphrodite Galata; Angelo Cangelosi

Signed Language Processing (SLP) concerns the automated processing of signed languages, the main means of communication for Deaf and hearing-impaired individuals. SLP features many different tasks, ranging from sign recognition to translation and production of signed speech, but has been overlooked by the NLP community thus far. In this paper, we bring to attention the task of modelling the phonology of sign languages. We leverage existing resources to construct a large-scale dataset of American Sign Language signs annotated with six different phonological properties. We then conduct an extensive empirical study to investigate whether data-driven end-to-end and feature-based approaches can be optimised to automatically recognise these properties. We find that, despite the inherent challenges of the task, graph-based neural networks that operate over skeleton features extracted from raw videos succeed at the task to varying degrees. Most importantly, we show that this performance holds even for signs not observed during training.
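The abstract does not specify the graph architecture, so the snippet below is only a rough illustration of what "a graph-based network over skeleton features" can mean. It sketches one generic graph-convolution step in the style of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 X W), applied to a toy skeleton whose nodes are pose keypoints with (x, y) coordinates as features. The toy joint graph, the feature layout, and the helper names `normalized_adjacency` and `graph_conv` are hypothetical and are not taken from the paper.

```python
import math

def normalized_adjacency(num_nodes, edges):
    """Hypothetical helper: return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    # Start from the identity so every joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        # Bones are undirected, so fill both directions.
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    # Symmetric normalisation keeps feature magnitudes comparable across joints.
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]

def graph_conv(features, adj_norm, weight):
    """Hypothetical helper: one graph-convolution step, ReLU(A_norm @ X @ W)."""
    n = len(features)
    in_dim = len(weight)
    out_dim = len(weight[0])
    # Aggregate each joint's neighbourhood: A_norm @ X.
    agg = [[sum(adj_norm[i][k] * features[k][d] for k in range(n)) for d in range(in_dim)]
           for i in range(n)]
    # Project to the output dimension and apply ReLU: ReLU((A_norm @ X) @ W).
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]

# Toy example (assumed): a 3-joint chain such as wrist -> knuckle -> fingertip,
# with (x, y) keypoint coordinates as per-joint features.
edges = [(0, 1), (1, 2)]
adj = normalized_adjacency(3, edges)
keypoints = [[0.1, 0.2], [0.3, 0.5], [0.4, 0.9]]
identity_w = [[1.0, 0.0], [0.0, 1.0]]
hidden = graph_conv(keypoints, adj, identity_w)
```

In a full model, several such layers (usually combined with temporal convolutions across video frames) would be stacked, and their pooled output would feed one classifier per phonological property; that design is likewise an assumption here rather than a description of the paper's models.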

updated: Fri Mar 11 2022 17:21:24 GMT+0000 (UTC)

published: Fri Mar 11 2022 17:21:24 GMT+0000 (UTC)

arXiv