Improving Continuous Sign Language Recognition with Cross-Lingual Signs

Fangyun Wei; Yutong Chen

This work is dedicated to continuous sign language recognition (CSLR), a weakly supervised task that involves recognizing continuous signs from videos without any prior knowledge of the temporal boundaries between consecutive signs. Data scarcity heavily impedes the progress of CSLR. Existing approaches typically train CSLR models on a monolingual corpus, which is orders of magnitude smaller than the corpora used for speech recognition. In this work, we explore the feasibility of utilizing multilingual sign language corpora to facilitate monolingual CSLR. Our work is built upon the observation of cross-lingual signs, which originate from different sign languages but have similar visual signals (e.g., hand shape and motion). The underlying idea of our approach is to identify the cross-lingual signs in one sign language and properly leverage them as auxiliary training data to improve the recognition capability of another. To achieve this goal, we first build two sign language dictionaries containing the isolated signs that appear in the two datasets. Then we identify the sign-to-sign mappings between the two sign languages via a well-optimized isolated sign language recognition model. Finally, we train a CSLR model on the combination of the target data with original labels and the auxiliary data with mapped labels. Experimentally, our approach achieves state-of-the-art performance on two widely used CSLR datasets: Phoenix-2014 and Phoenix-2014T.
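The mapping and relabeling steps summarized above can be illustrated with a minimal Python sketch. It assumes that an isolated sign language recognition (ISLR) model trained on the target-language dictionary has already produced, for every isolated clip in the auxiliary dictionary, a softmax distribution over the target vocabulary. The averaging rule, the `threshold` value, the `AUX:` prefix for unmapped glosses, the toy glosses, and the function names `identify_mappings` and `build_training_set` are all hypothetical illustrations, not the authors' exact procedure.

```python
from typing import Dict, List, Sequence, Tuple


def identify_mappings(
    aux_clip_probs: Dict[str, List[Sequence[float]]],
    target_vocab: Sequence[str],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Map each auxiliary-language sign to a target-language sign.

    aux_clip_probs[sign] holds, for every isolated clip of that auxiliary
    sign, the (assumed) ISLR softmax over the target vocabulary.
    """
    mapping: Dict[str, str] = {}
    for aux_sign, clips in aux_clip_probs.items():
        if not clips:
            continue  # no isolated clips, so nothing to compare
        # Average the per-clip distributions for this auxiliary sign.
        n = len(clips)
        avg = [sum(p[k] for p in clips) / n for k in range(len(target_vocab))]
        best = max(range(len(avg)), key=avg.__getitem__)
        # Keep only confident matches; the threshold is a hypothetical knob.
        if avg[best] >= threshold:
            mapping[aux_sign] = target_vocab[best]
    return mapping


def build_training_set(
    target_data: List[Tuple[str, List[str]]],
    aux_data: List[Tuple[str, List[str]]],
    mapping: Dict[str, str],
) -> List[Tuple[str, List[str]]]:
    """Combine target samples (original labels) with relabeled auxiliary samples."""
    combined = list(target_data)
    for video_id, glosses in aux_data:
        # Mapped glosses take the target label; unmapped ones keep an
        # auxiliary-only label so the gloss sequence length is preserved.
        relabeled = [mapping.get(g, "AUX:" + g) for g in glosses]
        combined.append((video_id, relabeled))
    return combined


# Toy usage with made-up glosses and probabilities.
target_vocab = ["HAUS", "REGEN", "SONNE"]
aux_probs = {
    "RAIN": [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],  # averages to 0.75 on REGEN
    "HELLO": [[0.4, 0.3, 0.3]],                  # best score 0.4 < 0.5
}
mapping = identify_mappings(aux_probs, target_vocab, threshold=0.5)
# mapping == {"RAIN": "REGEN"}
data = build_training_set([("t1", ["REGEN"])], [("a1", ["RAIN", "HELLO"])], mapping)
# data == [("t1", ["REGEN"]), ("a1", ["REGEN", "AUX:HELLO"])]
```

Keeping unmapped auxiliary glosses under a separate label leaves each auxiliary sentence's label sequence intact, so it still lines up with the whole video under a sequence-level loss; dropping such sentences, or training only on fully mapped ones, would be equally plausible alternatives under these assumptions.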

updated: Mon Aug 21 2023 15:58:47 GMT+0000 (UTC)

published: Mon Aug 21 2023 15:58:47 GMT+0000 (UTC)

arXiv
