Data Quality as Predictor of Voice Anti-Spoofing Generalization

Bhusan Chettri; Rosa González Hautamäki; Md Sahidullah; Tomi Kinnunen

Voice anti-spoofing aims to classify a given utterance as either a bona fide human sample or a spoofing attack (e.g., a synthetic or replayed sample). Many anti-spoofing methods have been proposed, but most of them fail to generalize across domains (corpora), and we do not know why. We outline a novel interpretative framework for gauging the impact of data quality on anti-spoofing performance. Our within- and between-domain experiments pool data from seven public corpora and use three anti-spoofing methods based on Gaussian mixture and convolutional neural network models. We assess the impact of long-term spectral information, speaker population (through x-vector speaker embeddings), signal-to-noise ratio, and selected voice quality features.

updated: Mon Jun 21 2021 20:53:23 GMT+0000 (UTC)

published: Fri Mar 26 2021 17:09:06 GMT+0000 (UTC)

arXiv
