Discriminative Neural Clustering for Speaker Diarisation

Qiujia Li; Florian L. Kreyssig; Chao Zhang; Philip C. Woodland

In this paper, we propose Discriminative Neural Clustering (DNC) that formulates data clustering with a maximum number of clusters as a supervised sequence-to-sequence learning problem. Compared to traditional unsupervised clustering algorithms, DNC learns clustering patterns from training data without requiring an explicit definition of a similarity measure. An implementation of DNC based on the Transformer architecture is shown to be effective on a speaker diarisation task using the challenging AMI dataset. Since AMI contains only 147 complete meetings as individual input sequences, data scarcity is a significant issue for training a Transformer model for DNC. Accordingly, this paper proposes three data augmentation schemes: sub-sequence randomisation, input vector randomisation, and Diaconis augmentation, which generates new data samples by rotating the entire input sequence of L2-normalised speaker embeddings. Experimental results on AMI show that DNC achieves a reduction in speaker error rate (SER) of 29.4% relative to spectral clustering.
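The Diaconis augmentation mentioned above rotates every L2-normalised speaker embedding in a meeting by the same random rotation. The following is a minimal sketch of that idea in NumPy, not the authors' implementation: the abstract does not say how the rotation is sampled, so the QR-based sampling is an assumption, as are the function names `random_rotation` and `diaconis_augment`, the embedding dimension of 32, and the toy input data.

```python
import numpy as np


def random_rotation(dim, rng):
    """Sample a random rotation matrix (orthogonal, determinant +1).

    Assumption: QR decomposition of a Gaussian matrix with a sign
    correction, a common way to draw a uniformly distributed orthogonal
    matrix. The paper's exact sampling procedure may differ.
    """
    gaussian = rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(gaussian)
    # Scale each column of q by the sign of the matching diagonal entry of r
    # so the result does not depend on the QR sign convention.
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so that the matrix is a proper rotation.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def diaconis_augment(embeddings, rng):
    """Rotate a whole sequence of embeddings with one shared rotation.

    embeddings: array of shape (sequence_length, dim), rows L2-normalised.
    """
    rotation = random_rotation(embeddings.shape[1], rng)
    return embeddings @ rotation.T


# Toy stand-in for one meeting: 6 segment embeddings of dimension 32.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 32))
x /= np.linalg.norm(x, axis=1, keepdims=True)

y = diaconis_augment(x, rng)
```

Because a single rotation is applied to the entire sequence, each embedding keeps unit norm and all pairwise cosine similarities are unchanged, so the cluster labels of the original sequence remain valid for the augmented one.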

updated: Mon Nov 23 2020 15:32:03 GMT+0000 (UTC)

published: Tue Oct 22 2019 00:09:22 GMT+0000 (UTC)

arXiv
