Switching Variational Auto-Encoders for Noise-Agnostic Audio-visual Speech Enhancement

Mostafa Sadeghi; Xavier Alameda-Pineda

Recently, audio-visual speech enhancement has been tackled in an unsupervised setting based on variational auto-encoders (VAEs). During training, only clean data are used to train a generative model of speech; at test time, this model is combined with a noise model, e.g., nonnegative matrix factorization (NMF), whose parameters are learned without supervision. Consequently, the resulting model is agnostic to the noise type. When the visual data are clean, audio-visual VAE-based architectures usually outperform their audio-only counterpart. The opposite happens when the visual data are corrupted by clutter, e.g., when the speaker is not facing the camera. In this paper, we propose to find the optimal combination of these two architectures through time. More precisely, we introduce a latent sequential variable with Markovian dependencies that switches between different VAE architectures through time in an unsupervised manner, leading to the switching variational auto-encoder (SwVAE). We propose a variational factorization to approximate the computationally intractable posterior distribution. We also derive the corresponding variational expectation-maximization algorithm to estimate the parameters of the model and enhance the speech signal. Our experiments demonstrate the promising performance of SwVAE.
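
To make the idea of a Markovian switching variable concrete, the following is a minimal sketch, not the variational expectation-maximization algorithm of the paper. It assumes that per-frame log-likelihoods of the observation under each candidate model (e.g., audio-only and audio-visual) are already available, and it applies a standard scaled forward-backward recursion to obtain the per-frame posterior over the switching variable. The function name `switching_posterior`, the transition matrix, and the toy log-likelihoods are all hypothetical and chosen for illustration only.

```python
import numpy as np

def switching_posterior(loglik, trans, init):
    """Smoothed posterior of a discrete Markov switching variable.

    loglik: (T, K) per-frame log-likelihood under each of K models
            (hypothetical inputs; the paper obtains its terms differently).
    trans:  (K, K) transition matrix, each row sums to 1.
    init:   (K,) initial state distribution.
    Returns a (T, K) array of posterior probabilities per frame.
    """
    T, K = loglik.shape
    # Subtract the per-frame maximum for numerical stability. This rescales
    # each frame by a constant and leaves the normalized posteriors unchanged.
    lik = np.exp(loglik - loglik.max(axis=1, keepdims=True))

    alpha = np.zeros((T, K))  # scaled forward messages
    beta = np.ones((T, K))    # scaled backward messages

    a = init * lik[0]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        a = (alpha[t - 1] @ trans) * lik[t]
        alpha[t] = a / a.sum()

    for t in range(T - 2, -1, -1):
        b = trans @ (lik[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Toy example: state 1 (say, audio-visual) fits the first five frames better,
# state 0 (say, audio-only) fits the last five frames better.
loglik = np.array([[0.0, 2.0]] * 5 + [[2.0, 0.0]] * 5)
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # "sticky" transitions favor staying in a state
init = np.array([0.5, 0.5])
post = switching_posterior(loglik, trans, init)
print(post.round(3))
```

The sticky transition matrix encodes the assumption that the reliability of the visual stream changes slowly, so the selected architecture should not flip at every frame. In the paper's setting the posterior is intractable and is approximated with a variational factorization, so this exact recursion should be read only as an illustration of the Markovian switching idea.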

updated: Mon Feb 08 2021 11:45:02 GMT+0000 (UTC)

published: Mon Feb 08 2021 11:45:02 GMT+0000 (UTC)

arXiv
