Points2Sound: From mono to binaural audio using 3D point cloud scenes

Francesc Lluís; Vasileios Chatziioannou; Alex Hofmann


For immersive applications, generating binaural sound that matches its visual counterpart is crucial to bring meaningful experiences to people in a virtual environment. Recent works have shown the possibility of using neural networks to synthesize binaural audio from mono audio, with 2D visual information as guidance. Extending this approach by guiding the audio with 3D visual information and operating in the waveform domain may allow for a more accurate auralization of a virtual audio scene. In this paper, we present Points2Sound, a multi-modal deep learning model that generates a binaural version of mono audio using 3D point cloud scenes. Specifically, Points2Sound consists of a vision network with 3D sparse convolutions, which extracts visual features from the point cloud scene, and an audio network operating in the waveform domain, which is conditioned on those features to synthesize the binaural version. Experimental results indicate that 3D visual information can successfully guide multi-modal deep learning models for the task of binaural synthesis. In addition, we investigate different loss functions and 3D point cloud attributes, showing that directly predicting the full binaural signal and using RGB-depth features increase the performance of our proposed model.
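The abstract contrasts directly predicting the full binaural signal with other training targets. The sketch below illustrates two such formulations in plain Python under stated assumptions: the mono input is taken to be the sample-wise sum of the left and right channels, and the alternative target is taken to be the left-minus-right difference signal used in earlier 2D-guided work. Neither convention, nor the L1 loss, is confirmed by the abstract, and all function names are hypothetical.

```python
from typing import List, Tuple

Signal = List[float]

# Assumption: mono = left + right (a common downmix convention, not taken from the abstract).
def to_mono(left: Signal, right: Signal) -> Signal:
    """Downmix a binaural pair to mono as the sample-wise sum."""
    return [l + r for l, r in zip(left, right)]

# Assumption: the alternative target is the difference signal, left - right.
def difference_target(left: Signal, right: Signal) -> Signal:
    """Hypothetical alternative target: the left-minus-right difference."""
    return [l - r for l, r in zip(left, right)]

def binaural_from_difference(mono: Signal, diff: Signal) -> Tuple[Signal, Signal]:
    """Recover (left, right) from mono = L + R and diff = L - R."""
    left = [(m + d) / 2 for m, d in zip(mono, diff)]
    right = [(m - d) / 2 for m, d in zip(mono, diff)]
    return left, right

def l1_loss(pred: Signal, target: Signal) -> float:
    """Mean absolute error between two waveforms (L1 chosen for illustration only)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def full_binaural_loss(pred_lr: Tuple[Signal, Signal],
                       true_lr: Tuple[Signal, Signal]) -> float:
    """Loss computed directly on both predicted channels, averaged over L and R."""
    (pl, pr), (tl, tr) = pred_lr, true_lr
    return (l1_loss(pl, tl) + l1_loss(pr, tr)) / 2

if __name__ == "__main__":
    left, right = [0.5, 0.25, -0.5], [0.25, 0.25, 0.5]
    mono = to_mono(left, right)
    diff = difference_target(left, right)
    print(binaural_from_difference(mono, diff))  # recovers (left, right)
```

With the difference formulation, the reconstructed channels always sum back to the input mono signal, so the network can only redistribute energy between left and right. Predicting both channels directly removes that constraint and lets the loss act on each channel as heard.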

updated: Thu Nov 25 2021 14:46:58 GMT+0000 (UTC)

published: Mon Apr 26 2021 10:44:01 GMT+0000 (UTC)

arXiv
