MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

Anton Ratnarajah; Zhenyu Tang; Rohith Chandrashekar Aralikatti; Dinesh Manocha

We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented as meshes. IRs are used to create high-quality sound experiences in interactive applications and audio processing. Our method can handle input triangular meshes with arbitrary topologies (2K to 3M triangles). We present a novel technique for training MESH2IR using energy decay relief and highlight its benefits. We also show that training MESH2IR on IRs preprocessed with our proposed technique significantly improves the accuracy of IR generation. We reduce the non-linearity in the mesh space by transforming 3D scene meshes into a latent space using a graph convolution network. MESH2IR is more than 200 times faster than a geometric acoustics algorithm running on a CPU and can generate more than 10,000 IRs per second on an NVIDIA GeForce RTX 2080 Ti GPU for a given furnished indoor 3D scene. Acoustic metrics are used to characterize the acoustic environment; we show that the acoustic metrics of the IRs predicted by MESH2IR match the ground truth with less than 10% error. We also highlight the benefits of MESH2IR in audio and speech processing applications such as speech dereverberation and speech separation. To the best of our knowledge, ours is the first neural-network-based approach to predict IRs from a given 3D scene mesh in real time.
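The abstract mentions energy decay and acoustic metrics without defining them. As a minimal sketch, assuming a single-channel broadband IR sampled at `fs` Hz, the code below computes the Schroeder energy decay curve (EDC) and estimates the reverberation time T60 from a least-squares line fit between -5 and -25 dB. Energy decay relief applies the same backward integration separately in each frequency band; this sketch uses only the broadband version. The function names `schroeder_edc_db` and `estimate_t60` and the fit range are illustrative assumptions, not taken from the MESH2IR paper.

```python
import math


def schroeder_edc_db(ir):
    """Schroeder backward-integrated energy decay curve of an IR, in dB.

    EDC[n] = 10*log10(sum_{k>=n} ir[k]^2 / sum_k ir[k]^2), so EDC[0] == 0 dB.
    (Illustrative helper; not the paper's code.)
    """
    energy = [x * x for x in ir]
    # Accumulate the squared amplitude from the end of the IR backwards.
    tail = [0.0] * len(energy)
    acc = 0.0
    for i in range(len(energy) - 1, -1, -1):
        acc += energy[i]
        tail[i] = acc
    total = tail[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(t / total) if t > 0.0 else float("-inf") for t in tail]


def estimate_t60(ir, fs, start_db=-5.0, end_db=-25.0):
    """Estimate T60 in seconds from a line fit to the EDC.

    The decay slope is measured between start_db and end_db and then
    extrapolated to a 60 dB drop (a T20-style estimate). The fit range
    is an assumed default, not a value from the paper.
    """
    edc = schroeder_edc_db(ir)
    # Keep only (time, level) points inside the fit range.
    pts = [(i / fs, v) for i, v in enumerate(edc) if end_db <= v <= start_db]
    if len(pts) < 2:
        raise ValueError("EDC does not span the requested fit range")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    var = sum((t - mean_t) ** 2 for t, _ in pts)
    slope_db_per_s = cov / var  # negative for a decaying IR
    return -60.0 / slope_db_per_s


if __name__ == "__main__":
    fs = 16000
    # Synthetic IR whose amplitude falls by 60 dB over 0.5 s.
    ir = [10.0 ** (-3.0 * n / (fs * 0.5)) for n in range(fs)]
    print(round(estimate_t60(ir, fs), 3))  # should be close to 0.5
```

Comparing such metrics (T60 and similar decay-based measures) between a predicted IR and a ground-truth IR gives a relative error of the kind the abstract reports; which exact metrics and fit ranges the authors use is not stated here.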

updated: Wed May 18 2022 23:50:34 GMT+0000 (UTC)

published: Wed May 18 2022 23:50:34 GMT+0000 (UTC)

arXiv
