Image2Reverb: Cross-Modal Reverb Impulse Response Synthesis

Nikhil Singh; Jeff Mentch; Jerry Ng; Matthew Beveridge; Iddo Drori

The acoustic characteristics of a space are often measured by capturing its impulse response (IR), a representation of how a full-range stimulus sound excites it. This work generates an IR from a single image; the IR can then be applied to other signals using convolution, simulating the reverberant characteristics of the space shown in the image. Recording such IRs is both time-intensive and expensive, and often infeasible for inaccessible locations. We use an end-to-end neural network architecture to generate plausible audio impulse responses from single images of acoustic environments. We evaluate our method both by comparison with ground-truth data and by human expert evaluation. We demonstrate our approach by generating plausible impulse responses from diverse settings and formats, including well-known places, music halls, rooms in paintings, images from animations and computer games, synthetic environments generated from text, panoramic images, and video conference backgrounds.
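As a rough illustration of the convolution step mentioned in the abstract, the sketch below applies an IR to a dry signal with NumPy. It does not reproduce the paper's network: the helper names `synthetic_ir` and `apply_reverb`, the sample rate, and the exponentially decaying noise used as a stand-in IR are all assumptions made for this example.

```python
import numpy as np


def synthetic_ir(sample_rate=16000, rt60=0.5, seed=0):
    """Toy stand-in for a generated IR (hypothetical helper, not the paper's model).

    Returns white noise shaped by an exponential envelope whose amplitude
    falls by 60 dB over `rt60` seconds.
    """
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60)
    t = np.arange(n) / sample_rate
    # A 60 dB amplitude drop is a factor of 10**-3 at t == rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal(n) * envelope


def apply_reverb(dry, ir):
    """Convolve a dry signal with an IR and peak-normalize the result."""
    wet = np.convolve(dry, ir, mode="full")
    peak = np.max(np.abs(wet))
    # Avoid dividing by zero when the input is silent.
    return wet / peak if peak > 0 else wet


if __name__ == "__main__":
    sr = 16000
    # One second of a 440 Hz tone as the dry signal.
    dry = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    wet = apply_reverb(dry, synthetic_ir(sample_rate=sr))
    print(len(dry), len(wet))
```

The output is longer than the input by the IR length minus one sample, which corresponds to the reverberant tail. For long signals an FFT-based convolution would likely be faster than `np.convolve`; the direct form is used here only to keep the example dependency-light.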

updated: Fri Aug 13 2021 18:48:16 GMT+0000 (UTC)

published: Fri Mar 26 2021 01:25:58 GMT+0000 (UTC)

arXiv
