Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors

Paul S. Scotti; Atmadeep Banerjee; Jimmie Goode; Stepan Shabalin; Alex Nguyen; Ethan Cohen; Aidan J. Dempster; Nathalie Verlinde; Elad Yundler; David Weisberg; Kenneth A. Norman; Tanishq Mathew Abraham

We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img with outputs from a separate autoencoder. All code is available on GitHub.
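
The retrieval idea in the abstract (map brain activity into a shared embedding space, then pick the closest candidate image) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature value, and the use of a generic symmetric InfoNCE loss are assumptions chosen for illustration, and the toy vectors stand in for real fMRI-derived and CLIP image embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def similarity_matrix(brain_embs, image_embs):
    """Cosine similarity between every brain embedding and every image embedding."""
    b = [normalize(v) for v in brain_embs]
    i = [normalize(v) for v in image_embs]
    return [[sum(x * y for x, y in zip(bv, iv)) for iv in i] for bv in b]

def retrieve(brain_embs, image_embs):
    """For each brain embedding, return the index of the most similar candidate image."""
    sims = similarity_matrix(brain_embs, image_embs)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]

def info_nce(brain_embs, image_embs, temperature=0.1):
    """Generic symmetric InfoNCE loss (assumed here, not taken from the paper).

    Row k of brain_embs and row k of image_embs are treated as a matched pair,
    so both inputs must contain the same number of rows.
    """
    sims = similarity_matrix(brain_embs, image_embs)
    n = len(sims)

    def cross_entropy(logit_rows):
        total = 0.0
        for k, row in enumerate(logit_rows):
            scaled = [s / temperature for s in row]
            m = max(scaled)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
            total += log_z - scaled[k]
        return total / n

    # Average the brain-to-image and image-to-brain directions.
    cols = [list(c) for c in zip(*sims)]
    return 0.5 * (cross_entropy(sims) + cross_entropy(cols))

# Toy data: three candidate images and three noisy "brain" embeddings.
image_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
brain_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.7]]
matches = retrieve(brain_embs, image_embs)
```

In this sketch, training would lower `info_nce` so that each brain embedding moves toward its paired image embedding, and `retrieve` then performs nearest-neighbor lookup over a candidate pool.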

updated: Mon May 29 2023 17:49:00 GMT+0000 (UTC)

published: Mon May 29 2023 17:49:00 GMT+0000 (UTC)

arXiv
