BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding

Yulong Liu; Yongqiang Ma; Wei Zhou; Guibo Zhu; Nanning Zheng

Reconstructing perceived natural images or decoding their semantic content from functional MRI (fMRI) data is challenging because paired samples are scarce and fMRI signals have a low signal-to-noise ratio. In this work, we propose, for the first time, a task-agnostic fMRI-based brain decoding model, BrainCLIP, which leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activity, images, and text. Our experiments demonstrate that CLIP can act as a pivot for generic brain decoding tasks, including zero-shot visual category decoding, fMRI-image/text matching, and fMRI-to-image generation. Specifically, BrainCLIP trains a mapping network that transforms fMRI patterns into a well-aligned CLIP embedding space by combining visual and textual supervision. Our experiments show that this combination can boost the decoding model's performance on certain tasks, such as fMRI-text matching and fMRI-to-image generation. On the zero-shot visual category decoding task, BrainCLIP performs significantly better than BraVL, a recently proposed multi-modal method designed specifically for this task. BrainCLIP can also reconstruct visual stimuli with high semantic fidelity, and it establishes a new state of the art for fMRI-based natural image reconstruction in terms of high-level semantic features.
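To make the zero-shot category decoding idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a mapping network has already projected an fMRI pattern into the CLIP embedding space and that each candidate category name has a CLIP text embedding; decoding then reduces to picking the category with the highest cosine similarity. The function names (`cosine`, `zero_shot_decode`) and the three-dimensional toy vectors are hypothetical and stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_decode(brain_embedding, category_embeddings):
    """Return the best-matching category name and all similarity scores.

    brain_embedding: output of a (hypothetical) fMRI-to-CLIP mapping network.
    category_embeddings: dict mapping category name -> text embedding.
    """
    scores = {
        name: cosine(brain_embedding, emb)
        for name, emb in category_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for CLIP text embeddings of category names (assumed values).
category_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "airplane": [0.0, 0.2, 0.9],
    "pizza": [0.1, 0.9, 0.1],
}

# Toy stand-in for an fMRI pattern mapped into the same space (assumed value).
brain_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_decode(brain_embedding, category_embeddings)
print(label)
```

Because the comparison happens entirely in the shared embedding space, categories that never appeared in the fMRI training data can still be scored, provided a text embedding for their name is available.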

updated: Mon May 15 2023 04:32:59 GMT+0000 (UTC)

published: Sat Feb 25 2023 03:28:54 GMT+0000 (UTC)

arXiv
