BrainCLIP: Bridging Brain and Visual-Linguistic Representation via CLIP for Generic Natural Visual Stimulus Decoding from fMRI

Yulong Liu; Yongqiang Ma; Wei Zhou; Guibo Zhu; Nanning Zheng

Reconstructing perceived natural images from fMRI signals and decoding their categories are challenging tasks of great scientific significance. Owing to the lack of paired samples, most existing methods fail to generate semantically recognizable reconstructions and generalize poorly to novel classes. In this work, we propose, for the first time, a task-agnostic brain decoding model that unifies visual stimulus classification and reconstruction in a semantic space. We call it BrainCLIP; it leverages CLIP's cross-modal generalization ability to bridge the modality gap between brain activity, images, and text. Specifically, BrainCLIP is a VAE-based architecture that transforms fMRI patterns into the CLIP embedding space by combining visual and textual supervision. Note that previous works rarely use multi-modal supervision for visual stimulus decoding. Our experiments demonstrate that adding textual supervision significantly boosts decoding performance compared with image supervision alone. BrainCLIP can be applied to multiple scenarios, such as fMRI-to-image generation, fMRI-image matching, and fMRI-text matching. Compared with BraVL, a recently proposed multi-modal method for fMRI-based brain decoding, BrainCLIP achieves significantly better performance on the novel-class classification task. BrainCLIP also establishes a new state of the art for fMRI-based natural image reconstruction in terms of high-level image features.
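To make the matching scenarios concrete, here is a minimal sketch of how fMRI-text matching could work once an fMRI pattern has been mapped into a shared embedding space. The abstract does not specify the matching rule, so the use of cosine similarity with a nearest-neighbor choice is an assumption, as are the function names `cosine_similarity` and `match_label`. The three-dimensional vectors are toy stand-ins for real CLIP embeddings, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_label(fmri_embedding, label_embeddings):
    """Return the label whose embedding is most similar to the fMRI embedding.

    Hypothetical helper: assumes the fMRI pattern has already been projected
    into the same space as the label (text) embeddings.
    """
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(fmri_embedding, label_embeddings[label]),
    )


# Toy stand-ins for text embeddings of candidate class names (assumed values).
label_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "tree": [0.0, 0.0, 1.0],
}

# Toy stand-in for an fMRI pattern after projection into the shared space.
fmri_embedding = [0.1, 0.9, 0.2]

predicted = match_label(fmri_embedding, label_embeddings)
print(predicted)  # prints "car"
```

Because the candidate labels only need embeddings in the shared space, a rule like this can in principle score classes that were never seen during training, which is the setting the novel-class classification task evaluates.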

updated: Sun Apr 23 2023 04:24:15 GMT+0000 (UTC)

published: Sat Feb 25 2023 03:28:54 GMT+0000 (UTC)

arXiv
