Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding

Zijiao Chen; Jiaxin Qing; Tiange Xiang; Wan Lin Yue; Juan Helen Zhou

Decoding visual stimuli from brain recordings aims to deepen our understanding of the human visual system and to build a solid foundation for bridging human and computer vision through brain-computer interfaces. However, reconstructing high-quality images with correct semantics from brain recordings is challenging because the underlying representations of brain signals are complex and annotated data are scarce. In this work, we present MinD-Vis: Sparse Masked Brain Modeling with Double-Conditioned Latent Diffusion Model for Human Vision Decoding. First, we learn an effective self-supervised representation of fMRI data using masked modeling in a large latent space, inspired by the sparse coding of information in the primary visual cortex. Then, by augmenting a latent diffusion model with double-conditioning, we show that MinD-Vis can reconstruct highly plausible images with semantically matching details from brain recordings using very few paired annotations. We benchmarked our model qualitatively and quantitatively; the experimental results indicate that our method outperformed the state of the art in both semantic mapping (100-way semantic classification) and generation quality (FID) by 66% and 41%, respectively. An exhaustive ablation study was also conducted to analyze our framework.
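As a rough illustration of the masked-modeling idea mentioned in the abstract, the sketch below splits a one-dimensional signal into patches and hides a random subset of them, leaving only the visible patches for an encoder to process. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the patch size of 4, the mask ratio of 0.75, and the toy signal are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import random


def patchify(signal, patch_size):
    """Split a 1D signal into consecutive, non-overlapping patches."""
    if len(signal) % patch_size != 0:
        raise ValueError("signal length must be a multiple of patch_size")
    return [signal[i:i + patch_size] for i in range(0, len(signal), patch_size)]


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches to hide."""
    num_masked = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))


def split_visible_hidden(patches, masked_idx):
    """Separate patches into (index, patch) pairs that are visible or hidden."""
    masked = set(masked_idx)
    visible = [(i, p) for i, p in enumerate(patches) if i not in masked]
    hidden = [(i, p) for i, p in enumerate(patches) if i in masked]
    return visible, hidden


# Toy stand-in for a flattened fMRI voxel vector (hypothetical data).
rng = random.Random(0)
signal = [float(v) for v in range(32)]

patches = patchify(signal, patch_size=4)  # illustrative patch size
masked_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)  # illustrative ratio
visible, hidden = split_visible_hidden(patches, masked_idx)
```

In a full masked-modeling setup, an encoder would see only `visible`, and a decoder would be trained to reconstruct the values in `hidden`; those learned components are omitted here.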

updated: Sun Nov 13 2022 17:04:05 GMT+0000 (UTC)

published: Sun Nov 13 2022 17:04:05 GMT+0000 (UTC)

arXiv
