SFusion: Self-attention based N-to-One Multimodal Fusion Block

Zecheng Liu; Jia Wei; Rui Li; Jianlong Zhou

People perceive the world through different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables artificial intelligence to understand the world around us more easily. However, when modalities are missing, the number of available modalities varies from one situation to another, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Unlike preset formulations or convolution based methods, the proposed block automatically learns to fuse the available modalities without synthesizing or zero-padding the missing ones. Specifically, the feature representations extracted by the upstream processing model are projected as tokens and fed into a self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be used by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the SFusion block achieves better performance than the competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion.
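
The two steps described in the abstract (self-attention over tokens, then modal attention to build one shared representation) can be illustrated with a minimal NumPy sketch. This is not the code from the authors' repository; it is an illustration under stated assumptions. The function name `sfusion_sketch`, the single-head attention, the projection matrices `w_q`, `w_k`, `w_v`, the `(L, C)` feature shape, and the choice of an element-wise softmax across modalities as the modal attention are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sfusion_sketch(features, w_q, w_k, w_v):
    """Fuse any number n >= 1 of available modality features into one.

    features: list of n arrays of shape (L, C), one per AVAILABLE modality
              (L positions, C channels). Missing modalities are simply
              left out of the list; nothing is synthesized or zero-padded.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical; they would be
              learned in a real model).
    Returns an array of shape (L, C).
    """
    stacked = np.stack(features)                 # (n, L, C)
    n, L, C = stacked.shape

    # 1) Treat every position of every modality as a token and run
    #    single-head self-attention over all n * L tokens.
    tokens = stacked.reshape(n * L, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)    # (n*L, n*L)
    correlated = (attn @ v).reshape(n, L, C)

    # 2) Modal attention (assumed form): softmax across the modality axis
    #    gives per-element weights that sum to 1 over the n modalities.
    weights = softmax(correlated, axis=0)            # (n, L, C)

    # 3) Shared representation: weighted sum of the input features.
    return (weights * stacked).sum(axis=0)           # (L, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, L = 4, 6
    w_q, w_k, w_v = (rng.standard_normal((C, C)) for _ in range(3))
    feats = [rng.standard_normal((L, C)) for _ in range(3)]
    for n in (1, 2, 3):  # the same weights handle any number of inputs
        print(n, sfusion_sketch(feats[:n], w_q, w_k, w_v).shape)
```

Because no parameter depends on `n`, the same weights accept one, two, or three modalities, which is the N-to-One property the abstract describes. In this sketch, a single available modality passes through unchanged, since its softmax weight is 1 everywhere.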

updated: Tue Jul 04 2023 14:50:31 GMT+0000 (UTC)

published: Fri Aug 26 2022 16:42:14 GMT+0000 (UTC)

arXiv
