Multi-modal Gated Mixture of Local-to-Global Experts for Dynamic Image Fusion

Yiming Sun; Bing Cao; Pengfei Zhu; Qinghua Hu

Infrared and visible image fusion aims to integrate comprehensive information from multiple sources so that practical tasks, such as detection, perform better than they would with a single modality. However, most existing methods directly combine the texture details and object contrast of the different modalities and ignore dynamic changes in real scenes, which diminishes the visible texture in good lighting conditions and the infrared contrast in low lighting conditions. To fill this gap, we propose a dynamic image fusion framework with a multi-modal gated mixture of local-to-global experts, termed MoE-Fusion, to dynamically extract effective and comprehensive information from the respective modalities. Our model consists of a Mixture of Local Experts (MoLE) and a Mixture of Global Experts (MoGE), both guided by a multi-modal gate. The MoLE performs specialized learning of multi-modal local features, prompting the fused images to retain local information in a sample-adaptive manner, while the MoGE focuses on global information that complements the fused image with overall texture detail and contrast. Extensive experiments show that, through this local-to-global dynamic learning paradigm, MoE-Fusion outperforms state-of-the-art methods in preserving multi-modal image texture and contrast, and also achieves superior performance on detection tasks. Our code will be available at https://github.com/SunYM2020/MoE-Fusion.
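To make the idea of a gated mixture of experts concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the three toy experts, and the linear gate matrix are all assumptions chosen for illustration. It only shows the general pattern in which a gate looks at both modalities and produces sample-dependent weights that blend the outputs of several experts.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_weights(ir_feat, vis_feat, gate_matrix):
    """Hypothetical multi-modal gate: one score per expert, computed
    from the concatenated infrared and visible features."""
    joint = ir_feat + vis_feat  # list concatenation of both modalities
    scores = [sum(w * x for w, x in zip(row, joint)) for row in gate_matrix]
    return softmax(scores)


def mix_experts(ir_feat, vis_feat, experts, gate_matrix):
    """Blend the expert outputs using the sample-dependent gate weights."""
    weights = gate_weights(ir_feat, vis_feat, gate_matrix)
    outputs = [expert(ir_feat, vis_feat) for expert in experts]
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(outputs[0]))
    ]
    return fused, weights


# Toy experts, assumed for illustration only.
def keep_infrared(ir_feat, vis_feat):
    return list(ir_feat)


def keep_visible(ir_feat, vis_feat):
    return list(vis_feat)


def average_both(ir_feat, vis_feat):
    return [(a + b) / 2 for a, b in zip(ir_feat, vis_feat)]


EXPERTS = [keep_infrared, keep_visible, average_both]

if __name__ == "__main__":
    ir = [0.9, 0.8]   # stand-in infrared features
    vis = [0.1, 0.2]  # stand-in visible features
    # One row per expert; this gate strongly favors the infrared expert.
    gate = [[10.0, 10.0, 0.0, 0.0], [0.0] * 4, [0.0] * 4]
    fused, weights = mix_experts(ir, vis, EXPERTS, gate)
    print("gate weights:", weights)
    print("fused features:", fused)
```

In a real model the experts and the gate would be learned networks operating on feature maps rather than fixed functions on short lists; the sketch keeps only the weighting logic so that the role of the gate is visible.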

updated: Thu Mar 23 2023 07:15:53 GMT+0000 (UTC)

published: Thu Feb 02 2023 20:06:58 GMT+0000 (UTC)

arXiv
