Provable Dynamic Fusion for Low-Quality Multimodal Data

Qingyang Zhang; Haitao Wu; Changqing Zhang; Qinghua Hu; Huazhu Fu; Joey Tianyi Zhou; Xi Peng

The inherent challenge of multimodal fusion is to capture cross-modal correlations precisely and to conduct cross-modal interaction flexibly. To fully exploit the value of each modality and mitigate the influence of low-quality multimodal data, dynamic multimodal fusion has emerged as a promising learning paradigm. Despite its widespread use, theoretical justification in this field is still notably lacking. Can we design a provably robust multimodal fusion method? This paper provides a theoretical understanding that answers this question, from the generalization perspective, under one of the most popular multimodal fusion frameworks. We then show that several uncertainty estimation solutions are naturally available for achieving robust multimodal fusion. Building on this, we propose a novel multimodal fusion framework termed Quality-aware Multimodal Fusion (QMF), which improves performance in terms of classification accuracy and model robustness. Extensive experimental results on multiple benchmarks support our findings.
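
To make the idea of uncertainty-driven dynamic fusion concrete, the following is a minimal sketch of late fusion in which each modality's logits are weighted by a per-sample confidence score. It is not the authors' QMF implementation: the abstract does not specify the uncertainty estimator, so the entropy-based confidence used here, along with the function names `confidence` and `dynamic_late_fusion` and the example logits, are assumptions chosen purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Illustrative confidence in [0, 1]: one minus normalized entropy.

    Assumption: entropy stands in for whichever uncertainty estimator
    a real system would use.
    """
    if len(logits) < 2:
        return 1.0  # a single class carries no uncertainty
    probs = softmax(logits)
    ent = -sum(p * math.log(p) for p in probs if p > 0.0)
    return max(0.0, 1.0 - ent / math.log(len(logits)))


def dynamic_late_fusion(per_modality_logits):
    """Fuse per-modality logits with confidence-proportional weights.

    Returns (fused_logits, weights). Falls back to uniform weights when
    every modality is (numerically) maximally uncertain.
    """
    confs = [confidence(l) for l in per_modality_logits]
    total = sum(confs)
    n = len(per_modality_logits)
    if total <= 1e-12:
        weights = [1.0 / n] * n
    else:
        weights = [c / total for c in confs]
    n_classes = len(per_modality_logits[0])
    fused = [
        sum(w * l[k] for w, l in zip(weights, per_modality_logits))
        for k in range(n_classes)
    ]
    return fused, weights


# Hypothetical sample: a clean image modality and a noisy text modality.
image_logits = [4.0, 0.0, 0.0]   # confident prediction of class 0
text_logits = [0.1, 0.2, 0.0]    # nearly uniform, i.e. low quality
fused, weights = dynamic_late_fusion([image_logits, text_logits])
prediction = max(range(len(fused)), key=lambda k: fused[k])
```

In this sketch the weights are recomputed for every sample, so a modality that is unreliable for one input can still dominate another input where it is confident. This per-sample behaviour is what distinguishes dynamic fusion from fusion with fixed weights.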

updated: Tue Jun 06 2023 13:46:22 GMT+0000 (UTC)

published: Sat Jun 03 2023 08:32:35 GMT+0000 (UTC)

arXiv
