Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

Yikai Wang; Fuchun Sun; Ming Lu; Anbang Yao

We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. Firstly, unlike existing multimodal methods that necessitate individual encoders for different modalities, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Secondly, we propose a bidirectional multi-layer fusion scheme, where multimodal features can be exploited progressively. To take advantage of such a scheme, we introduce two asymmetric fusion operations, channel shuffle and pixel shift, which learn different fused features with respect to different fusion directions. These two operations are parameter-free; they strengthen the multimodal feature interactions across channels and enhance the spatial feature discrimination within channels. We conduct extensive experiments on semantic segmentation and image translation tasks, based on three publicly available datasets covering diverse modalities. Results indicate that our proposed framework is general and compact, and that it is superior to state-of-the-art fusion frameworks.
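The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which channels are exchanged, how far pixels are shifted, or where in the network fusion happens, so every such choice below is an assumption made for illustration: borrowing the odd-indexed channels, one-pixel shifts with zero fill, a single shared 1x1 convolution, and all function and variable names (`shared_layer`, `fuse_into`, `rgb`, `depth`).

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize (N, C, H, W) features per channel, then scale and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)


def shared_layer(x, weight, bn_params):
    """Apply a 1x1 convolution whose weight is shared by all modalities,
    followed by batch normalization with the caller's own parameters."""
    y = np.einsum("oc,nchw->nohw", weight, x)
    return batch_norm(y, bn_params["gamma"], bn_params["beta"])


def pixel_shift(x, dy, dx):
    """Shift features by (dy, dx) pixels, zero-filling the vacated border."""
    out = np.zeros_like(x)
    h, w = x.shape[-2:]
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[..., dst_y, dst_x] = x[..., src_y, src_x]
    return out


def fuse_into(target, source, shift):
    """Parameter-free fusion (assumed form): replace the odd-indexed channels
    of `target` with the same channels of `source`, spatially shifted."""
    fused = target.copy()
    fused[:, 1::2] = pixel_shift(source[:, 1::2], *shift)
    return fused


rng = np.random.default_rng(0)
rgb = rng.normal(size=(2, 3, 4, 4))    # hypothetical modality 1
depth = rng.normal(size=(2, 3, 4, 4))  # hypothetical modality 2

weight = rng.normal(size=(4, 3))       # one set of conv weights for both
bn_rgb = {"gamma": np.ones(4), "beta": np.zeros(4)}
bn_depth = {"gamma": np.full(4, 0.5), "beta": np.full(4, 0.1)}

f_rgb = shared_layer(rgb, weight, bn_rgb)
f_depth = shared_layer(depth, weight, bn_depth)

# Asymmetry (assumed form): each fusion direction uses a different shift.
fused_rgb = fuse_into(f_rgb, f_depth, shift=(0, 1))    # shift right
fused_depth = fuse_into(f_depth, f_rgb, shift=(1, 0))  # shift down
print(fused_rgb.shape, fused_depth.shape)
```

In this sketch the only modality-specific parameters are the batch normalization scale and shift vectors, and the fusion step adds no parameters at all, which is the sense in which the abstract calls the framework compact.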

updated: Wed Aug 11 2021 03:42:13 GMT+0000 (UTC)

published: Wed Aug 11 2021 03:42:13 GMT+0000 (UTC)

arXiv
