MXM-CLR: A Unified Framework for Contrastive Learning of Multifold Cross-Modal Representations

Ye Wang; Bowei Jiang; Changqing Zou; Rui Ma

Multifold observations are common across data modalities: for example, a 3D shape can be represented by multi-view images, and an image can be described by several different captions. Existing cross-modal contrastive representation learning (XM-CLR) methods such as CLIP are not fully suitable for multifold data, because they consider only one positive pair and treat all other pairs as negatives when computing the contrastive loss. In this paper, we propose MXM-CLR, a unified framework for contrastive learning of multifold cross-modal representations. MXM-CLR explicitly models and learns the relationships between multifold observations of instances from different modalities for more comprehensive representation learning. The key to MXM-CLR is a novel multifold-aware hybrid loss that considers multiple positive observations when computing the hard and soft relationships for cross-modal data pairs. We conduct quantitative and qualitative comparisons with state-of-the-art (SOTA) baselines on cross-modal retrieval tasks using the Text2Shape and Flickr30K datasets. We also extensively evaluate the adaptability and generalizability of MXM-CLR and perform ablation studies on the loss design and the effect of batch size. The results show the advantage of MXM-CLR in learning better representations for multifold data. The code is available at https://github.com/JLU-ICL/MXM-CLR.
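To make the contrast between one positive pair and multiple positive observations concrete, the following is a minimal sketch in plain Python. It is not the hybrid loss of MXM-CLR, whose hard and soft terms are not specified in the abstract. It only compares a CLIP-style loss with a single positive per row against a generic multi-positive variant that averages the log-probability over every column sharing the row's instance id. The function names, the `row_ids`/`col_ids` labelling, and the toy similarity matrix are all assumptions made for illustration.

```python
import math


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of similarity scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]


def single_positive_loss(sim, pos_index):
    """CLIP-style loss: row i has exactly one positive column, pos_index[i].

    Every other column, including other observations of the same instance,
    is treated as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        total -= log_softmax_row(row)[pos_index[i]]
    return total / len(sim)


def multi_positive_loss(sim, row_ids, col_ids):
    """Illustrative multi-positive loss (an assumption, not the paper's loss).

    All columns whose instance id equals the row's instance id count as
    positives; the loss averages their log-probabilities.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logp = log_softmax_row(row)
        positives = [j for j, c in enumerate(col_ids) if c == row_ids[i]]
        if not positives:
            raise ValueError(f"row {i} has no positive column")
        total -= sum(logp[j] for j in positives) / len(positives)
    return total / len(sim)


# Toy batch: 2 instances, each with 2 images (rows) and 2 captions (columns).
ids = [0, 0, 1, 1]
# Each image is similar only to its "own" caption, not to the sibling caption.
sim_diag = [[5.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

loss_single = single_positive_loss(sim_diag, pos_index=[0, 1, 2, 3])
loss_multi = multi_positive_loss(sim_diag, row_ids=ids, col_ids=ids)
print(loss_single, loss_multi)
```

On this toy matrix the single-positive loss is already small, while the multi-positive loss stays large because each image fails to match the second caption of its own instance. That is the kind of relationship a single-positive objective cannot reward.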

updated: Tue Mar 21 2023 02:37:37 GMT+0000 (UTC)

published: Mon Mar 20 2023 02:51:53 GMT+0000 (UTC)

arXiv
