Comprehensive Multi-Modal Interactions for Referring Image Segmentation

Kanishk Jain; Vineet Gandhi

We investigate Referring Image Segmentation (RIS), the task of producing a segmentation map that corresponds to a given natural language description. To solve RIS efficiently, we need to understand each word's relationship with the other words, each image region's relationship with the other regions, and the cross-modal alignment between the linguistic and visual domains. Recent methods model these three types of interaction sequentially. We argue that such a modular approach limits their performance and that joint, simultaneous reasoning can help resolve ambiguities. To this end, we propose a Joint Reasoning (JRM) module and a novel Cross-Modal Multi-Level Fusion (CMMLF) module for this task. JRM models the referent's multi-modal context by reasoning jointly over the visual and linguistic modalities, performing word-word, image region-region, and word-region interactions in a single module. The CMMLF module further refines the segmentation masks by exchanging contextual information across the visual hierarchy, with linguistic features acting as a bridge. We present thorough ablation studies, validate our approach on four benchmark datasets, and show that the proposed method outperforms existing state-of-the-art methods on all four by significant margins.
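To make the idea of modelling all three interaction types in a single step concrete, here is a minimal sketch. It is not the authors' JRM: it assumes the interactions can be illustrated with one head of plain dot-product self-attention over the concatenated word and region tokens, with no learned projections. The function name `joint_self_attention`, the feature sizes, and the random inputs are all hypothetical.

```python
import numpy as np


def joint_self_attention(words, regions):
    """Single-head self-attention over concatenated word and region tokens.

    words:   (n_words, d) array of word features (hypothetical input)
    regions: (n_regions, d) array of flattened image-region features
    Returns the updated tokens and the (n, n) attention matrix,
    where n = n_words + n_regions.
    Assumption: queries, keys and values are the raw tokens themselves;
    a real model would use learned projections.
    """
    tokens = np.concatenate([words, regions], axis=0)  # (n, d)
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)            # (n, n) similarities
    scores -= scores.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)            # row-wise softmax
    return attn @ tokens, attn


rng = np.random.default_rng(0)
words = rng.normal(size=(5, 16))     # 5 words, 16-dim features (made up)
regions = rng.normal(size=(12, 16))  # 12 image regions (made up)
updated, attn = joint_self_attention(words, regions)

# The single attention matrix contains every interaction type as a block.
n_w = words.shape[0]
word_word = attn[:n_w, :n_w]
word_region = attn[:n_w, n_w:]
region_word = attn[n_w:, :n_w]
region_region = attn[n_w:, n_w:]
```

Because every token attends to every other token in one pass, the word-word, region-region, and word-region blocks are computed together rather than in separate sequential stages, which is the contrast the abstract draws with earlier methods.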

updated: Wed Apr 21 2021 08:45:09 GMT+0000 (UTC)

published: Wed Apr 21 2021 08:45:09 GMT+0000 (UTC)

arXiv
