Zero-shot Referring Image Segmentation with Global-Local Context Features

Seonghoon Yu; Paul Hongsuch Seo; Jeany Son

Referring image segmentation (RIS) aims to find a segmentation mask given a referring expression grounded to a region of the input image. Collecting labelled datasets for this task, however, is notoriously costly and labor-intensive. To overcome this issue, we propose a simple yet effective zero-shot referring image segmentation method that leverages the pre-trained cross-modal knowledge of CLIP. To obtain segmentation masks grounded to the input text, we propose a mask-guided visual encoder that captures global and local contextual information of an input image. By utilizing instance masks obtained from off-the-shelf mask proposal techniques, our method is able to segment finely detailed instance-level groundings. We also introduce a global-local text encoder in which the global feature captures complex sentence-level semantics of the entire input expression, while the local feature focuses on the target noun phrase extracted by a dependency parser. In our experiments, the proposed method outperforms several zero-shot baselines for the task, and even a weakly supervised referring expression segmentation method, by substantial margins. Our code is available at https://github.com/Seonghoon-Yu/Zero-shot-RIS.
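The selection step implied by the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`blend`, `select_mask`), the linear mixing weights `alpha` and `beta`, and the toy 3-dimensional vectors are all assumptions made for illustration. In practice the features would come from CLIP's image and text encoders, and the masks from an off-the-shelf mask proposal network. The sketch only shows the idea of combining a global and a local feature on each side and then picking the mask proposal whose visual feature is most similar to the text feature.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def blend(global_feat: Vector, local_feat: Vector, weight: float) -> List[float]:
    """Mix a global and a local feature (hypothetical linear combination)."""
    g = normalize(global_feat)
    l = normalize(local_feat)
    return normalize([weight * a + (1.0 - weight) * b for a, b in zip(g, l)])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def select_mask(
    visual_global: Sequence[Vector],
    visual_local: Sequence[Vector],
    text_global: Vector,
    text_local: Vector,
    alpha: float = 0.5,  # assumed weight for the visual global feature
    beta: float = 0.5,   # assumed weight for the text global feature
) -> int:
    """Return the index of the mask proposal that best matches the text."""
    text = blend(text_global, text_local, beta)
    scores = [
        cosine(blend(g, l, alpha), text)
        for g, l in zip(visual_global, visual_local)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for per-mask features (one row per mask proposal).
visual_global = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual_local = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.1, 1.0]]
# Toy stand-ins for the whole-sentence and target-noun-phrase features.
text_global = [0.0, 1.0, 0.2]
text_local = [0.0, 1.0, 0.0]

best = select_mask(visual_global, visual_local, text_global, text_local)
print(best)  # index of the selected mask proposal
```

With these toy vectors the second proposal (index 1) is selected, since both its global and local features point in nearly the same direction as the text features. Because no parameters are learned in this step, the procedure needs no RIS annotations, which is what makes a zero-shot setting possible.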

updated: Fri Mar 31 2023 06:00:50 GMT+0000 (UTC)

published: Fri Mar 31 2023 06:00:50 GMT+0000 (UTC)

arXiv
