Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

Minheng Ni; Yabo Zhang; Kailai Feng; Xiaoming Li; Yiwen Guo; Wangmeng Zuo

Zero-shot referring image segmentation is a challenging task: it aims to find an instance segmentation mask from a given referring description, without training on this type of paired data. Current zero-shot methods mainly rely on pre-trained discriminative models (e.g., CLIP). However, we observe that generative models (e.g., Stable Diffusion) may already capture the relationships between visual elements and text descriptions, a capability that has rarely been investigated for this task. In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models. We demonstrate that, without a proposal generator, a generative model alone can achieve performance comparable to existing SOTA weakly-supervised models. When we combine generative and discriminative models, Ref-Diff outperforms these competing methods by a significant margin. This indicates that generative models are also beneficial for this task and can complement discriminative models for better referring segmentation. Our code is publicly available at https://github.com/kodenii/Ref-Diff.

updated: Fri Sep 01 2023 05:57:47 GMT+0000 (UTC)

published: Thu Aug 31 2023 14:55:30 GMT+0000 (UTC)

arXiv
