Open-Vocabulary Universal Image Segmentation with MaskCLIP

Zheng Ding; Jieke Wang; Zhuowen Tu

In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories given as text-based descriptions at inference time. We first build a baseline method by directly adopting pre-trained CLIP models without fine-tuning or distillation. We then develop MaskCLIP, a Transformer-based approach with a MaskCLIP Visual Encoder, an encoder-only module that seamlessly integrates mask tokens with a pre-trained ViT CLIP model for semantic/instance segmentation and class prediction. MaskCLIP learns to efficiently and effectively utilize pre-trained partial/dense CLIP features within the MaskCLIP Visual Encoder, which avoids the time-consuming student-teacher training process. MaskCLIP outperforms previous methods for semantic/instance/panoptic segmentation on the ADE20K and PASCAL datasets. We show qualitative illustrations of MaskCLIP with online custom categories. Project website: https://maskclip.github.io.
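The open-vocabulary step mentioned in the abstract (assigning arbitrary text-described categories to segments at inference time) can be illustrated with a small sketch. This is not the authors' implementation: it assumes each mask proposal has already been reduced to a single embedding vector and each category name to a text embedding in the same space, and it uses toy 3-dimensional vectors in place of real CLIP outputs. The names `classify_masks`, `mask_embeddings`, `text_embeddings`, and the `temperature` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_masks(mask_embeddings, text_embeddings, temperature=0.01):
    """Label each mask with the category whose text embedding is most similar.

    mask_embeddings: list of vectors, one per mask proposal (assumed given).
    text_embeddings: dict mapping category name -> vector (assumed given).
    temperature: hypothetical softmax temperature, chosen for illustration.
    """
    names = list(text_embeddings)
    results = []
    for emb in mask_embeddings:
        logits = [cosine(emb, text_embeddings[n]) / temperature for n in names]
        # Numerically stable softmax over the candidate categories.
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs[best]))
    return results


# Toy stand-ins for text embeddings of user-supplied category names.
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
}

# Toy stand-ins for per-mask image embeddings.
mask_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.95],
]

predictions = classify_masks(mask_embeddings, text_embeddings)
labels = [name for name, _ in predictions]
print(labels)
```

Because the category list is just a dictionary of text embeddings, new categories can be added at inference time without retraining, which is the property the term "open-vocabulary" refers to.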

updated: Thu Jun 08 2023 06:35:33 GMT+0000 (UTC)

published: Thu Aug 18 2022 17:55:37 GMT+0000 (UTC)

arXiv
