Few-Shot Segmentation via Cycle-Consistent Transformer

Gengwei Zhang; Guoliang Kang; Yunchao Wei; Yi Yang

Few-shot segmentation aims to train a segmentation model that can quickly adapt to novel classes given only a few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on features from support images. Previous methods used only the semantic-level prototypes of support images as the conditional information. As a result, they cannot exploit all of the pixel-wise support information for query predictions, even though that information is critical for the segmentation task. In this paper, we focus on utilizing pixel-wise relationships between support and query images to facilitate the few-shot semantic segmentation task. We design a novel Cycle-Consistent Transformer (CyCTR) module to aggregate pixel-wise support features into query features. CyCTR performs cross-attention between features from different images, i.e., support and query images. We observe that the support features may contain unexpected, irrelevant pixel-level features. Directly performing cross-attention may aggregate these features from support to query and bias the query features. We therefore propose a novel cycle-consistent attention mechanism that filters out potentially harmful support features and encourages query features to attend to the most informative pixels in the support images. Experiments on all few-shot segmentation benchmarks demonstrate that the proposed CyCTR leads to remarkable improvement over previous state-of-the-art methods. Specifically, on the Pascal-5^i and COCO-20^i datasets, we achieve 66.6% and 45.6% mIoU for 5-shot segmentation, outperforming the previous state of the art by 4.6% and 7.1%, respectively.
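The abstract does not spell out how the cycle-consistency check is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each support pixel is matched to its most similar query pixel, that this query pixel is matched back to its most similar support pixel, and that the starting support pixel is kept only if both support pixels carry the same mask label. The function names (`cycle_consistent_mask`, `cycle_consistent_attention`), the dot-product similarity, and the toy features are all hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cycle_consistent_mask(query, support, support_labels):
    """Assumed cycle check: support j -> best query i* -> best support j*.

    Support pixel j is kept only if j and j* share the same mask label.
    """
    # affinity[i][j]: dot-product similarity of query pixel i and support pixel j
    affinity = [[dot(q, s) for s in support] for q in query]
    keep = []
    for j in range(len(support)):
        i_star = max(range(len(query)), key=lambda i: affinity[i][j])
        j_star = max(range(len(support)), key=lambda k: affinity[i_star][k])
        keep.append(support_labels[j] == support_labels[j_star])
    return affinity, keep


def cycle_consistent_attention(query, support, support_labels):
    """Cross-attention from query to support, ignoring filtered support pixels."""
    affinity, keep = cycle_consistent_mask(query, support, support_labels)
    scale = math.sqrt(len(query[0]))
    n_support, dim = len(support), len(support[0])
    weights, outputs = [], []
    for i in range(len(query)):
        # Filtered support pixels get a logit of -inf, i.e. zero softmax weight.
        logits = [affinity[i][j] / scale if keep[j] else -math.inf
                  for j in range(n_support)]
        top = max(logits)
        if top == -math.inf:
            w = [0.0] * n_support  # no support pixel survived the check
        else:
            exps = [math.exp(l - top) for l in logits]
            total = sum(exps)
            w = [e / total for e in exps]
        weights.append(w)
        outputs.append([sum(w[j] * support[j][d] for j in range(n_support))
                        for d in range(dim)])
    return outputs, weights, keep


# Toy data (hypothetical): two query pixels, three support pixels.
# Support pixel 2 is labeled background (0) but looks like foreground pixel 0.
query = [[1.0, 0.0], [0.0, 1.0]]
support = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
support_labels = [1, 0, 0]

outputs, weights, keep = cycle_consistent_attention(query, support, support_labels)
```

In this toy case the third support pixel fails the check, because its cycle ends on a support pixel with a different label, so it receives zero attention weight from every query pixel. A practical version would operate on batched feature maps with learned query, key, and value projections.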

updated: Wed Oct 20 2021 11:50:27 GMT+0000 (UTC)

published: Fri Jun 04 2021 07:57:48 GMT+0000 (UTC)

arXiv
