Weakly-supervised segmentation of referring expressions

Robin Strudel; Ivan Laptev; Cordelia Schmid

Visual grounding localizes regions (boxes or segments) in the image corresponding to given referring expressions. In this work we address image segmentation from referring expressions, a problem that has so far only been addressed in a fully-supervised setting. A fully-supervised setup, however, requires pixel-wise supervision and is hard to scale given the expense of manual annotation. We therefore introduce a new task of weakly-supervised image segmentation from referring expressions and propose Text grounded semantic SEGmentation (TSEG), which learns segmentation masks directly from image-level referring expressions without pixel-level annotations. Our transformer-based method computes patch-text similarities and guides the classification objective during training with a new multi-label patch assignment mechanism. The resulting visual grounding model segments image regions corresponding to given natural language expressions. Our approach TSEG demonstrates promising results for weakly-supervised referring expression segmentation on the challenging PhraseCut and RefCOCO datasets. TSEG also shows competitive performance when evaluated in a zero-shot setting for semantic segmentation on Pascal VOC.
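The abstract mentions patch-text similarities that are reduced to an image-level classification signal. The sketch below illustrates that general idea only; it is not the authors' implementation. The function names, the use of cosine similarity, the max pooling over patches, and the 0.5 threshold are all assumptions made for illustration, and the max pooling is a simple stand-in for the multi-label patch assignment mechanism, which the abstract does not specify.

```python
import numpy as np

def patch_text_similarity(patches, texts):
    """Cosine similarity between each patch and each text embedding.

    patches: (num_patches, dim) array, texts: (num_texts, dim) array.
    Returns a (num_patches, num_texts) array. Hypothetical helper.
    """
    p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    t = texts / np.linalg.norm(texts, axis=1, keepdims=True)
    return p @ t.T

def image_level_scores(sim):
    """Pool patch scores into one score per expression.

    Max pooling is an assumption here, not the paper's assignment mechanism.
    """
    return sim.max(axis=0)

def patch_masks(sim, threshold=0.5):
    """Mark patches whose similarity exceeds an (assumed) threshold."""
    return sim > threshold

# Toy example: 3 patches, 2 referring expressions, 2-dimensional embeddings.
patches = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
texts = np.array([[1.0, 0.0], [0.0, 2.0]])
sim = patch_text_similarity(patches, texts)
scores = image_level_scores(sim)   # one score per expression
masks = patch_masks(sim)           # boolean patch mask per expression
```

In a weakly-supervised setup of this kind, only the pooled image-level scores would be compared against labels during training, while the per-patch similarities would be read out as a coarse segmentation at inference time.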

updated: Tue May 10 2022 07:52:24 GMT+0000 (UTC)

published: Tue May 10 2022 07:52:24 GMT+0000 (UTC)

arXiv
