Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation

Balamurali Murugesan; Rukhshanda Hussain; Rajarshi Bhattacharya; Ismail Ben Ayed; Jose Dolz

Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy for adapting pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we ask whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt has a greater impact on the Class Activation Map (CAM) than arguably more complex strategies that optimize the context. Second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments, we demonstrate that our simple yet efficient approach achieves state-of-the-art (SOTA) performance on a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.
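The second observation, that the ground-truth class name is not always the best class token, can be illustrated with a small sketch. This is not the authors' implementation (see the repository above for that). It assumes that candidate class tokens (for example, synonyms of the ground-truth label) are ranked by cosine similarity between an image embedding and each candidate's text embedding, and that this similarity is a reasonable proxy for CAM quality. The function name `select_class_token`, the prompt template, the candidate words, and the toy three-dimensional embeddings are all hypothetical; in practice the embeddings would come from a pre-trained language-vision model such as CLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_class_token(image_emb, candidate_embs, template="a photo of a {}."):
    """Pick the candidate class token whose text embedding best matches the image.

    image_emb      : image embedding (hypothetical stand-in for an image encoder output)
    candidate_embs : dict mapping each candidate class token to its text embedding
    template       : hypothetical prompt template; only the class token is varied
    """
    scores = {tok: cosine(image_emb, emb) for tok, emb in candidate_embs.items()}
    best = max(scores, key=scores.get)
    return best, template.format(best), scores


# Toy, made-up embeddings: "person" plays the role of the ground-truth label,
# the other entries play the role of candidate synonyms.
image_emb = [0.9, 0.1, 0.3]
candidates = {
    "person": [0.2, 0.9, 0.1],
    "cyclist": [0.8, 0.2, 0.4],
    "pedestrian": [0.5, 0.5, 0.5],
}

best, prompt, scores = select_class_token(image_emb, candidates)
print(best, prompt)
```

In this toy setup the selected token differs from the nominal label, which mirrors the observation in the abstract; whether similarity-based selection matches the criterion used in POLE is an assumption of this sketch.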

updated: Sat Jan 13 2024 18:23:07 GMT+0000 (UTC)

published: Fri Jun 30 2023 19:25:18 GMT+0000 (UTC)

arXiv
