DFormer: Diffusion-guided Transformer for Universal Image Segmentation

Hefeng Wang; Jiale Cao; Rao Muhammad Anwer; Jin Xie; Fahad Shahbaz Khan; Yanwei Pang

This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian noise to ground-truth masks, and then learns a model to predict denoised masks from the corrupted masks. Specifically, we take deep pixel-level features along with the noisy masks as inputs to generate mask features and attention masks, employing a diffusion-based decoder to perform mask prediction gradually. At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly generated masks. Extensive experiments reveal the merits of our proposed contributions on different image segmentation tasks: panoptic segmentation, instance segmentation, and semantic segmentation. Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set. Further, DFormer achieves promising semantic segmentation performance, outperforming the recent diffusion-based method by 2.2% on the ADE20K val set. Our source code and models will be made publicly available at https://github.com/cp3wan/DFormer
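The mask-corruption step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the noise schedule, the mask encoding, or any function names, so everything below is an assumption made for illustration: a cosine schedule of the kind common in diffusion models, binary masks rescaled to [-scale, scale], and the hypothetical helpers `cosine_alpha_bar`, `corrupt_mask`, and `random_mask`. It is not the authors' implementation.

```python
import math
import random


def cosine_alpha_bar(t, s=0.008):
    """Fraction of signal kept at noise level t in [0, 1].

    Assumed cosine schedule; the abstract does not state which schedule is used.
    """
    f = math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0


def corrupt_mask(mask, t, scale=1.0, rng=random):
    """Add Gaussian noise to a binary mask (rows of 0/1) at noise level t.

    The mask is first mapped from {0, 1} to {-scale, +scale} (an assumption),
    then mixed with standard Gaussian noise according to the schedule.
    """
    a = cosine_alpha_bar(t)
    signal = math.sqrt(a)
    noise = math.sqrt(max(0.0, 1.0 - a))
    return [
        [signal * scale * (2.0 * v - 1.0) + noise * rng.gauss(0.0, 1.0) for v in row]
        for row in mask
    ]


def random_mask(height, width, rng=random):
    """Pure-noise mask, standing in for the randomly generated masks used at inference."""
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


if __name__ == "__main__":
    gt = [[0, 1, 1], [0, 0, 1]]        # toy ground-truth mask
    print(corrupt_mask(gt, t=0.1))      # lightly corrupted
    print(corrupt_mask(gt, t=0.9))      # heavily corrupted
    print(random_mask(2, 3))            # inference-time starting point
```

During training, a denoising model would receive such corrupted masks together with pixel-level image features and be trained to recover the clean masks; that model is not shown here.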

updated: Tue Jun 06 2023 06:33:32 GMT+0000 (UTC)

published: Tue Jun 06 2023 06:33:32 GMT+0000 (UTC)

arXiv
