Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Zhiqi Li; Wenhai Wang; Enze Xie; Zhiding Yu; Anima Anandkumar; Jose M. Alvarez; Tong Lu; Ping Luo

Panoptic segmentation combines semantic segmentation and instance segmentation, dividing image contents into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved post-processing method. We also use Deformable DETR, a fast and efficient version of DETR, to process multi-scale features efficiently. Specifically, we supervise the attention modules in the mask decoder layer by layer. This deep supervision strategy lets the attention modules quickly focus on meaningful semantic regions. Compared with Deformable DETR, it improves performance and halves the number of required training epochs. Our query decoupling strategy separates the responsibilities of the query set and avoids mutual interference between things and stuff. In addition, our post-processing strategy improves performance at no additional cost by jointly considering classification and segmentation quality to resolve conflicting mask overlaps. Our approach improves accuracy by 6.2% PQ over the baseline DETR model. Panoptic SegFormer achieves state-of-the-art results on COCO test-dev with 56.2% PQ. It also shows stronger zero-shot robustness than existing methods. The code is released at https://github.com/zhiqi-li/Panoptic-SegFormer.
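The abstract says the post-processing step resolves overlapping masks by weighing classification and segmentation quality together. The following is a minimal sketch of that general idea, not the released implementation. It assumes that each query's confidence is its class probability multiplied by the mean mask probability over its foreground pixels, that masks are painted in descending confidence, and that a mask is discarded when less than `overlap_thresh` of it remains unclaimed. The function name `mask_wise_merge`, the product form of the score, and both thresholds are hypothetical choices for illustration.

```python
import numpy as np


def mask_wise_merge(class_probs, mask_probs, mask_thresh=0.5, overlap_thresh=0.5):
    """Hypothetical sketch: merge per-query masks into one panoptic map.

    class_probs: (N,) probability of each query's predicted class.
    mask_probs:  (N, H, W) per-pixel mask probabilities in [0, 1].
    Returns an (H, W) int array holding the winning query index per pixel,
    or -1 where no query claims the pixel.
    """
    binary = mask_probs > mask_thresh
    # Segmentation quality (assumed): mean probability over foreground pixels.
    quality = np.array([
        mask_probs[i][binary[i]].mean() if binary[i].any() else 0.0
        for i in range(len(class_probs))
    ])
    # Joint confidence (assumed form): classification score times mask quality.
    confidence = class_probs * quality

    panoptic = np.full(mask_probs.shape[1:], -1, dtype=int)
    # Paint masks from most to least confident; earlier masks keep their pixels.
    for i in np.argsort(-confidence):
        area = binary[i].sum()
        if area == 0:
            continue
        free = binary[i] & (panoptic == -1)
        # Discard a mask when too little of it is left after higher-ranked masks.
        if free.sum() / area < overlap_thresh:
            continue
        panoptic[free] = i
    return panoptic


# Query 0 has the lower class probability but the sharper mask, so it is ranked first.
probs = np.array([0.7, 0.8])
masks = np.array([[[0.95, 0.95, 0.10, 0.10]],
                  [[0.60, 0.60, 0.60, 0.60]]])
print(mask_wise_merge(probs, masks))  # [[0 0 1 1]]
```

Ranking by class probability alone would let query 1 claim all four pixels first, leaving query 0 with nothing and discarding it. Scoring both qualities jointly keeps the sharper mask on the contested pixels.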
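The accuracy figures above are panoptic quality (PQ) scores. Under the standard definition, a predicted segment and a ground-truth segment of the same category match when their IoU exceeds 0.5. For each category, PQ = (sum of IoU over matched pairs) / (|TP| + 0.5|FP| + 0.5|FN|), and the reported PQ is the mean over categories. The sketch below computes this for a single image. The function names and the pixel-set representation are hypothetical, and the void-label and crowd-region rules of the official COCO evaluation are deliberately omitted. Dataset-level evaluation accumulates the IoU sums and TP/FP/FN counts over all images before dividing.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(preds, gts):
    """Simplified single-image PQ without void or crowd handling.

    preds, gts: lists of (category, set_of_pixel_ids); segments within a list
    are assumed not to overlap, as in a panoptic output.
    """
    stats = defaultdict(lambda: {"iou": 0.0, "tp": 0, "fp": 0, "fn": 0})
    matched_pred, matched_gt = set(), set()
    for gi, (g_cat, g_pix) in enumerate(gts):
        for pi, (p_cat, p_pix) in enumerate(preds):
            if pi in matched_pred or p_cat != g_cat:
                continue
            score = iou(p_pix, g_pix)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if score > 0.5:
                matched_pred.add(pi)
                matched_gt.add(gi)
                stats[g_cat]["iou"] += score
                stats[g_cat]["tp"] += 1
                break
    for gi, (g_cat, _) in enumerate(gts):
        if gi not in matched_gt:
            stats[g_cat]["fn"] += 1  # unmatched ground truth
    for pi, (p_cat, _) in enumerate(preds):
        if pi not in matched_pred:
            stats[p_cat]["fp"] += 1  # unmatched prediction
    # PQ per category, then the unweighted mean over categories.
    per_class = [
        s["iou"] / (s["tp"] + 0.5 * s["fp"] + 0.5 * s["fn"])
        for s in stats.values()
    ]
    return sum(per_class) / len(per_class) if per_class else 0.0


gts = [("person", {0, 1, 2, 3}), ("sky", {4, 5, 6, 7})]
preds = [("person", {0, 1, 2}), ("sky", {4, 5, 6, 7}), ("car", {8, 9})]
print(round(panoptic_quality(preds, gts), 4))  # 0.5833
```

In this example, "person" scores 0.75 (one match at IoU 3/4), "sky" scores 1.0, and the spurious "car" segment is a false positive scoring 0. The mean is therefore 1.75 / 3. A gain of "6.2% PQ" refers to 6.2 points on this scale expressed as a percentage.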

updated: Fri Dec 03 2021 06:42:46 GMT+0000 (UTC)

published: Wed Sep 08 2021 17:59:12 GMT+0000 (UTC)

arXiv