Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation

Xiangtai Li; Shilin Xu; Yibo Yang; Guangliang Cheng; Yunhai Tong; Dacheng Tao

Panoptic Part Segmentation (PPS) aims to unify panoptic segmentation and part segmentation into one task. Previous work mainly uses separate approaches that handle thing, stuff, and part predictions individually, without any shared computation or task association. In this work, we aim to unify these tasks at the architectural level, designing the first end-to-end unified method, named Panoptic-PartFormer. In particular, motivated by recent progress in Vision Transformers, we model things, stuff, and parts as object queries and directly learn to optimize all three predictions as a unified mask prediction and classification problem. We design a decoupled decoder to generate part features and thing/stuff features, respectively. We then propose to use all the queries and their corresponding features to perform reasoning jointly and iteratively. The final mask is obtained via the inner product between the queries and the corresponding features. Extensive ablation studies and analysis demonstrate the effectiveness of our framework. Panoptic-PartFormer achieves new state-of-the-art results on both the Cityscapes PPS and Pascal Context PPS datasets, with at least a 70% decrease in GFLOPs and a 50% decrease in parameters. In particular, we obtain a 3.4% relative improvement with a ResNet50 backbone and a 10% improvement after adopting Swin Transformer on the Pascal Context PPS dataset. To the best of our knowledge, we are the first to solve the PPS problem via \textit{a unified and end-to-end transformer model}. Given its effectiveness and conceptual simplicity, we hope Panoptic-PartFormer can serve as a good baseline and aid future unified research on PPS. Our code and models are available at https://github.com/lxtGH/Panoptic-PartFormer.
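The mask-prediction step described in the abstract, an inner product between each query and the feature map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `predict_masks`, the toy shapes, and the sigmoid-plus-threshold binarization are assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free. In the described design, part queries would presumably be paired with part features and thing/stuff queries with thing/stuff features, so the same function would be called once per feature type.

```python
import math


def predict_masks(queries, features, threshold=0.5):
    """Return one binary H x W mask per query.

    queries:  N vectors of length C (hypothetical learned query embeddings).
    features: C feature maps, each H x W (hypothetical decoder output).
    The sigmoid and threshold are illustrative assumptions.
    """
    height, width = len(features[0]), len(features[0][0])
    masks = []
    for query in queries:
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Inner product between the query and the feature vector at (y, x).
                logit = sum(q * fmap[y][x] for q, fmap in zip(query, features))
                prob = 1.0 / (1.0 + math.exp(-logit))
                row.append(1 if prob > threshold else 0)
            mask.append(row)
        masks.append(mask)
    return masks


# Toy input: C = 2 channels, a 2 x 2 feature map, N = 2 queries.
features = [
    [[4.0, -4.0], [4.0, -4.0]],  # channel 0 responds to the left column
    [[-4.0, 4.0], [-4.0, 4.0]],  # channel 1 responds to the right column
]
queries = [[1.0, 0.0], [0.0, 1.0]]
masks = predict_masks(queries, features)
```

With these toy values, the first query selects the left column and the second selects the right column, which shows how each query acts as a per-pixel linear classifier over the shared features.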

updated: Sun Jul 10 2022 09:30:39 GMT+0000 (UTC)

published: Sun Apr 10 2022 11:16:45 GMT+0000 (UTC)

arXiv
