3rd Place Solution for PVUW2023 VSS Track: A Large Model for Semantic Segmentation on VSPW

Shijie Chang; Zeqi Hao; Ben Kang; Xiaoqi Zhao; Jiawen Zhu; Zhenyu Chen; Lihe Zhang; Lu Zhang; Huchuan Lu

In this paper, we introduce our 3rd place solution for the PVUW2023 VSS track. Semantic segmentation is a fundamental task in computer vision with numerous real-world applications. We explored various image-level visual backbones and segmentation heads to tackle the problem of video semantic segmentation. Our experiments show that using InternImage-H as the backbone and Mask2former as the segmentation head achieves the best performance. In addition, we explore two post-processing methods: CascadePSP and Segment Anything Model (SAM). Ultimately, our approach obtains 62.60% and 64.84% mIoU on the VSPW test set 1 and the final test set, respectively, securing third place in the PVUW2023 VSS track.
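The reported scores use mIoU (mean intersection-over-union). As a reminder of what that metric measures, the following is a minimal sketch in plain Python. It is not the challenge's official evaluation code: the function name `mean_iou`, the flat label-list inputs, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration. Official evaluation typically accumulates counts over the whole dataset and may handle an ignore label differently.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that appear in pred or target.

    Hypothetical helper for illustration: `pred` and `target` are flattened
    per-pixel class indices in the range [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, class 2 is absent and skipped.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=3))
```

For the toy input above, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.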

updated: Tue Jun 06 2023 01:49:09 GMT+0000 (UTC)

published: Sun Jun 04 2023 07:50:38 GMT+0000 (UTC)

arXiv
