An End-to-End Trainable Video Panoptic Segmentation Method using Transformers

Jeongwon Ryu; Kwangjin Yoon

In this paper, we present an algorithm for video panoptic segmentation, a newly emerging area of research. Video panoptic segmentation is a task that unifies panoptic segmentation and multi-object tracking. In other words, it requires generating instance tracking IDs along with panoptic segmentation results across video sequences. Our proposed algorithm uses a transformer and can be trained end-to-end with multiple video frames as input. We test our method on the STEP dataset and report its performance with the recently proposed STQ metric. The method achieved an STQ of 57.81% on the KITTI-STEP dataset and 31.8% on the MOTChallenge-STEP dataset.

updated: Fri Oct 08 2021 10:13:37 GMT+0000 (UTC)

published: Fri Oct 08 2021 10:13:37 GMT+0000 (UTC)

arXiv
