TarViS: A Unified Approach for Target-based Video Segmentation

Ali Athar; Alexander Hermans; Jonathon Luiten; Deva Ramanan; Bastian Leibe

The general domain of video segmentation is currently fragmented into different tasks spanning multiple benchmarks. Despite rapid progress in the state of the art, current methods are overwhelmingly task-specific and cannot conceptually generalize to other tasks. Inspired by recent approaches with multi-task capability, we propose TarViS: a novel, unified network architecture that can be applied to any task that requires segmenting a set of arbitrarily defined 'targets' in video. Our approach is flexible with respect to how tasks define these targets, since it models the targets as abstract 'queries' which are then used to predict pixel-precise target masks. A single TarViS model can be trained jointly on a collection of datasets spanning different tasks, and can hot-swap between tasks during inference without any task-specific retraining. To demonstrate its effectiveness, we apply TarViS to four different tasks, namely Video Instance Segmentation (VIS), Video Panoptic Segmentation (VPS), Video Object Segmentation (VOS) and Point Exemplar-guided Tracking (PET). Our unified, jointly trained model achieves state-of-the-art performance on five of the seven benchmarks spanning these four tasks, and competitive performance on the remaining two. Code and model weights are available at: https://github.com/Ali2500/TarViS
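
The idea of task-agnostic 'queries' can be illustrated with a toy sketch. The abstract does not say how a query is turned into a mask, so the dot product between a query vector and per-pixel features used here is an assumption borrowed from common query-based segmentation designs, not a description of the TarViS implementation. The function name `predict_masks`, the query names, and all numeric values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def predict_masks(
    queries: Dict[str, Vector],
    pixel_features: List[List[Vector]],
    threshold: float = 0.0,
) -> Dict[str, List[List[int]]]:
    """Return one binary mask per query (assumed dot-product scoring)."""
    masks = {}
    for name, query in queries.items():
        masks[name] = [
            [
                # A pixel belongs to the target if its feature vector
                # scores above the threshold against the query.
                1 if sum(q * f for q, f in zip(query, feat)) > threshold else 0
                for feat in row
            ]
            for row in pixel_features
        ]
    return masks


# Toy 2x2 frame with 2-dimensional per-pixel features (made-up values).
frame_features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, -1.0]],
]

# Hypothetical query sets: one standing in for a semantic class (as a
# VIS-style task might define its targets), one for a specific object
# (as a VOS-style task might define its targets).
vis_queries = {"class:person": [1.0, -0.5]}
vos_queries = {"object:1": [-0.5, 1.0]}

# The same mask-prediction function serves both tasks; only the set of
# queries is swapped.
vis_masks = predict_masks(vis_queries, frame_features)
vos_masks = predict_masks(vos_queries, frame_features)
```

In this sketch, switching tasks changes only the dictionary of queries passed in, which is the sense in which a single model could move between tasks without retraining.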

updated: Wed May 10 2023 16:40:04 GMT+0000 (UTC)

published: Fri Jan 06 2023 18:59:52 GMT+0000 (UTC)

arXiv
