Video Instance Segmentation with a Propose-Reduce Paradigm

Huaijia Lin; Ruizheng Wu; Shu Liu; Jiangbo Lu; Jiaya Jia

Video instance segmentation (VIS) aims to segment and associate all instances of predefined classes in each frame of a video. Prior methods usually obtain segmentation for a frame or clip first, and then merge the incomplete results by tracking or matching, which can accumulate errors in the merging step. In contrast, we propose a new paradigm, Propose-Reduce, which generates complete sequences for input videos in a single step. We further build a sequence propagation head on top of an existing image-level instance segmentation network for long-term propagation. To ensure robustness and high recall, the framework proposes multiple sequences and then reduces redundant sequences of the same instance. We achieve state-of-the-art performance on two representative benchmark datasets: 47.6% AP on the YouTube-VIS validation set and 70.4% J&F on the DAVIS-UVOS validation set.
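The abstract does not spell out how redundant sequences are reduced. As a minimal sketch, assuming a greedy non-maximum suppression over a sequence-level mask IoU, the "reduce" step might look like the following. The mask representation (one set of pixel coordinates per frame), the names `sequence_iou` and `reduce_sequences`, and the 0.5 threshold are all hypothetical choices made for illustration, not details taken from the paper.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]
MaskSequence = List[Set[Pixel]]  # hypothetical format: one pixel set per frame


def sequence_iou(a: MaskSequence, b: MaskSequence) -> float:
    """Spatio-temporal IoU: intersections and unions are summed over frames."""
    inter = sum(len(ma & mb) for ma, mb in zip(a, b))
    union = sum(len(ma | mb) for ma, mb in zip(a, b))
    return inter / union if union else 0.0


def reduce_sequences(
    sequences: List[MaskSequence],
    scores: List[float],
    iou_threshold: float = 0.5,  # assumed value, chosen for illustration
) -> List[int]:
    """Greedy NMS over whole sequences; returns indices of kept sequences."""
    # Visit proposals from highest to lowest score.
    order = sorted(range(len(sequences)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a proposal only if it does not overlap any already-kept one.
        if all(
            sequence_iou(sequences[i], sequences[k]) <= iou_threshold
            for k in kept
        ):
            kept.append(i)
    return kept


if __name__ == "__main__":
    seq_a = [{(0, 0), (0, 1)}, {(1, 0), (1, 1)}]
    seq_b = [{(0, 0), (0, 1)}, {(1, 0)}]  # near-duplicate of seq_a
    seq_c = [{(5, 5)}, {(6, 6)}]          # a different instance
    print(reduce_sequences([seq_a, seq_b, seq_c], [0.9, 0.8, 0.7]))
```

In this toy input, `seq_b` overlaps `seq_a` with a sequence IoU of 0.75 and is suppressed, while the disjoint `seq_c` is kept. Scoring overlap over the whole sequence rather than per frame is what lets duplicates of the same instance be removed in one pass, without a frame-by-frame merging step.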

updated: Thu Mar 25 2021 10:58:36 GMT+0000 (UTC)

published: Thu Mar 25 2021 10:58:36 GMT+0000 (UTC)

arXiv
