SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation

Dongfang Liu; Yiming Cui; Wenbo Tan; Yingjie Chen

Video instance segmentation (VIS) is a new and critical task in computer vision. To date, top-performing VIS methods extend the two-stage Mask R-CNN by adding a tracking branch, leaving plenty of room for improvement. In contrast, we approach the VIS task from a new perspective and propose a one-stage spatial granularity network (SG-Net). Compared to conventional two-stage methods, SG-Net demonstrates four advantages: 1) Our method has a compact one-stage architecture, and each task head (detection, segmentation, and tracking) is crafted interdependently so that the heads can effectively share features and benefit from joint optimization; 2) Our mask prediction is dynamically performed on the sub-regions of each detected instance, leading to high-quality masks of fine granularity; 3) Each of our task predictions avoids using expensive proposal-based RoI features, resulting in much lower runtime complexity per instance; 4) Our tracking head models the movement of object centerness for tracking, which effectively enhances tracking robustness to different object appearances. In evaluation, we present comparisons with state-of-the-art methods on the YouTube-VIS dataset. Extensive experiments demonstrate that our compact one-stage method can achieve improved performance in both accuracy and inference speed. We hope our SG-Net can serve as a strong and flexible baseline for the VIS task. Our code will be made available.

updated: Thu Mar 18 2021 14:31:15 GMT+0000 (UTC)

published: Thu Mar 18 2021 14:31:15 GMT+0000 (UTC)

arXiv
