Hybrid-S2S: Video Object Segmentation with Recurrent Networks and Correspondence Matching

Fatemeh Azimi; Stanislav Frolov; Federico Raue; Joern Hees; Andreas Dengel

One-shot Video Object Segmentation (VOS) is the task of tracking an object of interest pixel-wise within a video sequence, where the segmentation mask of the first frame is given at inference time. In recent years, Recurrent Neural Networks (RNNs) have been widely used for VOS, but they often suffer from limitations such as drift and error propagation. In this work, we study an RNN-based architecture and address some of these issues by proposing a hybrid sequence-to-sequence architecture named HS2S, which uses a dual mask propagation strategy to incorporate the information obtained from correspondence matching. Our experiments show that augmenting the RNN with correspondence matching is a highly effective way to reduce drift. The additional information helps the model predict more accurate masks and makes it robust against error propagation. We evaluate our HS2S model on the DAVIS2017 and YouTube-VOS datasets. On the latter, we achieve an improvement of 11.2 percentage points in overall segmentation accuracy over RNN-based state-of-the-art methods in VOS. We analyze our model's behavior in challenging cases such as occlusion and long sequences and show that our hybrid architecture significantly enhances the segmentation quality in these difficult scenarios.
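To make the idea of correspondence matching more concrete, the following is a minimal sketch and not the authors' implementation. It assumes per-pixel feature maps are already available for a reference frame and the current frame, and it transfers the reference mask by nearest-neighbour matching under cosine similarity. The function name `correspondence_mask`, the feature shapes, and the choice of cosine similarity are all illustrative assumptions; the abstract does not specify how HS2S computes or fuses the matching information.

```python
import numpy as np


def correspondence_mask(ref_feat, ref_mask, cur_feat):
    """Transfer a reference mask to the current frame by feature matching.

    Hypothetical helper for illustration only (not from the paper).
    ref_feat, cur_feat: (H, W, C) per-pixel feature maps.
    ref_mask: (H, W) binary mask of the object in the reference frame.
    Returns an (H, W) mask for the current frame.
    """
    h, w, c = cur_feat.shape
    ref = ref_feat.reshape(-1, c).astype(np.float64)
    cur = cur_feat.reshape(-1, c).astype(np.float64)

    # L2-normalise so that the dot product equals cosine similarity.
    ref /= np.linalg.norm(ref, axis=1, keepdims=True) + 1e-8
    cur /= np.linalg.norm(cur, axis=1, keepdims=True) + 1e-8

    # Similarity of every current-frame pixel to every reference pixel.
    sim = cur @ ref.T  # shape: (H*W, H*W)

    # Each current pixel copies the label of its best-matching reference pixel.
    nearest = sim.argmax(axis=1)
    return ref_mask.reshape(-1)[nearest].reshape(h, w)
```

Because each pixel is matched directly against the reference frame rather than against the previous prediction, a cue of this kind does not accumulate errors over time, which is one plausible reason it can counteract the drift of a purely recurrent propagation. In a hybrid model this matched mask would be an additional input alongside the RNN's own propagated mask; how the two are combined in HS2S is not described in the abstract.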

updated: Sat Nov 07 2020 09:33:51 GMT+0000 (UTC)

published: Sat Oct 10 2020 19:00:43 GMT+0000 (UTC)

arXiv
