Efficient Video Object Segmentation with Compressed Video

Kai Xu; Angela Yao

We propose an efficient inference framework for semi-supervised video object segmentation that exploits the temporal redundancy of video. Our method runs inference only on selected keyframes and predicts the remaining frames by propagation, using the motion vectors and residuals already present in the compressed video bitstream. Specifically, we propose a new motion vector-based warping method that propagates segmentation masks from keyframes to other frames in a multi-reference manner. We also propose a residual-based refinement module that corrects the block-wise propagated segmentation masks and adds detail to them. Our approach is flexible and can be added on top of existing video object segmentation algorithms. Using STM with top-k filtering as the base model, we achieve highly competitive results on DAVIS16 and YouTube-VOS, with speedups of up to 4.9× and little loss in accuracy.
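To make the idea of block-wise mask propagation concrete, here is a minimal sketch in plain Python. It is not the method from the paper: it warps from a single keyframe rather than several references, it has no residual-based refinement, and the function name `warp_mask_blockwise`, the block size, and the motion-field layout (one `(dy, dx)` pixel offset per block) are all assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]
# Hypothetical layout: motion[by][bx] holds one (dy, dx) offset per block.
MotionField = List[List[Tuple[int, int]]]


def warp_mask_blockwise(key_mask: Mask, motion: MotionField, block: int) -> Mask:
    """Predict a mask for a non-keyframe by copying pixels from the keyframe mask.

    Each block of the current frame is assumed to be predicted from the
    keyframe region displaced by that block's (dy, dx) offset in pixels.
    """
    height, width = len(key_mask), len(key_mask[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # All pixels in the same block share one motion vector.
            dy, dx = motion[y // block][x // block]
            # Clamp so vectors pointing outside the frame read the border pixel.
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            warped[y][x] = key_mask[src_y][src_x]
    return warped


key_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Every 2x2 block references the keyframe one pixel to its left,
# so the object appears shifted one pixel to the right.
motion = [[(0, -1), (0, -1)], [(0, -1), (0, -1)]]
predicted = warp_mask_blockwise(key_mask, motion, block=2)
for row in predicted:
    print(row)
```

Because every pixel in a block shares one vector, a mask propagated this way can only follow object boundaries at block granularity, which is the limitation the abstract's residual-based refinement module is meant to address.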

updated: Mon Jul 26 2021 12:57:04 GMT+0000 (UTC)

published: Mon Jul 26 2021 12:57:04 GMT+0000 (UTC)

arXiv
