Recurrent Dynamic Embedding for Video Object Segmentation

Mingxing Li; Li Hu; Zhiwei Xiong; Bang Zhang; Pan Pan; Dong Liu

Space-time memory (STM) based video object segmentation (VOS) networks usually add to the memory bank every few frames and achieve excellent performance. However, this design has two drawbacks: 1) as the video grows longer, the hardware cannot withstand the ever-increasing memory requirements; 2) storing a large amount of information inevitably introduces a large amount of noise, which hinders reading the most important information from the memory bank. In this paper, we propose a Recurrent Dynamic Embedding (RDE) to build a memory bank of constant size. Specifically, we explicitly generate and update the RDE with the proposed Spatio-temporal Aggregation Module (SAM), which exploits cues from historical information. To avoid error accumulation caused by the recurrent use of SAM, we propose an unbiased guidance loss for the training stage, which makes SAM more robust on long videos. Moreover, the predicted masks in the memory bank are inaccurate because network inference is imperfect, which affects the segmentation of the query frame. To address this problem, we design a novel self-correction strategy so that the network can repair the embeddings of masks of varying quality in the memory bank. Extensive experiments show that our method achieves the best tradeoff between performance and speed. Code is available at https://github.com/Limingxing00/RDE-VOS-CVPR2022.
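
To make the memory argument concrete, the following is a minimal sketch that contrasts a bank that appends an embedding every few frames with a bank that keeps a single recurrently updated embedding. It is not the authors' SAM or RDE implementation: the class names `GrowingBank` and `ConstantBank`, the `stride` and `gate` parameters, and the simple weighted blend used as the update rule are all assumptions chosen for illustration, and embeddings are plain lists of floats rather than feature maps.

```python
from typing import List, Optional, Tuple

Vector = List[float]


class GrowingBank:
    """STM-style bank (illustrative): stores one more embedding every `stride` frames."""

    def __init__(self, stride: int = 5) -> None:
        self.stride = stride
        self.entries: List[Vector] = []

    def update(self, frame_idx: int, embedding: Vector) -> None:
        if frame_idx % self.stride == 0:
            self.entries.append(list(embedding))

    def size(self) -> int:
        return len(self.entries)


class ConstantBank:
    """Constant-size bank (illustrative): one embedding, updated recurrently.

    The blend below is a hypothetical stand-in for a learned aggregation
    module; it only shows that history can be folded into a fixed-size state.
    """

    def __init__(self, stride: int = 5, gate: float = 0.5) -> None:
        self.stride = stride
        self.gate = gate
        self.recurrent: Optional[Vector] = None

    def update(self, frame_idx: int, embedding: Vector) -> None:
        if frame_idx % self.stride != 0:
            return
        if self.recurrent is None:
            # The first stored frame initialises the recurrent embedding.
            self.recurrent = list(embedding)
        else:
            g = self.gate
            self.recurrent = [
                (1.0 - g) * old + g * new
                for old, new in zip(self.recurrent, embedding)
            ]

    def size(self) -> int:
        return 0 if self.recurrent is None else 1


def simulate(num_frames: int) -> Tuple[int, int]:
    """Return (growing bank size, constant bank size) after `num_frames` frames."""
    growing, constant = GrowingBank(), ConstantBank()
    for t in range(num_frames):
        embedding = [float(t), 1.0]  # toy per-frame embedding
        growing.update(t, embedding)
        constant.update(t, embedding)
    return growing.size(), constant.size()
```

Under these assumptions, `simulate(100)` and `simulate(1000)` show the first bank growing linearly with video length while the second stays at one entry, which is the property the constant-size memory bank is meant to provide. A fixed blend is the simplest recurrent update; the abstract's concern about error accumulation applies to any such repeated update, which is what the unbiased guidance loss is described as addressing.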

updated: Sun May 08 2022 02:24:43 GMT+0000 (UTC)

published: Sun May 08 2022 02:24:43 GMT+0000 (UTC)

arXiv
