Treating Motion as Option to Reduce Motion Dependency in Unsupervised Video Object Segmentation

Suhwan Cho; Minhyeok Lee; Seunghoon Lee; Chaewon Park; Donghyeong Kim; Sangyoun Lee

Unsupervised video object segmentation (VOS) aims to detect the most salient object in a video sequence at the pixel level. In unsupervised VOS, most state-of-the-art methods leverage motion cues obtained from optical flow maps, in addition to appearance cues, to exploit the property that salient objects usually have distinctive movements compared to the background. However, because they are overly dependent on motion cues, which may be unreliable in some cases, they cannot achieve stable predictions. To reduce this motion dependency of existing two-stream VOS methods, we propose a novel motion-as-option network that utilizes motion cues optionally. Additionally, to fully exploit the property of the proposed network that motion is not always required, we introduce a collaborative network learning strategy. On all public benchmark datasets, our proposed network achieves state-of-the-art performance with real-time inference speed.
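The "motion as option" idea can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on assumptions the abstract does not state. Features are plain lists of floats, fusion is element-wise addition, and the hypothetical helpers `fuse` and `mixed_schedule` stand in for the real encoders and training loop. The sketch only shows two things: the same forward step runs with or without a motion input, and a training schedule can mix samples that have optical flow with samples that have none.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional, Sequence

Feature = List[float]


def fuse(appearance: Feature, motion: Optional[Feature] = None) -> Feature:
    """Hypothetical fusion step (an assumption, not the paper's design):
    add motion features when they are given, otherwise pass the
    appearance features through unchanged."""
    if motion is None:
        return list(appearance)
    if len(motion) != len(appearance):
        raise ValueError("appearance and motion features must have equal length")
    return [a + m for a, m in zip(appearance, motion)]


@dataclass
class Sample:
    image_feat: Feature
    flow_feat: Optional[Feature] = None  # None means no optical flow (e.g. a still image)


def mixed_schedule(
    video: Sequence[Sample],
    still: Sequence[Sample],
    steps: int,
    p_video: float = 0.5,
    seed: int = 0,
) -> Iterator[Sample]:
    """Hypothetical collaborative schedule: at each step, draw a sample from
    the video pool (with flow) with probability p_video, else from the
    still-image pool (without flow)."""
    rng = random.Random(seed)
    for _ in range(steps):
        pool = video if rng.random() < p_video else still
        yield rng.choice(pool)


if __name__ == "__main__":
    video = [Sample([1.0, 2.0], [0.5, 0.5])]
    still = [Sample([3.0, 4.0])]
    for s in mixed_schedule(video, still, steps=4):
        # The same forward step works whether or not flow is available.
        print(fuse(s.image_feat, s.flow_feat))
```

Because `fuse` never requires a motion input, a model built this way cannot rely on motion alone, which is the property the collaborative schedule is meant to exploit in this toy setting.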

updated: Wed Oct 19 2022 07:48:26 GMT+0000 (UTC)

published: Sun Sep 04 2022 18:05:52 GMT+0000 (UTC)

arXiv
