Guided Slot Attention for Unsupervised Video Object Segmentation

Minhyeok Lee; Suhwan Cho; Dogyoon Lee; Chaewon Park; Jungho Lee; Sangyoun Lee

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, complex backgrounds and multiple foreground objects make this task challenging. To address this issue, we propose a guided slot attention network that reinforces spatial structural information and obtains better foreground-background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot-template interaction and to effectively fuse global and local features in the target and reference frames, we introduce K-nearest neighbors filtering and a feature aggregation transformer. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.
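
The iterative slot refinement with K-nearest neighbors filtering described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`knn_filter`, `refine_slots`), the dot-product similarity, the softmax over the retained template features, and the plain weighted-average update are all assumptions made for illustration, and the learned projections, recurrent update, and feature aggregation transformer of the actual model are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def knn_filter(logits, k):
    """For each slot (row), keep the k largest logits and set the rest to -inf.

    Hypothetical stand-in for the K-nearest neighbors filtering step.
    Requires 1 <= k <= logits.shape[1].
    """
    top_idx = np.argpartition(logits, -k, axis=1)[:, -k:]
    filtered = np.full_like(logits, -np.inf)
    kept = np.take_along_axis(logits, top_idx, axis=1)
    np.put_along_axis(filtered, top_idx, kept, axis=1)
    return filtered


def refine_slots(slots, template, k=2, iters=3):
    """Iteratively refine slots against template features.

    slots:    (S, D) array, e.g. one foreground and one background slot,
              assumed to be already initialized (the "query guidance" step
              is outside the scope of this sketch).
    template: (N, D) array of template feature vectors.
    """
    d = slots.shape[1]
    for _ in range(iters):
        logits = slots @ template.T / np.sqrt(d)          # (S, N) similarities
        attn = softmax(knn_filter(logits, k), axis=1)     # weights over the k kept features
        slots = attn @ template                           # assumed update: weighted average
    return slots


# Toy example: two template features near each of two directions.
template = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
initial_slots = np.array([[1.0, 0.0], [0.0, 1.0]])  # "foreground", "background"
refined = refine_slots(initial_slots, template, k=2, iters=3)
```

In this toy setting each slot attends only to its two most similar template features, so the foreground slot stays near the first pair of features and the background slot near the second pair, rather than being pulled toward an average of all four.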

updated: Wed Mar 15 2023 02:08:20 GMT+0000 (UTC)

published: Wed Mar 15 2023 02:08:20 GMT+0000 (UTC)

arXiv

