Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho; Minhyeok Lee; Seunghoon Lee; Dogyoon Lee; Sangyoun Lee



Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in a video. The two primary techniques used in unsupervised VOS are 1) the collaboration of appearance and motion information and 2) temporal fusion across different frames. This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA), which incorporate these techniques via dense propagation across different modalities and frames. IMA densely integrates context information from different modalities through mutual refinement. IFA injects the global context of a video into the query frame, making full use of useful properties from multiple frames. Experimental results on public benchmark datasets show that the proposed approach outperforms all existing methods by a substantial margin. Both proposed components are also thoroughly validated through ablation studies.
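The abstract does not give the exact formulation of IMA or IFA. To illustrate the general idea of prototype-based attention only, here is a minimal NumPy sketch built on assumptions. Source features are summarized into a small set of K prototypes, and each query token attends to those prototypes rather than to every source token, which reduces cost from O(NM) to O(NK). The helper names `extract_prototypes` and `prototype_attention`, the evenly spaced seeding, the single soft-assignment step, and the residual addition are all hypothetical choices. They are not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_prototypes(source, num_prototypes):
    """Summarize M source tokens (M, C) into K prototypes (K, C).

    Hypothetical scheme: seed K centers with evenly spaced source tokens,
    then take one soft-assignment step so each prototype becomes a
    similarity-weighted mean of the source tokens.
    """
    m, c = source.shape
    idx = np.linspace(0, m - 1, num_prototypes).astype(int)
    seeds = source[idx]                                       # (K, C)
    assign = softmax(seeds @ source.T / np.sqrt(c), axis=1)   # (K, M), rows sum to 1
    return assign @ source                                    # (K, C)

def prototype_attention(query, source, num_prototypes=8):
    """Each query token (N, C) attends to K prototypes of `source`
    instead of all M source tokens; the result is added residually."""
    c = query.shape[1]
    protos = extract_prototypes(source, num_prototypes)       # (K, C)
    weights = softmax(query @ protos.T / np.sqrt(c), axis=1)  # (N, K)
    return query + weights @ protos                           # (N, C)

# Illustrative usage with random features (shapes are assumptions).
rng = np.random.default_rng(0)
appearance = rng.standard_normal((64, 32))  # e.g. a flattened 8x8 RGB feature map
motion = rng.standard_normal((64, 32))      # e.g. flattened optical-flow features

# IMA-like mutual refinement: each modality is refined by the other's prototypes.
app_refined = prototype_attention(appearance, motion)
mot_refined = prototype_attention(motion, appearance)

# IFA-like global context: the query frame attends to prototypes pooled
# from the concatenated tokens of several other frames.
other_frames = rng.standard_normal((3 * 64, 32))
fused = prototype_attention(app_refined, other_frames)
print(fused.shape)  # (64, 32)
```

Attending to a handful of prototypes keeps the propagation dense (every query token receives context), while its cost stays independent of how many frames or tokens are pooled into the source.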

updated: Wed Mar 15 2023 07:11:13 GMT+0000 (UTC)

published: Tue Nov 22 2022 06:19:17 GMT+0000 (UTC)

arXiv
