Holistic Prototype Attention Network for Few-Shot VOS

Yin Tang; Tao Chen; Xiruo Jiang; Yazhou Yao; Guo-Sen Xie; Heng-Tao Shen

Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes using only a small set of support images with pixel-level object annotations. Existing methods have shown that domain agent-based attention, which learns the correlation between support images and query frames, is effective for FSVOS. However, the agent frame contains redundant pixel information and background noise, which degrades segmentation performance. Moreover, existing methods tend to ignore inter-frame correlations in query videos. To address these issues, we propose a holistic prototype attention network (HPAN) for advancing FSVOS. Specifically, HPAN introduces a prototype graph attention module (PGAM) and a bidirectional prototype attention module (BPAM) to transfer informative knowledge from seen to unseen classes. PGAM generates local prototypes from all foreground features and then uses their internal correlations to enhance the representation of the holistic prototypes. BPAM exploits holistic information from support images and video frames by fusing co-attention and self-attention, achieving support-query semantic consistency and inner-frame temporal consistency. Extensive experiments on YouTube-FSVOS demonstrate the effectiveness and superiority of the proposed HPAN method.
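
To make the prototype idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that local prototypes can be approximated by a few k-means iterations over foreground features, that prototype refinement can be approximated by residual scaled dot-product self-attention among the prototypes, and that query features attend to the refined prototypes through plain cross-attention. All function names (`local_prototypes`, `refine_prototypes`, `query_co_attention`) and parameters are hypothetical; the paper's actual PGAM and BPAM designs may differ substantially.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_prototypes(features, mask, k=4, iters=5, seed=0):
    """Assumed stand-in for local prototype generation.

    features: (N, C) flattened support features; mask: (N,) boolean foreground mask.
    Returns (k, C) cluster centers of the foreground features (needs >= k foreground pixels).
    """
    fg = features[mask]
    rng = np.random.default_rng(seed)
    centers = fg[rng.choice(len(fg), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every foreground feature to every center.
        dist = ((fg[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = fg[assign == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def attend(queries, keys, values):
    # Scaled dot-product attention: each query row becomes a weighted sum of values.
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1) @ values

def refine_prototypes(protos):
    # Assumed refinement: prototypes exchange information via residual self-attention.
    return protos + attend(protos, protos, protos)

def query_co_attention(query_feats, protos):
    # Assumed co-attention: query pixels read from the support prototypes.
    return attend(query_feats, protos, protos)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    support = rng.normal(size=(64, 8))   # 64 support pixels, 8 channels
    mask = np.arange(64) % 2 == 0        # toy foreground mask (32 pixels)
    protos = refine_prototypes(local_prototypes(support, mask, k=4))
    query = rng.normal(size=(100, 8))    # 100 query pixels
    print(protos.shape, query_co_attention(query, protos).shape)  # (4, 8) (100, 8)
```

Attending to a handful of prototypes rather than to every pixel of an agent frame is one way to limit the redundant pixel information and background noise the abstract mentions. In the same sketch, calling `attend` with query-frame features as queries, keys and values would stand in for the self-attention used for temporal consistency across frames.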

updated: Sun Jul 16 2023 03:48:57 GMT+0000 (UTC)

published: Sun Jul 16 2023 03:48:57 GMT+0000 (UTC)

arXiv
