Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation

Mennatullah Siam; Rezaul Karim; He Zhao; Richard Wildes

Few-shot video segmentation is the task of delineating a specific novel class in a query video using a few labelled support images. Typical approaches compare support and query features while limiting comparisons to a single feature layer, thereby ignoring potentially valuable information. We present a meta-learned Multiscale Memory Comparator (MMC) for few-shot video segmentation that combines information across scales within a transformer decoder. Typical multiscale transformer decoders for segmentation tasks learn a compressed representation, their queries, through information exchange across scales. Unlike previous work, we instead preserve the detailed feature maps during across-scale information exchange via multiscale memory transformer decoding, to reduce confusion between the background and the novel class. Integral to the approach, we investigate multiple forms of information exchange across scales in different tasks and provide insights, with empirical evidence, on which to use in each task. The overall comparisons between query and support features benefit from both rich semantics and precise localization. We demonstrate our approach primarily on few-shot video object segmentation and, in an adapted version, on its fully supervised counterpart. In all cases, our approach outperforms the baseline and yields state-of-the-art performance. Our code is publicly available at https://github.com/MSiam/MMC-MultiscaleMemory.
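To make the idea of comparing query and support features at several scales (rather than at a single layer) concrete, here is a minimal NumPy sketch. It is not the MMC transformer decoder: it swaps in a much simpler, commonly used prototype comparison (masked average pooling of support features plus cosine similarity with every query location), applied independently at each scale and fused by upsampling and averaging. The function names, the finest-first list layout, and the assumption that each scale's resolution divides the full mask resolution evenly are all hypothetical choices for illustration.

```python
import numpy as np


def masked_average_prototype(support_feat, support_mask):
    """Average support features (C, h, w) over the (soft) mask (h, w) -> prototype (C,)."""
    weights = support_mask.astype(np.float64)
    denom = weights.sum() + 1e-8  # avoid division by zero for an empty mask
    return (support_feat * weights[None]).sum(axis=(1, 2)) / denom


def cosine_map(query_feat, prototype):
    """Cosine similarity between every query location (C, h, w) and a prototype (C,) -> (h, w)."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", q, p)


def multiscale_similarity(query_feats, support_feats, support_mask):
    """Hypothetical multiscale comparison (not the paper's method).

    query_feats, support_feats: lists of (C_s, H / f_s, W / f_s) arrays, one per scale.
    support_mask: (H, W) binary mask at full resolution; H and W are assumed to be
    divisible by every downsampling factor f_s.
    Returns an (H, W) map: per-scale cosine similarities, upsampled and averaged.
    """
    H, W = support_mask.shape
    fused = np.zeros((H, W))
    for q, s in zip(query_feats, support_feats):
        f = H // q.shape[1]  # downsampling factor of this scale
        # Block-average the mask down to this scale (soft foreground weights).
        m = support_mask.reshape(H // f, f, W // f, f).mean(axis=(1, 3))
        proto = masked_average_prototype(s, m)
        sim = cosine_map(q, proto)
        # Nearest-neighbour upsampling back to full resolution before fusing.
        fused += sim.repeat(f, axis=0).repeat(f, axis=1)
    return fused / len(query_feats)
```

For example, with an 8×8 mask and feature maps at downsampling factors 1, 2 and 4, `multiscale_similarity(query_feats, support_feats, mask)` returns an 8×8 map whose values lie in [-1, 1]. Coarse scales contribute semantic agreement while the finest scale preserves localization, which is the intuition the abstract appeals to; the actual MMC instead exchanges information across scales inside a transformer decoder while keeping the detailed feature maps as memory.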

updated: Sat Jul 15 2023 14:21:58 GMT+0000 (UTC)

published: Sat Jul 15 2023 14:21:58 GMT+0000 (UTC)

arXiv