TrickVOS: A Bag of Tricks for Video Object Segmentation

Evangelos Skartados; Konstantinos Georgiadis; Mehmet Kerim Yucel; Koskinas Ioannis; Armando Domi; Anastasios Drosou; Bruno Manganelli; Albert Saà-Garriga

Space-time memory (STM) network methods have been dominant in semi-supervised video object segmentation (SVOS) due to their remarkable performance. In this work, we identify three key aspects in which such methods can be improved: i) supervisory signal, ii) pretraining and iii) spatial awareness. We then propose TrickVOS, a generic, method-agnostic bag of tricks that addresses each aspect with i) a structure-aware hybrid loss, ii) a simple decoder pretraining regime and iii) a cheap tracker that imposes spatial constraints on model predictions. Finally, we propose a lightweight network and show that, when trained with TrickVOS, it achieves results competitive with state-of-the-art methods on the DAVIS and YouTube benchmarks, while being one of the first STM-based SVOS methods that can run in real time on a mobile device.
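
The abstract does not say how the tracker constrains predictions, so the following is only an illustrative sketch of one way a spatial constraint could be applied, not the authors' method. It assumes the constraint is a bounding box taken from the previous frame's mask and grown by a fixed margin, with probabilities outside the box suppressed. The names `mask_bbox`, `expand_box`, `gate_prediction` and the `margin` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[float]]          # per-pixel foreground probabilities
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def mask_bbox(mask: Mask, threshold: float = 0.5) -> Optional[Box]:
    """Tight bounding box of pixels above `threshold`, or None if there are none."""
    rows = [r for r, row in enumerate(mask) if any(v > threshold for v in row)]
    cols = [c for row in mask for c, v in enumerate(row) if v > threshold]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def expand_box(box: Box, margin: int, height: int, width: int) -> Box:
    """Grow the box by `margin` pixels on every side, clipped to the frame."""
    top, left, bottom, right = box
    return (max(0, top - margin), max(0, left - margin),
            min(height - 1, bottom + margin), min(width - 1, right + margin))


def gate_prediction(prev_mask: Mask, probs: Mask, margin: int = 1) -> Mask:
    """Zero out probabilities that fall outside the expanded previous-frame box.

    Assumption for illustration: the object moves at most `margin` pixels
    between consecutive frames.
    """
    height, width = len(probs), len(probs[0])
    box = mask_bbox(prev_mask)
    if box is None:
        # Hypothetical fallback: apply no constraint when the object was lost.
        return [row[:] for row in probs]
    top, left, bottom, right = expand_box(box, margin, height, width)
    return [[p if top <= r <= bottom and left <= c <= right else 0.0
             for c, p in enumerate(row)]
            for r, row in enumerate(probs)]
```

In this sketch, a gate of this kind would be applied to each frame's predicted probabilities before thresholding, using the mask from the preceding frame. Its cost is linear in the number of pixels.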

updated: Tue Jun 27 2023 10:54:08 GMT+0000 (UTC)

published: Tue Jun 27 2023 10:54:08 GMT+0000 (UTC)

arXiv
