Temporally stable video segmentation without video annotations

Aharon Azulay; Tavi Halperin; Orestis Vantzos; Nadav Borenstein; Ofir Bibi



Temporally consistent dense video annotations are scarce and hard to collect. In contrast, image segmentation datasets (and pre-trained models) are ubiquitous, and easier to label for any novel task. In this paper, we introduce a method to adapt still image segmentation models to video in an unsupervised manner, by using an optical flow-based consistency measure. To ensure that the inferred segmented videos appear more stable in practice, we verify that the consistency measure is well correlated with human judgement via a user study. Training a new multi-input multi-output decoder using this measure as a loss, together with a technique for refining current image segmentation datasets and a temporal weighted-guided filter, we observe stability improvements in the generated segmented videos with minimal loss of accuracy.
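The abstract does not give the exact form of the optical flow-based consistency measure, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a backward flow field (each pixel of the current frame points to its location in the previous frame), nearest-neighbour warping, and a mean absolute difference as the score. The names `warp_with_flow` and `temporal_inconsistency` are hypothetical and chosen for illustration.

```python
import numpy as np


def warp_with_flow(prev_mask, flow):
    """Backward-warp the previous frame's mask into the current frame.

    Assumption: flow[y, x] = (dx, dy) maps pixel (x, y) of the current
    frame to position (x + dx, y + dy) in the previous frame.
    Returns the warped mask and a boolean map of pixels whose source
    position falls inside the image.
    """
    h, w = prev_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup keeps the sketch dependency-free.
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    warped = np.zeros((h, w), dtype=float)
    warped[valid] = prev_mask[src_y[valid], src_x[valid]]
    return warped, valid


def temporal_inconsistency(prev_mask, curr_mask, flow):
    """Mean absolute disagreement between the flow-warped previous mask
    and the current mask, over pixels with a valid source position.
    Lower values mean a more temporally stable segmentation."""
    warped, valid = warp_with_flow(prev_mask, flow)
    if not valid.any():
        return 0.0
    return float(np.abs(curr_mask[valid] - warped[valid]).mean())


# Toy example: a 2x2 square that moves one pixel to the right.
prev_mask = np.zeros((6, 6))
prev_mask[2:4, 2:4] = 1.0
curr_mask = np.zeros((6, 6))
curr_mask[2:4, 3:5] = 1.0

# Flow that explains the motion: current pixel x came from x - 1.
true_flow = np.zeros((6, 6, 2))
true_flow[..., 0] = -1.0

score_with_motion = temporal_inconsistency(prev_mask, curr_mask, true_flow)
score_without_motion = temporal_inconsistency(prev_mask, curr_mask, np.zeros((6, 6, 2)))
```

With the correct flow the two masks agree after warping, so the score is zero. Ignoring the motion leaves a nonzero disagreement along the edges of the square. A differentiable variant (for example, bilinear sampling in a deep learning framework) would be needed to use such a score as a training loss, as the abstract describes.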

updated: Thu Mar 17 2022 13:52:21 GMT+0000 (UTC)

published: Sun Oct 17 2021 18:59:11 GMT+0000 (UTC)

arXiv
