Iterative Knowledge Exchange Between Deep Learning and Space-Time Spectral Clustering for Unsupervised Segmentation in Videos

Emanuela Haller; Adina Magda Florea; Marius Leordeanu

We propose a dual system for unsupervised object segmentation in video, which brings together two modules with complementary properties: a space-time graph that discovers objects in videos and a deep network that learns powerful object features. The system uses an iterative knowledge exchange policy. A novel spectral space-time clustering process on the graph produces unsupervised segmentation masks, which are passed to the network as pseudo-labels. The network learns to segment in single frames what the graph discovers in video, and passes back to the graph strong image-level features that improve the graph's node-level features in the next iteration. Knowledge is exchanged for several cycles until convergence. The graph has one node per video pixel, yet object discovery is fast: a novel power iteration algorithm computes the main space-time cluster as the principal eigenvector of a special Feature-Motion matrix without ever computing the matrix explicitly. A thorough experimental analysis validates our theoretical claims and demonstrates the effectiveness of the cyclical knowledge exchange. We also perform experiments in the supervised scenario, incorporating features pretrained with human supervision. We achieve state-of-the-art results in both the unsupervised and supervised scenarios on four challenging datasets: DAVIS, SegTrack, YouTube-Objects, and DAVSOD.
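The idea of finding a principal eigenvector without computing the matrix can be illustrated with a generic matrix-free power iteration. The sketch below is not the authors' algorithm: the abstract does not define the Feature-Motion matrix, so a Gram matrix M = F Fᵀ built from per-node feature rows stands in for it as an assumption, and the names `power_iteration` and `gram_matvec` are hypothetical. It only shows that the product M x can be evaluated as F (Fᵀ x), so the n × n matrix never has to be stored.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def power_iteration(matvec: Callable[[Vector], Vector], n: int,
                    iters: int = 200, tol: float = 1e-12) -> Tuple[float, Vector]:
    """Estimate the principal eigenpair of an implicit n x n PSD matrix.

    `matvec` returns M @ x for a given x; M itself is never built.
    """
    x = [1.0 / math.sqrt(n)] * n  # uniform unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(iters):
        y = matvec(x)
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:  # x lies in the null space; nothing to recover
            break
        y = [v / norm for v in y]
        change = max(abs(a - b) for a, b in zip(x, y))
        x = y
        eigenvalue = norm  # for unit x and PSD M, ||M x|| tends to the top eigenvalue
        if change < tol:
            break
    return eigenvalue, x


def gram_matvec(features: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Return x -> (F F^T) x, evaluated as F (F^T x) in O(n * d) time.

    Assumption: F F^T is only a stand-in for the paper's Feature-Motion matrix.
    """
    d = len(features[0])

    def matvec(x: Vector) -> Vector:
        # t = F^T x, a vector of length d
        t = [sum(row[k] * xi for row, xi in zip(features, x)) for k in range(d)]
        # y = F t, a vector of length n
        return [sum(row[k] * t[k] for k in range(d)) for row in features]

    return matvec


# Toy data: nodes 0 and 1 share a feature, node 2 is a weak outlier.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.5]]
value, vector = power_iteration(gram_matvec(features), len(features))
```

In this toy case the two nodes that share a feature receive large, equal entries in the resulting vector while the outlier's entry decays toward zero, which is the sense in which a principal eigenvector can mark a dominant cluster. With one node per video pixel, n is far too large for an explicit n × n matrix, so a factorized product of this kind is what makes such an iteration affordable.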

updated: Sun Dec 13 2020 18:36:18 GMT+0000 (UTC)

published: Sun Dec 13 2020 18:36:18 GMT+0000 (UTC)

arXiv
