BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

Brent A. Griffin; Jason J. Corso

Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years. The current paradigm for segmentation methods and benchmark datasets is to segment objects in a video given a single annotation in the first frame. However, we find that segmentation performance across the entire video varies dramatically when an alternative frame is selected for annotation. This paper addresses the problem of learning to suggest the single best frame across the video for user annotation; this is, in fact, never the first frame of the video. We achieve this by introducing BubbleNets, a novel deep sorting network that learns to select frames using a performance-based loss function, which makes it possible to derive a large number of training examples from existing datasets. Using BubbleNets, we achieve an 11% relative improvement in segmentation performance on the DAVIS benchmark without any changes to the underlying segmentation method.
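The abstract only outlines the approach. As a rough illustration of the general pattern the name suggests, here is a minimal Python sketch of two ideas: selecting one frame with a bubble-sort pass driven by a pairwise comparator, and turning per-frame performance scores into pairwise training examples. The comparator is an oracle over made-up scores standing in for a trained network; the names `bubble_select`, `pairwise_labels`, and `oracle`, the score values, and the single-pass design are all hypothetical and are not the authors' architecture or loss.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical comparator type: compare(a, b) > 0 means frame a is predicted to
# be a better annotation frame than frame b. In a learned system this would be
# a network scoring a pair of frames; here it is any callable.
Comparator = Callable[[int, int], float]


def bubble_select(frames: Sequence[int], compare: Comparator) -> int:
    """Return the frame left at the end of one bubble-sort pass.

    With a consistent comparator, one pass carries the preferred frame to the
    last position using only adjacent pairwise comparisons (len(frames) - 1).
    """
    order = list(frames)
    for i in range(len(order) - 1):
        # Swap when the earlier frame is preferred, so it keeps moving right.
        if compare(order[i], order[i + 1]) > 0:
            order[i], order[i + 1] = order[i + 1], order[i]
    return order[-1]


def pairwise_labels(scores: Sequence[float]) -> List[Tuple[int, int, float]]:
    """Turn per-frame performance scores into pairwise training examples.

    scores[i] is a hypothetical whole-video segmentation score obtained when
    frame i is the annotated frame. Each ordered pair (i, j), i != j, becomes
    one example whose target is the performance difference scores[i] - scores[j].
    """
    n = len(scores)
    return [(i, j, scores[i] - scores[j]) for i in range(n) for j in range(n) if i != j]


if __name__ == "__main__":
    # Hypothetical per-frame scores for a 4-frame video (illustrative values only).
    scores = [0.70, 0.82, 0.75, 0.79]

    # Oracle comparator standing in for a trained network.
    def oracle(a: int, b: int) -> float:
        return scores[a] - scores[b]

    print(bubble_select(range(len(scores)), oracle))  # → 1
    print(len(pairwise_labels(scores)))               # → 12
```

In this sketch, suggesting one frame costs N - 1 pairwise predictions per pass, and the pairwise formulation lets a single video with N scored frames supply N(N - 1) ordered training pairs, which is one way a performance-based objective can multiply the training data available from an existing dataset.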

updated: Tue Nov 24 2020 01:52:04 GMT+0000 (UTC)

published: Thu Mar 28 2019 03:42:30 GMT+0000 (UTC)

arXiv
