Learning to Better Segment Objects from Unseen Classes with Unlabeled Videos

Yuming Du; Yang Xiao; Vincent Lepetit

ラベルのないビデオを使用して、見えないクラスからオブジェクトをより適切にセグメント化する方法を学ぶ

目に見えないクラスからオブジェクトをローカライズしてセグメント化する機能は、アクティブビジョンでの自律オブジェクト学習などの新しいアプリケーションへの扉を開きます。それでも、見えないクラスのパフォーマンスを向上させるには追加のトレーニングデータが必要ですが、見えないクラスのオブジェクトに手動で注釈を付けると、手間がかかり、費用がかかる可能性があります。このホワイトペーパーでは、ラベルのないビデオシーケンスを使用して、見えないクラスのオブジェクトのトレーニングデータを自動的に生成する方法について説明します。原則として、既存のビデオセグメンテーション方法をラベルのないビデオに適用し、オブジェクトマスクを自動的に取得することができます。これは、手動ラベルが使用できないクラスのトレーニングセットとしても使用できます。しかし、私たちの実験は、これらの方法がこの目的のために十分に機能しないことを示しています。したがって、このようなトレーニングセットを自動的に作成するように特別に設計されたベイジアンメソッドを導入します。このメソッドは、オブジェクト提案のセットから開始し、効率的な最適化を実行して正しいものを選択するために、（非現実的な）合成による分析に依存します。すべてのフレームで同時に。広範な実験を通じて、私たちの方法が、見えないクラスのオブジェクトをセグメント化するパフォーマンスを大幅に向上させる高品質のトレーニングセットを生成できることを示します。したがって、私たちの方法は、豊富なインターネットビデオを使用してオープンワールドのインスタンスセグメンテーションへの扉を開くことができると信じています。

The ability to localize and segment objects from unseen classes would open the door to new applications, such as autonomous object learning in active vision. Nonetheless, improving the performance on unseen classes requires additional training data, while manually annotating the objects of the unseen classes can be labor-extensive and expensive. In this paper, we explore the use of unlabeled video sequences to automatically generate training data for objects of unseen classes. It is in principle possible to apply existing video segmentation methods to unlabeled videos and automatically obtain object masks, which can then be used as a training set even for classes with no manual labels available. However, our experiments show that these methods do not perform well enough for this purpose. We therefore introduce a Bayesian method that is specifically designed to automatically create such a training set: Our method starts from a set of object proposals and relies on (non-realistic) analysis-by-synthesis to select the correct ones by performing an efficient optimization over all the frames simultaneously. Through extensive experiments, we show that our method can generate a high-quality training set which significantly boosts the performance of segmenting objects of unseen classes. We thus believe that our method could open the door for open-world instance segmentation using abundant Internet videos.

updated: Sun Apr 25 2021 22:05:44 GMT+0000 (UTC)

published: Sun Apr 25 2021 22:05:44 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト