Towards Unsupervised Open World Semantic Segmentation

Svenja Uhlemeyer; Matthias Rottmann; Hanno Gottschalk

教師なしオープンワールドセマンティックセグメンテーションに向けて

画像のセマンティックセグメンテーションの場合、最先端のディープニューラルネットワーク（DNN）は、そのタスクが閉じたクラスのセットに制限されている場合、高いセグメンテーション精度を実現します。ただし、現在のところ、DNNはオープンワールドで動作する能力が限られており、未知のオブジェクトに属するピクセルを識別し、最終的には新しいクラスを段階的に学習するように任務を負っています。人間には言う能力があります：それが何であるかはわかりませんが、私はすでにそのようなものを見てきました。したがって、教師なしの方法でそのような漸進的な学習タスクを実行することが望ましい。視覚的類似性に基づいて未知のオブジェクトをクラスター化する方法を紹介します。これらのクラスターは、新しいクラスを定義し、教師なし増分学習のトレーニングデータとして機能するために使用されます。より正確には、予測されたセグメンテーションセグメンテーションの連結成分は、セグメンテーション品質推定によって評価されます。推定予測品質が低い連結成分は、後続のクラスタリングの候補です。さらに、コンポーネントごとの品質評価により、未知のオブジェクトを含む可能性のある画像領域の予測セグメンテーションマスクを取得できます。そのようなマスクのそれぞれのピクセルは、疑似ラベルが付けられ、その後、DNNを再トレーニングするために使用されます。つまり、人間によって生成されたグラウンドトゥルースを使用しません。私たちの実験では、グラウンドトゥルースにアクセスせず、データが少ない場合でも、DNNのクラス空間を新しいクラスで拡張して、かなりのセグメンテーション精度を実現できることを示しています。

For the semantic segmentation of images, state-of-the-art deep neural networks (DNNs) achieve high segmentation accuracy if that task is restricted to a closed set of classes. However, as of now DNNs have limited ability to operate in an open world, where they are tasked to identify pixels belonging to unknown objects and eventually to learn novel classes, incrementally. Humans have the capability to say: I don't know what that is, but I've already seen something like that. Therefore, it is desirable to perform such an incremental learning task in an unsupervised fashion. We introduce a method where unknown objects are clustered based on visual similarity. Those clusters are utilized to define new classes and serve as training data for unsupervised incremental learning. More precisely, the connected components of a predicted semantic segmentation are assessed by a segmentation quality estimate. connected components with a low estimated prediction quality are candidates for a subsequent clustering. Additionally, the component-wise quality assessment allows for obtaining predicted segmentation masks for the image regions potentially containing unknown objects. The respective pixels of such masks are pseudo-labeled and afterwards used for re-training the DNN, i.e., without the use of ground truth generated by humans. In our experiments we demonstrate that, without access to ground truth and even with few data, a DNN's class space can be extended by a novel class, achieving considerable segmentation accuracy.

updated: Tue Jan 04 2022 10:29:34 GMT+0000 (UTC)

published: Tue Jan 04 2022 10:29:34 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト