SegSort: Segmentation by Discriminative Sorting of Segments

Jyh-Jing Hwang; Stella X. Yu; Jianbo Shi; Maxwell D. Collins; Tien-Ju Yang; Xiao Zhang; Liang-Chieh Chen

SegSort：セグメントの識別ソートによるセグメンテーション

セマンティックセグメンテーションの既存の深層学習アプローチのほとんどすべてが、ピクセル単位の分類問題としてこのタスクに取り組んでいます。しかし、人間はピクセルの観点ではなく、認識の基本的な構成要素である知覚グループと構造に分解することでシーンを理解します。これにより、このプロセスを模倣するエンドツーエンドのピクセル単位のメトリック学習アプローチを提案するようになりました。このアプローチでは、最適な視覚的表現により、個々の画像内の適切なセグメンテーションが決定され、セグメントが画像全体で同じセマンティッククラスに関連付けられます。したがって、視覚的な学習の中心的な問題は、セグメント内の類似性を最大化し、セグメント間の類似性を最小化することです。この方法でトレーニングされたモデルを考えると、注釈付きセットからの最近傍の多数決によってセマンティックラベルが決定され、ピクセル単位の埋め込みとクラスタリングを抽出することにより、一貫して推論が実行されます。その結果、教師なしセマンティックセグメンテーションにディープラーニングを使用する最初の試みとして、SegSortを提示し、その監視対象のセマンティックの76％のパフォーマンスを達成しました。監視が利用可能な場合、SegSortは、ピクセル単位のソフトマックストレーニングに基づく従来のアプローチよりも一貫した改善を示します。さらに、このアプローチでは、より正確な境界と一貫した領域予測が生成されます。提案されたSegSortはさらに、取得した最も近いセグメントからラベルの各選択を容易に理解できるため、解釈可能な結果を生成します。

Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In our approach, the optimal visual representation determines the right segmentation within individual images and associates segments with the same semantic classes across images. The core visual learning problem is therefore to maximize the similarity within segments and minimize the similarity between segments. Given a model trained this way, inference is performed consistently by extracting pixel-wise embeddings and clustering, with the semantic label determined by the majority vote of its nearest neighbors from an annotated set. As a result, we present the SegSort, as a first attempt using deep learning for unsupervised semantic segmentation, achieving 76% performance of its supervised counterpart. When supervision is available, SegSort shows consistent improvements over conventional approaches based on pixel-wise softmax training. Additionally, our approach produces more precise boundaries and consistent region predictions. The proposed SegSort further produces an interpretable result, as each choice of label can be easily understood from the retrieved nearest segments.

updated: Wed Oct 30 2019 17:43:04 GMT+0000 (UTC)

published: Tue Oct 15 2019 17:58:20 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト