Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation

Tommi Kerola; Jie Li; Atsushi Kanehira; Yasunori Kudo; Alexis Vallet; Adrien Gaidon

Panoptic segmentation brings together two separate tasks: instance and semantic segmentation. Although the tasks are related, unifying them faces an apparent paradox: how to jointly learn instance-specific and category-specific (i.e., instance-agnostic) representations. Hence, state-of-the-art panoptic segmentation methods use complex models with a distinct stream for each task. In contrast, we propose Hierarchical Lovász Embeddings, per-pixel feature vectors that simultaneously encode instance- and category-level discriminative information. We use a hierarchical Lovász hinge loss to learn a low-dimensional embedding space structured into a unified semantic and instance hierarchy, without requiring separate network branches or object proposals. Besides modeling instances precisely in a proposal-free manner, our Hierarchical Lovász Embeddings generalize to categories through a simple Nearest-Class-Mean classifier, including non-instance "stuff" classes where instance segmentation methods are not applicable. Our simple model achieves state-of-the-art results compared to existing proposal-free panoptic segmentation methods on Cityscapes, COCO, and Mapillary Vistas. Furthermore, our model demonstrates temporal stability between video frames.
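To make the Nearest-Class-Mean step concrete, the following is a minimal sketch of how per-pixel embeddings could be assigned to categories by picking the closest class mean. It is not the authors' implementation: the function name `nearest_class_mean`, the use of squared Euclidean distance, and the toy values are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def nearest_class_mean(embeddings, class_means):
    """Assign each pixel embedding to the class with the closest mean.

    Hypothetical helper for illustration (not from the paper).
    embeddings:  (H, W, D) array of per-pixel feature vectors.
    class_means: (C, D) array, one mean embedding per class.
    Returns an (H, W) integer array of class indices.
    """
    h, w, d = embeddings.shape
    flat = embeddings.reshape(-1, d)                    # (H*W, D)
    # Squared Euclidean distance from every pixel to every class mean
    # (the distance metric is an assumption).
    diff = flat[:, None, :] - class_means[None, :, :]   # (H*W, C, D)
    dist2 = np.sum(diff ** 2, axis=-1)                  # (H*W, C)
    return np.argmin(dist2, axis=1).reshape(h, w)

# Toy 2x2 "image" with 2-D embeddings and two class means.
means = np.array([[0.0, 0.0], [10.0, 10.0]])
emb = np.array([[[0.5, -0.2], [9.0, 11.0]],
                [[1.0, 1.0], [8.0, 9.5]]])
labels = nearest_class_mean(emb, means)
```

In this sketch the class means are given; in practice they would presumably be estimated by averaging the learned embeddings of training pixels belonging to each category.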

updated: Tue Jun 08 2021 17:43:54 GMT+0000 (UTC)

published: Tue Jun 08 2021 17:43:54 GMT+0000 (UTC)

arXiv
