HCFormer: Unified Image Segmentation with Hierarchical Clustering

Teppei Suzuki

Hierarchical clustering is an effective and efficient approach widely used in classical image segmentation methods. However, many existing neural-network-based methods generate segmentation masks directly from per-pixel features, which complicates the architecture design and degrades interpretability. In this work, we propose a simpler, more interpretable architecture called HCFormer. HCFormer performs image segmentation by bottom-up hierarchical clustering, which allows us to interpret, visualize, and evaluate the intermediate results as hierarchical clustering results. Because pixel clustering is common to various image segmentation tasks, HCFormer can address semantic, instance, and panoptic segmentation with the same architecture. In experiments, HCFormer achieves segmentation accuracy comparable or superior to baseline methods on semantic segmentation (55.5 mIoU on ADE20K), instance segmentation (47.1 AP on COCO), and panoptic segmentation (55.7 PQ on COCO).
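To make the idea of bottom-up hierarchical clustering concrete, the following is a minimal sketch of the classical version of the technique on a single row of grayscale pixels. It is not HCFormer's learned clustering; the function names `bottom_up_clustering` and `labels_from`, the mean-intensity merge criterion, and the toy input are all assumptions chosen for illustration. The sketch keeps the label map produced at every level, which mirrors how intermediate results can be inspected as clustering results.

```python
def labels_from(segments, n):
    """Expand a list of [start, end, intensity_sum] segments into a per-pixel label map."""
    labels = [0] * n
    for k, (start, end, _) in enumerate(segments):
        for j in range(start, end):
            labels[j] = k
    return labels


def bottom_up_clustering(pixels, num_clusters):
    """Greedily merge adjacent segments of a 1-D pixel row (illustrative only).

    Starts with one cluster per pixel and repeatedly merges the adjacent pair
    whose mean intensities are closest, until num_clusters segments remain.
    Returns one label map per hierarchy level, finest level first.
    """
    n = len(pixels)
    # Each segment is [start, end_exclusive, sum_of_intensities].
    segments = [[i, i + 1, float(p)] for i, p in enumerate(pixels)]
    levels = [labels_from(segments, n)]

    def gap(i):
        # Absolute difference between the mean intensities of segments i and i + 1.
        a, b = segments[i], segments[i + 1]
        return abs(a[2] / (a[1] - a[0]) - b[2] / (b[1] - b[0]))

    while len(segments) > max(num_clusters, 1):
        i = min(range(len(segments) - 1), key=gap)
        a, b = segments[i], segments[i + 1]
        segments[i:i + 2] = [[a[0], b[1], a[2] + b[2]]]
        levels.append(labels_from(segments, n))
    return levels


if __name__ == "__main__":
    hierarchy = bottom_up_clustering([0, 1, 0, 9, 10, 9], num_clusters=2)
    for depth, label_map in enumerate(hierarchy):
        print(depth, label_map)
    # The coarsest level is [0, 0, 0, 1, 1, 1]: dark pixels versus bright pixels.
```

Each entry of the returned list is a complete segmentation at one granularity, so any intermediate level can be visualized or scored on its own.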

updated: Tue Feb 28 2023 01:23:20 GMT+0000 (UTC)

published: Fri May 20 2022 03:53:56 GMT+0000 (UTC)

arXiv
