Clustering as Attention: Unified Image Segmentation with Hierarchical Clustering

Teppei Suzuki

We propose HCFormer, a hierarchical clustering-based image segmentation scheme for deep neural networks. We interpret image segmentation, including semantic, instance, and panoptic segmentation, as a pixel clustering problem and solve it by bottom-up hierarchical clustering with deep neural networks. Our hierarchical clustering removes the pixel decoder that precedes the segmentation head and thereby simplifies the segmentation pipeline, resulting in improved segmentation accuracy and interoperability. Because pixel clustering is an approach common to these segmentation tasks, HCFormer can address semantic, instance, and panoptic segmentation with the same architecture. In experiments, HCFormer achieves segmentation accuracy comparable or superior to baseline methods on semantic segmentation (55.5 mIoU on ADE20K), instance segmentation (47.1 AP on COCO), and panoptic segmentation (55.7 PQ on COCO).
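
The abstract does not spell out the mechanics, so the following is only an illustrative NumPy sketch of the general idea of bottom-up hierarchical pixel clustering with attention-like soft assignments. It is not HCFormer's implementation: the function names, the softmax assignment rule, the random choice of cluster centers, and the toy sizes are all assumptions made for this example. It shows how per-level assignment matrices can be composed so that masks defined on the coarsest clusters map straight back to pixels.

```python
import numpy as np

def soft_assign(features, centers, temperature=1.0):
    """Soft-assign each row of `features` to a cluster center.

    Returns an (n_items, n_clusters) matrix whose rows sum to 1,
    computed like attention weights: softmax over scaled dot products.
    """
    logits = features @ centers.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def pool(features, assign):
    """Compute cluster features as assignment-weighted means of the items."""
    mass = assign.sum(axis=0)[:, None] + 1e-8
    return assign.T @ features / mass

def hierarchical_cluster(pixels, cluster_counts, rng):
    """Bottom-up clustering: each level groups the previous level's clusters.

    Centers are initialised from randomly chosen items (an assumption for
    illustration; a trained network would produce them instead).
    Returns the per-level assignment matrices and the top-level features.
    """
    assigns, feats = [], pixels
    for k in cluster_counts:
        centers = feats[rng.choice(len(feats), size=k, replace=False)]
        a = soft_assign(feats, centers)
        assigns.append(a)
        feats = pool(feats, a)
    return assigns, feats

def decode_to_pixels(top_masks, assigns):
    """Map masks defined on the coarsest clusters back to pixels.

    top_masks has shape (n_segments, n_top_clusters). Composing the
    assignment matrices yields pixel-to-top-cluster weights directly.
    """
    pixel_to_top = assigns[0]
    for a in assigns[1:]:
        pixel_to_top = pixel_to_top @ a
    return top_masks @ pixel_to_top.T

rng = np.random.default_rng(0)
pixels = rng.normal(size=(64, 8))        # 64 pixels with 8-dim features (toy data)
assigns, top_feats = hierarchical_cluster(pixels, [16, 4], rng)
top_masks = np.eye(4)                    # hypothetical: one segment per top cluster
pixel_masks = decode_to_pixels(top_masks, assigns)
labels = pixel_masks.argmax(axis=0)      # hard segment id per pixel
```

Because every assignment matrix is row-stochastic, their product is too, so each pixel's mask weights sum to one in this toy setting. That composition is what lets the sketch recover pixel-level masks without a separate upsampling step.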

updated: Mon Aug 29 2022 06:10:37 GMT+0000 (UTC)

published: Fri May 20 2022 03:53:56 GMT+0000 (UTC)

arXiv
