HAISTA-NET: Human Assisted Instance Segmentation Through Attention

Muhammed Korkmaz; Tolga Buyukyazi; T. Metin Sezgin

Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver. The performance gap becomes particularly prohibitive for small and complex objects. Practitioners typically resort to fully manual annotation, which can be a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present a dataset of hand-drawn partial object boundaries, which we refer to as human attention maps. In addition, the Partial Sketch Object Boundaries (PSOB) dataset contains hand-drawn partial object boundaries that represent curvatures of an object's ground-truth mask with several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving the AP-Mask metric by +36.7, +29.6, and +26.5 points over these three models, respectively. We hope that our novel approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.

updated: Fri May 12 2023 09:43:21 GMT+0000 (UTC)

published: Thu May 04 2023 18:39:14 GMT+0000 (UTC)

arXiv
