Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

Wentong Li; Wenyu Liu; Jianke Zhu; Miaomiao Cui; Risheng Yu; Xiansheng Hua; Lei Zhang

In contrast to fully supervised methods that use pixel-wise mask labels, box-supervised instance segmentation relies on simple box annotations and has recently attracted increasing research attention. This paper presents a novel single-shot instance segmentation approach, Box2Mask, which integrates the classical level-set evolution model into deep neural network learning to achieve accurate mask prediction with only bounding box supervision. Specifically, both the input image and its deep features are used to evolve the level-set curves implicitly, and a local consistency module based on a pixel affinity kernel mines the local context and spatial relations. Two types of single-stage frameworks, CNN-based and transformer-based, are developed to empower the level-set evolution for box-supervised instance segmentation. Each framework consists of three essential components: an instance-aware decoder, box-level matching assignment, and level-set evolution. By minimizing the level-set energy function, the mask map of each instance can be iteratively optimized within its bounding box annotation. Experimental results on five challenging testbeds, covering general scenes, remote sensing, medical, and scene text images, demonstrate the outstanding performance of the proposed Box2Mask approach for box-supervised instance segmentation. In particular, with the Swin-Transformer large backbone, Box2Mask obtains 42.4% mask AP on COCO, which is on par with recently developed fully mask-supervised methods. The code is available at https://github.com/LiWentomng/boxlevelset.
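To make the idea of minimizing a level-set energy inside a bounding box more concrete, the following is a minimal NumPy sketch of a Chan-Vese-style region energy. It is an illustration under stated assumptions, not the formulation used in Box2Mask: the abstract does not give the energy function, so the function name `chan_vese_energy`, the weight `gamma`, the total-variation length term, and the use of a soft mask in place of a smoothed Heaviside function are all hypothetical choices made for this example.

```python
import numpy as np

def chan_vese_energy(image, mask_prob, box, gamma=1e-4):
    """Chan-Vese-style energy of a soft mask, restricted to a box.

    image:     (H, W) float array (an intensity or feature channel).
    mask_prob: (H, W) array in [0, 1]; stands in for a smoothed
               Heaviside of the level-set function (assumption).
    box:       (x0, y0, x1, y1) with exclusive upper bounds.
    gamma:     weight of the length regulariser (hypothetical value).
    """
    x0, y0, x1, y1 = box
    u = image[y0:y1, x0:x1]
    p = mask_prob[y0:y1, x0:x1]
    eps = 1e-8
    # Mean value inside and outside the current region.
    c_in = (u * p).sum() / (p.sum() + eps)
    c_out = (u * (1.0 - p)).sum() / ((1.0 - p).sum() + eps)
    # Region term: small when each side is close to its own mean.
    region = ((u - c_in) ** 2 * p + (u - c_out) ** 2 * (1.0 - p)).sum()
    # Length term: total variation of the soft mask.
    gy, gx = np.gradient(p)
    length = np.abs(gx).sum() + np.abs(gy).sum()
    return region + gamma * length

# Toy image: a bright square on a dark background.
image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0
box = (2, 2, 14, 14)

good = image.copy()          # mask aligned with the object
bad = np.zeros((16, 16))
bad[2:10, 2:10] = 1.0        # mask shifted off the object

e_good = chan_vese_energy(image, good, box)
e_bad = chan_vese_energy(image, bad, box)
print(e_good < e_bad)
```

In this toy setting the aligned mask yields a lower energy than the shifted one, which is the property a gradient-based optimizer would exploit when the mask is predicted by a network and the energy is used as a training loss.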

updated: Sat Dec 03 2022 09:32:14 GMT+0000 (UTC)

published: Sat Dec 03 2022 09:32:14 GMT+0000 (UTC)

arXiv
