Vision Transformers Are Good Mask Auto-Labelers

Shiyi Lan; Xitong Yang; Zhiding Yu; Zuxuan Wu; Jose M. Alvarez; Anima Anandkumar

We propose Mask Auto-Labeler (MAL), a high-quality Transformer-based mask auto-labeling framework for instance segmentation using only box annotations. MAL takes box-cropped images as inputs and conditionally generates their mask pseudo-labels. We show that Vision Transformers are good mask auto-labelers. Our method significantly reduces the gap between auto-labeling and human annotation in mask quality. Instance segmentation models trained on MAL-generated masks nearly match their fully supervised counterparts, retaining up to 97.4% of the fully supervised models' performance. The best model achieves 44.1% mAP on COCO instance segmentation (test-dev 2017), outperforming state-of-the-art box-supervised methods by significant margins. Qualitative results indicate that masks produced by MAL are, in some cases, even better than human annotations.
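
To make the "box-cropped images as inputs" step concrete, the following is a minimal sketch of cropping an image region from a box annotation. It illustrates the cropping step only, not the MAL model itself. The function name `crop_box`, the `(x_min, y_min, x_max, y_max)` box convention with exclusive maxima, and the optional `pad` parameter are assumptions made for illustration; the abstract does not specify how boxes are expanded or resized before being fed to the network.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels, max exclusive.
Box = Tuple[int, int, int, int]


def crop_box(image: Sequence[Sequence[int]], box: Box, pad: int = 0) -> List[List[int]]:
    """Return the part of `image` inside `box`, grown by `pad` pixels and clipped to the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = box
    # Grow the box by the (hypothetical) padding, then clamp to the image bounds.
    x0 = max(0, x0 - pad)
    y0 = max(0, y0 - pad)
    x1 = min(width, x1 + pad)
    y1 = min(height, y1 + pad)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("box does not overlap the image")
    # Slice rows first, then columns within each row.
    return [list(row[x0:x1]) for row in image[y0:y1]]


# Toy 4x5 "image" whose pixel values encode 10 * row + column.
image = [[10 * r + c for c in range(5)] for r in range(4)]
crop = crop_box(image, (1, 1, 3, 3))
```

In a box-supervised pipeline, each such crop would be paired with its box and passed to the mask auto-labeler; the resulting pseudo-label mask would then be pasted back at the box location.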

updated: Tue Jan 10 2023 18:59:00 GMT+0000 (UTC)

published: Tue Jan 10 2023 18:59:00 GMT+0000 (UTC)

arXiv
