MP-Former: Mask-Piloted Transformer for Image Segmentation

Hao Zhang; Feng Li; Huaizhe Xu; Shijia Huang; Shilong Liu; Lionel M. Ni; Lei Zhang

We present a mask-piloted Transformer (MP-Former), which improves masked attention in Mask2Former for image segmentation. The improvement is based on our observation that Mask2Former suffers from inconsistent mask predictions between consecutive decoder layers. This leads to inconsistent optimization goals and low utilization of decoder queries. To address this problem, we propose a mask-piloted training approach. It additionally feeds noised ground-truth masks into masked attention and trains the model to reconstruct the original masks. Compared with the predicted masks used in masked attention, the ground-truth masks serve as a pilot and effectively alleviate the negative impact of inaccurate mask predictions in Mask2Former. Based on this technique, MP-Former achieves a remarkable performance improvement on all three image segmentation tasks (instance, panoptic, and semantic). With a ResNet-50 backbone, it yields +2.3 AP on Cityscapes instance segmentation and +1.6 mIoU on Cityscapes semantic segmentation. Our method also significantly speeds up training: it outperforms Mask2Former with half the number of training epochs on ADE20K, with both ResNet-50 and Swin-L backbones. Moreover, our method introduces little extra computation during training and none during inference. Our code will be released at https://github.com/IDEA-Research/MP-Former.
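The abstract describes the mask-piloted mechanism only at a high level. The code below is a minimal, framework-free Python sketch of the idea under our own assumptions. Masks are flattened binary lists. Noise is applied as independent per-pixel flips. Attention uses a single head. Reconstruction is scored with binary cross-entropy. The names `noise_mask`, `masked_attention`, and `bce_loss` are hypothetical and are not taken from the MP-Former code. The paper's actual noising scheme, batching, and losses may differ.

```python
import math
import random


def noise_mask(gt_mask, flip_prob, rng):
    # Hypothetical point noise: flip each pixel of a flattened binary
    # ground-truth mask independently with probability flip_prob.
    return [1 - m if rng.random() < flip_prob else m for m in gt_mask]


def masked_attention(query, keys, values, attn_mask):
    # Single-head scaled dot-product attention for one query over flattened
    # pixel features; positions with attn_mask == 0 get a -inf logit.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if not any(attn_mask):
        # Assumption: fall back to full attention when the mask is empty,
        # so the softmax below is always well defined.
        attn_mask = [1] * len(keys)
    logits = [s if m else float("-inf") for s, m in zip(scores, attn_mask)]
    top = max(logits)  # finite, because at least one position is allowed
    weights = [math.exp(l - top) for l in logits]  # exp(-inf) == 0.0
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def bce_loss(pred_probs, target):
    # Reconstruction loss: the piloted query is trained to recover the
    # clean ground-truth mask, not the noised one it attended through.
    eps = 1e-7
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(pred_probs, target)
    ) / len(target)


# Usage: a "piloted" query attends through a noised GT mask, and its mask
# prediction (hypothetical probabilities) is scored against the clean GT.
rng = random.Random(0)
gt_mask = [1, 1, 0, 0]
pilot_mask = noise_mask(gt_mask, flip_prob=0.2, rng=rng)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0], [4.0]]
feature = masked_attention([1.0, 0.0], keys, values, pilot_mask)
loss = bce_loss([0.9, 0.8, 0.1, 0.2], gt_mask)
```

In this sketch, the only difference between an ordinary query and a piloted one is where the attention mask comes from. An ordinary query would use its own (possibly inaccurate) predicted mask. A piloted query uses a noised copy of the ground truth. This matches the abstract's point that the extra cost is limited to training, since piloted queries are not needed at inference.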

updated: Wed Mar 15 2023 17:30:03 GMT+0000 (UTC)

published: Mon Mar 13 2023 17:57:59 GMT+0000 (UTC)

arXiv
