Label Mask AutoEncoder (L-MAE): A Pure Transformer Method to Augment Semantic Segmentation Datasets

Jiaru Jia; Mingzhe Liu; Jiake Xie; Xin Chen; Aiqing Yang; Xin Jiang; Hong Zhang; Yong Tang

Semantic segmentation models based on conventional neural networks can achieve remarkable performance, but they depend heavily on the dataset used for training. Semi-supervised semantic segmentation has recently made significant progress in expanding datasets. However, completing pixel-level information remains challenging because parts of a label may be missing. Inspired by the Mask AutoEncoder, we present a simple yet effective pixel-level completion method, Label Mask AutoEncoder (L-MAE), that makes full use of the existing information in the label to predict the result. The proposed model adopts a fusion strategy, the Fuse Map, that stacks the label with its corresponding image. Because masking the Fuse Map also discards some of the image information, direct reconstruction may lead to poor performance. Our proposed Image Patch Supplement algorithm restores the missing information and, as experiments show, improves mIoU by 4.1% on average. The Pascal VOC 2012 dataset (crop size 224, 20 classes) and the Cityscapes dataset (crop size 448, 19 classes) are used in the comparative experiments. With the mask ratio set to 50%, the proposed model achieves 91.0% and 86.4% mIoU on the predicted region for Pascal VOC 2012 and Cityscapes, respectively, outperforming other current supervised semantic segmentation models. Our code and models are available at https://github.com/jjrccop/Label-Mask-Auto-Encoder.
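The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). All function names, the patch size of 16, and the one-hot label encoding are assumptions made for illustration, and `supplement_image_patches` reflects only one plausible reading of Image Patch Supplement: the image channels of masked patches are restored so that only the label channels remain hidden. The transformer that would predict the hidden labels is omitted.

```python
import numpy as np


def build_fuse_map(image, label, num_classes):
    """Stack an (H, W, 3) image with a one-hot (H, W, C) label along the channel axis."""
    one_hot = np.eye(num_classes, dtype=np.float32)[label]
    return np.concatenate([image.astype(np.float32), one_hot], axis=-1)


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a boolean (H, W) array; True marks pixels inside masked patches."""
    grid_h, grid_w = height // patch, width // patch
    num_patches = grid_h * grid_w
    num_masked = int(round(num_patches * mask_ratio))
    flat = np.zeros(num_patches, dtype=bool)
    flat[rng.permutation(num_patches)[:num_masked]] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand each grid cell to a patch x patch block of pixels.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def mask_fuse_map(fuse_map, mask):
    """Zero every channel (image and label) inside the masked patches."""
    return fuse_map * ~mask[..., None]


def supplement_image_patches(masked_fuse_map, image, mask):
    """Assumed reading of Image Patch Supplement: put the image channels back
    into masked patches, leaving only the label channels hidden."""
    out = masked_fuse_map.copy()
    out[mask, :3] = image.astype(np.float32)[mask]
    return out


def masked_miou(pred, target, mask, num_classes):
    """Mean IoU over classes, counting only pixels inside the masked region."""
    ious = []
    for c in range(num_classes):
        p = (pred == c) & mask
        t = (target == c) & mask
        union = (p | t).sum()
        if union:  # skip classes absent from both prediction and target
            ious.append((p & t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Hypothetical usage with random data at the Pascal VOC crop size from the abstract.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3), dtype=np.float32)
label = rng.integers(0, 20, size=(224, 224))
mask = random_patch_mask(224, 224, patch=16, mask_ratio=0.5, rng=rng)
fuse = build_fuse_map(image, label, num_classes=20)
model_input = supplement_image_patches(mask_fuse_map(fuse, mask), image, mask)
```

Restricting `masked_miou` to the masked pixels mirrors the abstract's phrase "in terms of the prediction region": the visible half of the label is given as input, so only the hidden half measures what the model actually predicts.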

updated: Mon Nov 21 2022 08:15:18 GMT+0000 (UTC)

published: Mon Nov 21 2022 08:15:18 GMT+0000 (UTC)

arXiv
