MEDOE: A Multi-Expert Decoder and Output Ensemble Framework for Long-tailed Semantic Segmentation

Junao Shen; Long Chen; Kun Kuang; Fei Wu; Tian Feng; Wei Zhang

MEDOE: ロングテールセマンティックセグメンテーション用のマルチエキスパートデコーダおよび出力アンサンブルフレームワーク

従来の方法では無視されることが多かった意味カテゴリのロングテール分布は、末尾カテゴリでのセマンティックセグメンテーションのパフォーマンスが不十分になる原因となります。この論文では、ロングテールセマンティックセグメンテーションの問題に焦点を当てます。いくつかのロングテール認識方法（再サンプリング/再重み付けなど）は他の問題でも提案されていますが、おそらく重要なコンテキスト情報を侵害する可能性があるため、ロングテール意味セグメンテーションの問題にはほとんど適応できません。この問題に対処するために、我々は、コンテキスト情報のアンサンブルとグループ化によるロングテールセマンティックセグメンテーションのための新しいフレームワークである MEDOE を提案します。提案された 2 つのセージフレームワークは、マルチエキスパートデコーダ (MED) とマルチエキスパート出力アンサンブル (MOE) で構成されます。具体的には、MED には数人の「専門家」が含まれています。ピクセル頻度分布に基づいて、各専門家は特定のカテゴリに従ってマスクされたデータセットを入力として受け取り、分類のためのコンテキスト情報を自己適応的に生成します。 MOE は、専門家の出力のアンサンブルに対して学習可能な決定重みを採用しています。モデルに依存しないフレームワークとして、当社の MEDOE は、さまざまな一般的なディープニューラルネットワーク (DeepLabv3+、OCRNet、PSPNet など) と柔軟かつ効率的に結合して、ロングテールセマンティックセグメンテーションのパフォーマンスを向上させることができます。実験結果は、提案されたフレームワークが、Cityscapes データセットと ADE20K データセットの両方で現在の手法よりも mIoU で最大 1.78%、mAcc で最大 5.89% 優れていることを示しています。

Long-tailed distribution of semantic categories, which has been often ignored in conventional methods, causes unsatisfactory performance in semantic segmentation on tail categories. In this paper, we focus on the problem of long-tailed semantic segmentation. Although some long-tailed recognition methods (e.g., re-sampling/re-weighting) have been proposed in other problems, they can probably compromise crucial contextual information and are thus hardly adaptable to the problem of long-tailed semantic segmentation. To address this issue, we propose MEDOE, a novel framework for long-tailed semantic segmentation via contextual information ensemble-and-grouping. The proposed two-sage framework comprises a multi-expert decoder (MED) and a multi-expert output ensemble (MOE). Specifically, the MED includes several "experts". Based on the pixel frequency distribution, each expert takes the dataset masked according to the specific categories as input and generates contextual information self-adaptively for classification; The MOE adopts learnable decision weights for the ensemble of the experts' outputs. As a model-agnostic framework, our MEDOE can be flexibly and efficiently coupled with various popular deep neural networks (e.g., DeepLabv3+, OCRNet, and PSPNet) to improve their performance in long-tailed semantic segmentation. Experimental results show that the proposed framework outperforms the current methods on both Cityscapes and ADE20K datasets by up to 1.78% in mIoU and 5.89% in mAcc.

updated: Wed Aug 16 2023 08:30:44 GMT+0000 (UTC)

published: Wed Aug 16 2023 08:30:44 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト