Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Bowen Shi; Xiaopeng Zhang; Haohang Xu; Wenrui Dai; Junni Zou; Hongkai Xiong; Qi Tian

Collecting annotated data for semantic segmentation is time-consuming and hard to scale up. In this paper, we propose, for the first time, a unified framework, termed Multi-Dataset Pretraining (MDP), to take full advantage of the fragmented annotations of different datasets. The highlight is that annotations from different domains can be efficiently reused and consistently boost performance in each specific domain. This is achieved by first pretraining the network with the proposed pixel-to-prototype contrastive loss over multiple datasets, regardless of their label taxonomies, and then fine-tuning the pretrained model on a specific dataset as usual. To better model the relationships among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing and propose a pixel-to-class sparse coding strategy that explicitly models pixel-class similarity over the manifold embedding space. In this way, we increase intra-class compactness and inter-class separability, while also accounting for inter-class similarity across different datasets for better transferability. Experiments conducted on several benchmarks demonstrate the superior performance of the framework. Notably, MDP consistently outperforms models pretrained on ImageNet by a considerable margin, while using fewer than 10% of the samples for pretraining.
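The abstract does not give the exact form of the pixel-to-prototype contrastive loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes an InfoNCE-style loss over cosine similarities, prototypes computed as per-class means of normalized pixel embeddings, and labels written as (dataset, class) pairs so that classes from different datasets never need to be merged by hand. The function names, the temperature value, and the dataset and class names are all hypothetical.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(embeddings, labels):
    """Average the normalized embeddings of each class into one prototype.

    Assumption: labels are (dataset_name, class_name) pairs, so classes from
    different datasets stay distinct without any taxonomy alignment.
    """
    sums, counts = {}, defaultdict(int)
    for emb, label in zip(embeddings, labels):
        emb = l2_normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {k: l2_normalize([s / counts[k] for s in v]) for k, v in sums.items()}


def pixel_to_prototype_loss(embeddings, labels, prototypes, temperature=0.1):
    """Mean InfoNCE-style loss pulling each pixel toward its own prototype
    and pushing it away from the prototypes of all other classes."""
    total = 0.0
    for emb, label in zip(embeddings, labels):
        emb = l2_normalize(emb)
        logits = {k: sum(a * b for a, b in zip(emb, p)) / temperature
                  for k, p in prototypes.items()}
        max_logit = max(logits.values())  # subtract the max for stability
        log_sum = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        total += log_sum - logits[label]  # negative log-softmax of the target
    return total / len(embeddings)


# Toy example: four 2-D "pixel" embeddings from two hypothetical datasets.
pixels = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = [("dataset_a", "class_x"), ("dataset_a", "class_x"),
          ("dataset_b", "class_y"), ("dataset_b", "class_y")]
protos = class_prototypes(pixels, labels)
loss = pixel_to_prototype_loss(pixels, labels, protos)
loss_swapped = pixel_to_prototype_loss(pixels, list(reversed(labels)), protos)
```

In this toy setup the loss is small when each pixel is paired with its own class prototype and much larger when the labels are swapped, which is the behavior a contrastive objective of this kind is meant to reward. The cross-dataset mixing and pixel-to-class sparse coding steps mentioned in the abstract are not modeled here.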

updated: Tue Jun 08 2021 06:13:11 GMT+0000 (UTC)

published: Tue Jun 08 2021 06:13:11 GMT+0000 (UTC)

arXiv
