Location-Aware Self-Supervised Transformers for Semantic Segmentation

Mathilde Caron; Neil Houlsby; Cordelia Schmid

Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step for improving models on tasks such as semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g. image classification, image-text alignment à la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which might be sub-optimal when finetuning on downstream tasks that require spatial reasoning. In this work, we pretrain networks with a location-aware (LOCA) self-supervised method that fosters the emergence of strong dense features. Specifically, we use both a patch-level clustering scheme to mine dense pseudo-labels and a relative location prediction task to encourage learning about object parts and their spatial arrangements. Our experiments show that LOCA pretraining leads to representations that transfer competitively to challenging and diverse semantic segmentation datasets.
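The two pretext tasks named in the abstract can be made more concrete with a small sketch of how their per-patch targets might be built. This is not the authors' implementation. The function names (`relative_location_targets`, `cluster_pseudo_labels`, `query_cluster_targets`) are hypothetical, and the sketch rests on several assumptions. The query view is taken to be a square crop of the reference view, resized to a `q_grid` x `q_grid` patch grid. Relative location prediction is posed as classifying each query patch into one of the reference grid cells. The clustering step is simplified to nearest-prototype assignment by cosine similarity, leaving out learned prototypes and any balancing of assignments that a real method may use.

```python
import math
from typing import List, Sequence


def relative_location_targets(
    ref_size: int, patch: int, crop_top: float, crop_left: float,
    crop_size: float, q_grid: int,
) -> List[int]:
    """Hypothetical helper: index of the reference-grid cell under each query patch.

    Assumes a square reference view of ref_size pixels split into patch x patch
    tiles, and a square query crop (crop_size pixels, top-left corner at
    crop_top/crop_left in reference coordinates) resized to q_grid x q_grid patches.
    """
    grid = ref_size // patch           # reference view has grid x grid patches
    step = crop_size / q_grid          # reference pixels covered by one query patch
    targets = []
    for i in range(q_grid):            # query rows, top to bottom
        for j in range(q_grid):        # query columns, left to right
            # Centre of query patch (i, j) expressed in reference pixels.
            cy = crop_top + (i + 0.5) * step
            cx = crop_left + (j + 0.5) * step
            # Reference cell containing that centre, clamped to the grid.
            row = min(int(cy // patch), grid - 1)
            col = min(int(cx // patch), grid - 1)
            targets.append(row * grid + col)  # row-major class index
    return targets


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)        # epsilon avoids division by zero


def cluster_pseudo_labels(
    patch_feats: Sequence[Sequence[float]],
    prototypes: Sequence[Sequence[float]],
) -> List[int]:
    """Hypothetical helper: hard pseudo-label = index of the most similar prototype."""
    return [
        max(range(len(prototypes)), key=lambda k: _cosine(f, prototypes[k]))
        for f in patch_feats
    ]


def query_cluster_targets(
    loc_targets: Sequence[int], ref_labels: Sequence[int]
) -> List[int]:
    """Hypothetical pairing: each query patch inherits the label of the reference patch it overlaps."""
    return [ref_labels[t] for t in loc_targets]


if __name__ == "__main__":
    # A 64-pixel crop of a 224-pixel reference image, seen as a 4 x 4 patch grid.
    locs = relative_location_targets(224, 16, crop_top=32, crop_left=48,
                                     crop_size=64, q_grid=4)
    print(locs)
```

Linking the two tasks, as `query_cluster_targets` does, means a query patch is asked to predict the pseudo-label of the reference patch it overlaps; that pairing is likewise an assumption of this sketch rather than a detail stated in the abstract.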

updated: Wed Mar 15 2023 20:08:12 GMT+0000 (UTC)

published: Mon Dec 05 2022 16:24:29 GMT+0000 (UTC)

arXiv