Contrastive pretraining for semantic segmentation is robust to noisy positive pairs

Sebastian Gerard; Josephine Sullivan

Domain-specific variants of contrastive learning can construct positive pairs from two distinct in-domain images, while traditional methods just augment the same image twice. For example, we can form a positive pair from two satellite images showing the same location at different times. Ideally, this teaches the model to ignore changes caused by seasons, weather conditions or image acquisition artifacts. However, unlike in traditional contrastive methods, this can result in undesired positive pairs, since we form them without human supervision. For example, a positive pair might consist of one image before a disaster and one after. This could teach the model to ignore the differences between intact and damaged buildings, which might be what we want to detect in the downstream task. Similar to false negative pairs, this could impede model performance. Crucially, in this setting only parts of the images differ in relevant ways, while other parts remain similar. Surprisingly, we find that downstream semantic segmentation is either robust to such badly matched pairs or even benefits from them. The experiments are conducted on the remote sensing dataset xBD, and a synthetic segmentation dataset for which we have full control over the pairing conditions. As a result, practitioners can use these domain-specific contrastive methods without having to filter their positive pairs beforehand, or might even be encouraged to purposefully include such pairs in their pretraining dataset.
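The pairing scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `(location_id, timestamp, image)` record schema and the function names are hypothetical, and InfoNCE is assumed here as a representative contrastive loss, since the abstract does not name the loss that was used. No filtering is applied, so a pair may well consist of one pre-disaster and one post-disaster image.

```python
import math
import random
from collections import defaultdict


def temporal_positive_pairs(records, seed=0):
    """Form one positive pair per location from two different acquisitions.

    `records` is a list of (location_id, timestamp, image) tuples; this schema
    is a hypothetical stand-in. Pairs are not filtered in any way.
    """
    rng = random.Random(seed)
    by_location = defaultdict(list)
    for location_id, timestamp, image in records:
        by_location[location_id].append((timestamp, image))

    pairs = []
    for location_id in sorted(by_location):
        views = by_location[location_id]
        if len(views) < 2:
            continue  # a single acquisition cannot form a cross-time pair
        (_, img_a), (_, img_b) = rng.sample(views, 2)
        pairs.append((location_id, img_a, img_b))
    return pairs


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss (assumed, see lead-in) over a batch of embeddings.

    Row i of `positives` is the positive for anchor i; every other row
    serves as a negative. Embeddings are plain lists of floats.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)
```

In this sketch, a "badly matched" pair is simply one whose two images differ in a downstream-relevant way (for example, intact versus damaged buildings); the loss still pulls their embeddings together, which is the situation the paper studies.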

updated: Mon Jan 23 2023 18:59:54 GMT+0000 (UTC)

published: Thu Nov 24 2022 18:59:01 GMT+0000 (UTC)

arXiv
