Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning

Chenhongyi Yang; Lichao Huang; Elliot J. Crowley

The goal of contrastive-learning-based pre-training is to leverage large quantities of unlabeled data to produce a model that can be readily adapted to downstream tasks. Current approaches revolve around solving an image discrimination task: given an anchor image, an augmented counterpart of that image, and some other images, the model must produce representations such that the distance between the anchor and its counterpart is small, and the distances between the anchor and the other images are large. There are two significant problems with this approach: (i) because representations are contrasted at the image level, it is hard to learn detailed object-sensitive features that benefit downstream object-level tasks such as instance segmentation; (ii) the augmentation strategy that produces the augmented counterpart is fixed, making learning less effective in the later stages of pre-training. In this work, we introduce Curricular Contrastive Object-level Pre-training (CCOP) to tackle these problems: (i) we use selective search to find rough object regions and use them to add an inter-image object-level contrastive loss and an intra-image object-level discrimination loss to our pre-training objective; (ii) we present a curriculum learning mechanism that adaptively augments the generated regions, which allows the model to consistently acquire a useful learning signal, even in the later stages of pre-training. Our experiments show that our approach improves on the MoCo v2 baseline by a large margin on multiple object-level tasks when pre-training on multi-object scene image datasets. Code is available at https://github.com/ChenhongyiYang/CCOP.
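The image discrimination task described in the abstract is commonly trained with an InfoNCE-style loss, which MoCo v2 also uses. The sketch below is only an illustration of that generic image-level objective, not the CCOP losses themselves. The function names `l2_normalize` and `info_nce_loss`, the use of cosine similarity as the (inverse) distance, and the temperature of 0.2 are assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss for a single anchor embedding (illustrative sketch).

    anchor:    embedding of the anchor image
    positive:  embedding of its augmented counterpart
    negatives: embeddings of the other images
    The loss is low when the anchor is close to the positive and far
    from every negative. The temperature value is an assumed default.
    """
    q = l2_normalize(anchor)
    keys = [l2_normalize(positive)] + [l2_normalize(n) for n in negatives]
    # Similarity logits; index 0 is the positive pair.
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum_exp - logits[0]


# Toy 2-D embeddings (hypothetical values, not from the paper).
aligned = info_nce_loss([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

In this toy setup the loss is smaller when the counterpart embedding lies near the anchor than when a different image does, which is the behaviour the discrimination task rewards.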

updated: Mon Nov 29 2021 14:01:12 GMT+0000 (UTC)

published: Fri Nov 26 2021 18:29:57 GMT+0000 (UTC)

arXiv
