Unsupervised Pretraining for Object Detection by Patch Reidentification

Jian Ding; Enze Xie; Hang Xu; Chenhan Jiang; Zhenguo Li; Ping Luo; Gui-Song Xia

Unsupervised representation learning achieves promising performance in pre-training representations for object detectors. However, previous approaches are mainly designed for image-level classification, leading to suboptimal detection performance. To bridge this gap, this work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID). It can be treated as a contrastive pretext task that learns location-discriminative representations without supervision, and it has appealing advantages over its counterparts. Firstly, unlike fully-supervised person Re-ID, which matches a human identity across different camera views, patch Re-ID treats an important patch as a pseudo identity and contrastively learns its correspondence in two different image views, where the pseudo identity undergoes different translations and transformations. This enables the model to learn discriminative features for object detection. Secondly, patch Re-ID is performed in a deeply unsupervised manner to learn multi-level representations, which suits object detection. Thirdly, extensive experiments show that our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages. For example, Mask R-CNN initialized with our representation surpasses MoCo v2 and even its fully-supervised counterparts in all setups of training iterations (e.g. 2.1 and 1.1 mAP improvement compared to MoCo v2 at 12k and 90k iterations respectively). Code will be released at https://github.com/dingjiansw101/DUPR.
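
The abstract does not spell out the loss, so the following is only a minimal sketch of what a patch-level contrastive objective could look like, assuming an InfoNCE-style loss of the kind used in MoCo-like methods. The function names (`patch_info_nce`, `multi_level_loss`), the temperature of 0.2, and the representation of patch features as plain lists of floats are all illustrative assumptions, not the authors' implementation. Row i of each input is taken to be the feature of the same patch (the pseudo identity) seen in one of the two image views; the other rows act as negatives.

```python
import math


def l2_normalize(vector):
    """Scale a feature vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def patch_info_nce(queries, keys, temperature=0.2):
    """InfoNCE-style loss over patch features from two views.

    queries[i] and keys[i] are assumed to describe the same patch in
    view 1 and view 2; every other key is treated as a negative.
    The temperature value is an assumption, not taken from the paper.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, q_i in enumerate(q):
        # Cosine similarity of patch i in view 1 with every patch in view 2.
        logits = [sum(a * b for a, b in zip(q_i, k_j)) / temperature for k_j in k]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching patch.
        total += log_denominator - logits[i]
    return total / len(q)


def multi_level_loss(level_queries, level_keys, temperature=0.2):
    """Average the patch loss over several feature levels (hypothetical helper)."""
    losses = [
        patch_info_nce(q, k, temperature)
        for q, k in zip(level_queries, level_keys)
    ]
    return sum(losses) / len(losses)
```

Under these assumptions, the loss is small when each patch is most similar to its own counterpart in the other view and large when the correspondence is broken, which is the property a location-discriminative representation needs. Averaging over levels is one simple way to mirror the multi-level idea; the paper may weight or structure the levels differently.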

updated: Mon Mar 08 2021 15:13:59 GMT+0000 (UTC)

published: Mon Mar 08 2021 15:13:59 GMT+0000 (UTC)

arXiv
