Group R-CNN for Weakly Semi-supervised Object Detection with Points

Shilong Zhang; Zhuoran Yu; Liyang Liu; Xinjiang Wang; Aojun Zhou; Kai Chen

We study the problem of weakly semi-supervised object detection with points (WSSOD-P), where the training data combines a small set of images fully annotated with bounding boxes and a large set of weakly labeled images in which each instance is annotated with only a single point. The core of this task is to train a point-to-box regressor on the well-labeled images, which can then be used to predict a credible bounding box for each point annotation. We challenge the prior belief that existing CNN-based detectors are not compatible with this task. Building on the classic R-CNN architecture, we propose an effective point-to-box regressor: Group R-CNN. Group R-CNN first uses instance-level proposal grouping to generate a group of proposals for each point annotation, and can thus obtain a high recall rate. To better distinguish different instances and improve precision, we propose instance-level proposal assignment to replace the vanilla assignment strategy adopted in the original R-CNN methods. Because naive instance-level assignment makes training difficult to converge, we propose instance-aware representation learning, which consists of instance-aware feature enhancement and instance-aware parameter generation, to overcome this issue. Comprehensive experiments on the MS-COCO benchmark demonstrate the effectiveness of our method. Specifically, Group R-CNN outperforms the prior method Point DETR by 3.9 mAP with 5% well-labeled images, which is the most challenging scenario. The source code is available at https://github.com/jshilong/GroupRCNN
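To make the idea of instance-level proposal grouping and assignment more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes, purely for illustration, that a proposal belongs to a point's group whenever the box contains that point, and that every proposal in a group is assigned to the instance that owns the point. The function names `box_contains`, `group_proposals_by_point`, and `assign_by_group` are hypothetical, and the actual grouping rule in Group R-CNN may differ.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]               # (x, y) point annotation
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) proposal box


def box_contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x1, y1, x2, y2 = box
    x, y = point
    return x1 <= x <= x2 and y1 <= y <= y2


def group_proposals_by_point(
    points: List[Point], proposals: List[Box]
) -> Dict[int, List[int]]:
    """Assumed grouping rule: point i collects every proposal that contains it."""
    groups: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for j, box in enumerate(proposals):
        for i, pt in enumerate(points):
            if box_contains(box, pt):
                groups[i].append(j)
    return groups


def assign_by_group(groups: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Assumed instance-level assignment: each proposal in group i is paired
    with instance i, rather than with whichever box it overlaps most."""
    return [(j, i) for i, members in groups.items() for j in members]


# Two annotated points and four candidate boxes (made-up values).
points = [(10.0, 10.0), (50.0, 50.0)]
proposals = [
    (0.0, 0.0, 20.0, 20.0),
    (5.0, 5.0, 30.0, 30.0),
    (40.0, 40.0, 60.0, 60.0),
    (0.0, 0.0, 100.0, 100.0),   # large box that covers both points
]

groups = group_proposals_by_point(points, proposals)
pairs = assign_by_group(groups)
print(groups)
print(pairs)
```

In this toy example the large box falls into both groups, so it is paired once with each instance. That is the kind of ambiguity between nearby instances that the abstract says instance-level assignment and instance-aware representation learning are meant to address.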

updated: Thu May 12 2022 07:17:54 GMT+0000 (UTC)

published: Thu May 12 2022 07:17:54 GMT+0000 (UTC)

arXiv
