Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation

Jialian Wu; Liangchen Song; Tiancai Wang; Qian Zhang; Junsong Yuan

Despite previous successes in object analysis, detecting and segmenting a large number of object categories with a long-tailed data distribution remains a challenging and under-investigated problem. For a large-vocabulary classifier, the chance of obtaining noisy logits is much higher, which can easily lead to wrong recognition. In this paper, we exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes, and construct a classification tree that parses an object instance into a fine-grained category via its parent class. Because the classification tree has far fewer parent class nodes than fine-grained class nodes, the parent logits are less noisy and can be used to suppress wrong or noisy logits in the fine-grained class nodes. As the way to construct the parent classes is not unique, we further build multiple trees to form a classification forest in which each tree contributes its vote to the fine-grained classification. To alleviate the imbalanced learning caused by the long-tail phenomenon, we propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution. Our method, termed Forest R-CNN, can serve as a plug-and-play module that can be applied to most object recognition models for recognizing more than 1000 categories. Extensive experiments are performed on the large-vocabulary dataset LVIS. Compared with the Mask R-CNN baseline, Forest R-CNN significantly boosts performance, with AP improvements of 11.5% on the rare categories and 3.9% on the overall categories. Moreover, we achieve state-of-the-art results on the LVIS dataset. Code is available at https://github.com/JialianW/Forest_RCNN.
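To make the classification-forest idea concrete, the following is a minimal sketch of how less noisy parent-class scores might suppress a noisy fine-grained logit. It is not the authors' implementation: the function names (`softmax`, `forest_scores`), the toy classes, and the combination rule (scaling each fine-grained probability by its parent's probability and averaging over trees) are all assumptions chosen for illustration; the abstract does not specify the exact formula.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forest_scores(
    fine_logits: Sequence[float],
    tree_parent_logits: Sequence[Sequence[float]],
    tree_fine_to_parent: Sequence[Sequence[int]],
) -> List[float]:
    """Scale each fine-class probability by its parent's probability in every
    tree, then average the trees' votes.

    ASSUMPTION: this combination rule is illustrative only; it is not taken
    from the paper.
    """
    fine_probs = softmax(fine_logits)
    num_trees = len(tree_parent_logits)
    scores = [0.0] * len(fine_probs)
    for parent_logits, fine_to_parent in zip(tree_parent_logits, tree_fine_to_parent):
        parent_probs = softmax(parent_logits)
        for fine_idx, parent_idx in enumerate(fine_to_parent):
            scores[fine_idx] += fine_probs[fine_idx] * parent_probs[parent_idx] / num_trees
    return scores


# Hypothetical toy vocabulary: 0=cat, 1=dog, 2=car, 3=truck.
# The fine-grained "car" logit is noisy and narrowly the largest.
fine_logits = [2.0, 1.9, 2.1, 0.0]

# Two hypothetical trees with different parent groupings.
# Tree A: parents {animal, vehicle}; Tree B: parents {animal, car-like, truck-like}.
tree_parent_logits = [[3.0, 0.0], [2.5, 0.5, 0.0]]
tree_fine_to_parent = [[0, 0, 1, 1], [0, 0, 1, 2]]

raw_probs = softmax(fine_logits)
voted = forest_scores(fine_logits, tree_parent_logits, tree_fine_to_parent)

raw_choice = max(range(len(raw_probs)), key=raw_probs.__getitem__)
forest_choice = max(range(len(voted)), key=voted.__getitem__)
print("raw argmax:", raw_choice, "forest argmax:", forest_choice)
```

In this toy input both trees confidently favor the "animal" parent, so the narrowly winning "car" logit is down-weighted and the top prediction moves to "cat". Using several trees means that a single poor grouping of fine-grained classes has less influence on the final score.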

updated: Wed Mar 03 2021 04:51:39 GMT+0000 (UTC)

published: Thu Aug 13 2020 03:52:37 GMT+0000 (UTC)

arXiv
