Learning to Discover and Detect Objects

Vladimir Fomenko; Ismail Elezi; Deva Ramanan; Laura Leal-Taixé; Aljoša Ošep

オブジェクトの発見と検出を学ぶ

新しいクラスの発見、検出、ローカリゼーション (NCDL) の問題に取り組みます。この設定では、一般的に観察されるクラスのオブジェクトのラベルを持つソースデータセットを想定しています。他のクラスのインスタンスは、人間の監督なしで、視覚的な類似性に基づいて自動的に検出、分類、ローカライズする必要があります。この目的のために、領域提案ネットワークを使用してオブジェクト候補をローカライズし、各候補を、ソースデータセット、または拡張された新しいクラスのセットの 1 つ。現実世界のクラスの自然な頻度を反映して、クラスの割り当てにロングテール分布の制約があります。この目的でエンドツーエンドの方法で検出ネットワークをトレーニングすることにより、ラベル付けされたオブジェクトクラスの語彙の一部ではないものを含む、多種多様なクラスのすべての領域提案を分類することを学習します。 COCO および LVIS データセットを使用して実施した実験では、従来のクラスタリングアルゴリズムに依存したり、事前に抽出された作物を使用したりするマルチステージパイプラインと比較して、この方法がはるかに効果的であることが明らかになりました。さらに、大規模な Visual Genome データセットにこの方法を適用することで、アプローチの一般性を示します。このデータセットでは、ネットワークが明示的な監視なしでさまざまなセマンティッククラスを検出することを学習します。

We tackle the problem of novel class discovery, detection, and localization (NCDL). In this setting, we assume a source dataset with labels for objects of commonly observed classes. Instances of other classes need to be discovered, classified, and localized automatically based on visual similarity, without human supervision. To this end, we propose a two-stage object detection network Region-based NCDL (RNCDL), that uses a region proposal network to localize object candidates and is trained to classify each candidate, either as one of the known classes, seen in the source dataset, or one of the extended set of novel classes, with a long-tail distribution constraint on the class assignments, reflecting the natural frequency of classes in the real world. By training our detection network with this objective in an end-to-end manner, it learns to classify all region proposals for a large variety of classes, including those that are not part of the labeled object class vocabulary. Our experiments conducted using COCO and LVIS datasets reveal that our method is significantly more effective compared to multi-stage pipelines that rely on traditional clustering algorithms or use pre-extracted crops. Furthermore, we demonstrate the generality of our approach by applying our method to a large-scale Visual Genome dataset, where our network successfully learns to detect various semantic classes without explicit supervision.

updated: Wed Oct 19 2022 17:59:55 GMT+0000 (UTC)

published: Wed Oct 19 2022 17:59:55 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト