Detecting the open-world objects with the help of the Brain

Shuailei Ma; Yuefeng Wang; Ying Wei; Peihao Chen; Zhixiang Ye; Jiaqi Fan; Enming Zhang; Thomas H. Li

Open World Object Detection (OWOD) is a novel and challenging computer vision task that bridges the gap between classic object detection (OD) benchmarks and real-world object detection. In addition to detecting and classifying seen/known objects, OWOD algorithms are expected to detect unseen/unknown objects and learn them incrementally. Humans' natural instinct for identifying unknown objects in their environment depends mainly on the knowledge base in their brains. It is difficult for a model to do this by learning only from the annotations of a few small datasets. Large pre-trained grounded language-image models (VL, i.e. GLIP) have rich knowledge about the open world but are limited by the text prompt. We propose leveraging the VL as the "Brain" of the open-world detector by simply generating unknown labels. Leveraging it is non-trivial because the unknown labels impair the model's learning of known objects. In this paper, we alleviate these problems by proposing a down-weight loss function and a decoupled detection structure. Moreover, our detector leverages the "Brain" to learn novel objects beyond VL through our pseudo-labeling scheme.
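The abstract does not give the form of the down-weight loss, so the following is only a generic sketch of the idea it describes: labels generated by the VL model contribute less to the training loss than human-annotated known labels, so that noisy unknown labels interfere less with learning the known classes. The function name, the `pseudo_weight` parameter, and the weighted-mean normalisation are all assumptions made for illustration, not the paper's formulation.

```python
import math


def down_weighted_loss(probs, labels, is_pseudo, pseudo_weight=0.5):
    """Weighted mean negative log-likelihood over a batch of boxes.

    probs:         list of per-class probability lists, one per box
    labels:        list of target class indices, one per box
    is_pseudo:     list of bools, True when the label is a pseudo-label
    pseudo_weight: hypothetical factor in [0, 1] applied to pseudo-labels
                   (an assumption, not a value from the paper)
    """
    total = 0.0
    weight_sum = 0.0
    for p, y, pseudo in zip(probs, labels, is_pseudo):
        # Annotated boxes keep full weight; pseudo-labeled boxes are scaled down.
        w = pseudo_weight if pseudo else 1.0
        total += -w * math.log(p[y])
        weight_sum += w
    # Normalise by the total weight so the scale does not depend on the mix.
    return total / weight_sum if weight_sum > 0 else 0.0
```

Setting `pseudo_weight=0.0` ignores pseudo-labels entirely, while `pseudo_weight=1.0` treats them like annotated labels; values in between trade off the two.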

updated: Tue Mar 21 2023 06:44:02 GMT+0000 (UTC)

published: Tue Mar 21 2023 06:44:02 GMT+0000 (UTC)

arXiv
