Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection

Weixin Feng; Xingyuan Bu; Chenchen Zhang; Xubin Li

Multimodal supervision has achieved promising results in many visual language understanding tasks, where language plays an essential role as a hint or context for recognizing and locating instances. However, due to defects in human-annotated language corpora, multimodal supervision remains unexplored in fully supervised object detection scenarios. In this paper, we take advantage of language prompts to introduce effective and unbiased linguistic supervision into object detection, and propose a new mechanism for learning knowledge from this supervision, called multimodal knowledge learning (MKL). Specifically, we design prompts and fill them with the bounding box annotations to generate descriptions containing extensive hints and context for instance recognition and localization. The knowledge from language is then distilled into the detection model by maximizing cross-modal mutual information at both the image level and the object level. Moreover, the generated descriptions are manipulated to produce hard negatives that further boost detector performance. Extensive experiments demonstrate that the proposed method yields a consistent performance gain of 1.6% to 2.1% and achieves state-of-the-art results on the MS-COCO and OpenImages datasets.
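The three steps named in the abstract (filling a prompt with box annotations, manipulating the description into hard negatives, and a contrastive objective) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the template wording, the `(category, x, y, width, height)` box format, the coarse left/middle/right location word, the category-swap rule for negatives, and the use of an InfoNCE loss, which is a commonly used lower-bound surrogate for mutual information rather than the estimator the authors necessarily use.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical annotation format: (category, x, y, width, height) in pixels.
Box = Tuple[str, float, float, float, float]


def horizontal_position(box: Box, image_width: float) -> str:
    """Coarse location word from the box centre (an illustrative choice)."""
    _, x, _, w, _ = box
    centre = (x + w / 2) / image_width
    if centre < 1 / 3:
        return "left"
    if centre < 2 / 3:
        return "middle"
    return "right"


def describe(box: Box, image_width: float) -> str:
    """Fill a hypothetical prompt template with one box annotation."""
    position = horizontal_position(box, image_width)
    return f"a {box[0]} on the {position} of the image"


def hard_negatives(box: Box, image_width: float,
                   categories: Sequence[str]) -> List[str]:
    """Keep the location but swap in each wrong category (assumed rule)."""
    return [describe((c,) + box[1:], image_width)
            for c in categories if c != box[0]]


def info_nce(similarities: Sequence[float], positive_index: int,
             temperature: float = 0.1) -> float:
    """InfoNCE loss for one visual feature against candidate descriptions.

    Minimising this loss maximises a lower bound on mutual information;
    it stands in for whatever estimator the paper actually uses.
    """
    logits = [s / temperature for s in similarities]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_index]
```

With a box `("dog", 40.0, 120.0, 100.0, 80.0)` in a 640-pixel-wide image, `describe` returns `"a dog on the left of the image"`, and `hard_negatives` with categories `["dog", "cat", "horse"]` returns the same sentence with `cat` and `horse` substituted. Passing the similarity of a region feature to the true description and to those negatives into `info_nce` gives a loss that shrinks as the true description becomes the best match.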

updated: Mon May 09 2022 07:03:30 GMT+0000 (UTC)

published: Mon May 09 2022 07:03:30 GMT+0000 (UTC)

arXiv
