ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation

Johnathan Xie; Shuai Zheng



Real-world object sampling produces long-tailed distributions that require exponentially more images for rare types. Zero-shot detection, which aims to detect unseen objects, is one direction for addressing this problem. Datasets such as COCO are extensively annotated across many images but cover only a small number of categories, and annotating every object class across diverse domains is expensive and challenging. To advance zero-shot detection, we develop a Vision-Language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head of a one-stage detector such as YOLOv5. With this method, we train an object detector that achieves state-of-the-art accuracy on the COCO zero-shot detection splits with fewer model parameters. During inference, the model can be adapted to detect any number of object classes without additional training. We also find that the improvements provided by our method are consistent across YOLOv5 model scales. Furthermore, we develop a self-training method that provides a significant score improvement without needing extra images or labels.
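The abstract does not spell out the losses, so the following is only a rough illustration of the general idea: a detector's semantic head emits one embedding per predicted region, that embedding is scored against per-class text embeddings, and it is also pulled toward an image embedding of the same region. Everything here is an assumption and not the authors' implementation: the function names are hypothetical, the temperature value is made up, the softmax cross-entropy alignment loss and the L1 distillation loss are chosen for illustration, and plain Python lists stand in for CLIP embeddings and YOLOv5 head outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(logits: Vector) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(region_emb: Vector,
                        text_embs: Sequence[Vector],
                        temperature: float = 0.1) -> List[float]:
    """Score one region embedding against a list of class text embeddings.

    The class set is just the rows of `text_embs`, so classes can be added
    or removed at inference time without changing any detector weights.
    `temperature` is a hypothetical hyperparameter.
    """
    logits = [cosine(region_emb, t) / temperature for t in text_embs]
    return softmax(logits)


def text_alignment_loss(region_emb: Vector,
                        text_embs: Sequence[Vector],
                        label: int,
                        temperature: float = 0.1) -> float:
    """Assumed loss form: cross-entropy against the ground-truth seen class."""
    probs = class_probabilities(region_emb, text_embs, temperature)
    return -math.log(max(probs[label], 1e-12))


def image_distillation_loss(region_emb: Vector, teacher_image_emb: Vector) -> float:
    """Assumed loss form: L1 distance between unit-normalized detector and
    teacher (e.g. CLIP image encoder) embeddings of the same region."""
    r, c = l2_normalize(region_emb), l2_normalize(teacher_image_emb)
    return sum(abs(x - y) for x, y in zip(r, c))
```

A toy usage: with `text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]` standing in for two seen class names and a region embedding `[0.9, 0.1, 0.0]`, `class_probabilities` puts almost all mass on the first class. Appending a third row for an unseen class name simply yields a three-way distribution, which mirrors the abstract's point that the model can be adapted to any number of classes without additional training.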

updated: Fri Sep 24 2021 16:46:36 GMT+0000 (UTC)

published: Fri Sep 24 2021 16:46:36 GMT+0000 (UTC)

arXiv
