CLIP the Gap: A Single Domain Generalization Approach for Object Detection

Vidit Vidit; Martin Engilberge; Mathieu Salzmann

Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, which outperforms the only existing SDG object detection method, Single-DGOD [49], by 10% on that method's own diverse weather driving benchmark.
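The abstract names two ingredients: a semantic augmentation of backbone features driven by text prompts, and a text-based classification loss. The sketch below illustrates both ideas in generic CLIP style under stated assumptions; it is not the authors' implementation. `fake_text_embedding` is a hypothetical stand-in for a frozen CLIP text encoder (it returns deterministic random vectors with no real semantics), region features are assumed to already live in the same space as the text embeddings, and the prompt strings, `temperature`, and `strength` values are illustrative choices, not values from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm; CLIP-style embeddings are compared by cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fake_text_embedding(prompt, dim=8):
    """HYPOTHETICAL stand-in for a frozen CLIP text encoder: a deterministic
    pseudo-random unit vector per prompt string (carries no real semantics)."""
    rng = random.Random(prompt)
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def text_logits(region_feat, class_text_embs, temperature=0.01):
    """Text-based classifier: cosine similarity between a region feature and each
    class-name prompt embedding, scaled by 1 / temperature."""
    f = normalize(region_feat)
    return [dot(f, normalize(t)) / temperature for t in class_text_embs]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using log-sum-exp for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def semantic_shift(feat, src_prompt_emb, tgt_prompt_emb, strength=1.0):
    """Feature-space augmentation: translate a feature along the direction from a
    source-domain prompt embedding to a target-domain prompt embedding.
    ASSUMPTION: the feature lives in the same space as the text embeddings."""
    direction = [t - s for s, t in zip(normalize(src_prompt_emb), normalize(tgt_prompt_emb))]
    return [f + strength * d for f, d in zip(feat, direction)]


# Usage with illustrative prompts (not taken from the paper).
classes = ["car", "person", "bus"]
class_embs = [fake_text_embedding(f"a photo of a {c}") for c in classes]
sunny = fake_text_embedding("an image taken on a sunny day")
night = fake_text_embedding("an image taken at night")

region = [2.0 * x for x in class_embs[1]]  # a region feature aligned with "person"
augmented = semantic_shift(region, sunny, night, strength=0.5)
loss = cross_entropy(text_logits(augmented, class_embs), target=1)
print(f"loss on night-shifted 'person' feature: {loss:.3f}")
```

Using class-name text embeddings as classifier weights means the classification head is tied to the vision-language space rather than learned from scratch on the source domain, and the prompt-difference direction gives a way to describe a target domain ("at night") without any target images. In a real detector, the shift would have to be mapped into the backbone's feature space, which this sketch sidesteps by assumption.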

updated: Mon Mar 06 2023 13:35:22 GMT+0000 (UTC)

published: Fri Jan 13 2023 12:01:18 GMT+0000 (UTC)

arXiv