Million-scale Object Detection with Large Vision Model

Feng Lin; Wenze Hu; Yaowei Wang; Yonghong Tian; Guangming Lu; Fanglin Chen; Yong Xu; Xiaoyu Wang

Over the past few years, developing a broad, universal, general-purpose computer vision system has become a hot topic. A powerful universal system would be capable of solving diverse vision tasks simultaneously without being restricted to a specific problem or data domain, which is of great importance in practical, real-world computer vision applications. This study pushes this direction forward by concentrating on the million-scale multi-domain universal object detection problem. The problem is not trivial because of cross-dataset category label duplication, label conflicts, and hierarchical taxonomy handling. Moreover, finding a resource-efficient way to utilize emerging large pre-trained vision models for million-scale cross-dataset object detection remains an open challenge. This paper tries to address these challenges by introducing our practices in label handling, hierarchy-aware loss design, and resource-efficient model training with a pre-trained large model. Our method ranked second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022). We hope our detailed study will serve as an alternative practice paradigm for similar problems in the community. The code is available at https://github.com/linfeng93/Large-UniDet.

updated: Mon Dec 19 2022 12:40:13 GMT+0000 (UTC)

published: Mon Dec 19 2022 12:40:13 GMT+0000 (UTC)

arXiv
