Unitail: Detecting, Reading, and Matching in Retail Scene

Fangyi Chen; Han Zhang; Zaiwang Li; Jiachen Dou; Shentong Mo; Hao Chen; Yongxin Zhang; Uzair Ahmed; Chenchen Zhu; Marios Savvides

Making full use of computer vision technology in stores requires considering the actual needs that fit the characteristics of the retail scene. Pursuing this goal, we introduce the United Retail Datasets (Unitail), a large-scale benchmark of basic visual tasks on products that challenges algorithms for detecting, reading, and matching. With 1.8M quadrilateral-shaped instances annotated, Unitail offers a detection dataset that aligns better with product appearance. Furthermore, it provides a gallery-style OCR dataset containing 1454 product categories, 30k text regions, and 21k transcriptions to enable robust reading on products and motivate enhanced product matching. Besides benchmarking the datasets using various state-of-the-art methods, we customize a new detector for product detection and provide a simple OCR-based matching solution that verifies its effectiveness.

updated: Sun Jul 10 2022 07:13:58 GMT+0000 (UTC)

published: Fri Apr 01 2022 09:06:48 GMT+0000 (UTC)

arXiv
