QuickBrowser: A Unified Model to Detect and Read Simple Object in Real-time

Thao Do; Daeyoung Kim

Many real-life use cases, such as barcode scanning or billboard reading, require detecting objects and reading their contents. Existing methods typically first localize object regions, then determine the layout, and finally classify the content units. However, for simple, fixed-structure objects like license plates, this approach is excessive and slow to run. This work aims to solve the detect-and-read problem in a lightweight way by integrating multi-digit recognition into a one-stage object detection model. Our unified method not only eliminates duplicated feature extraction (once for localizing, once again for classifying) but also provides useful contextual information around object regions for classification. Additionally, our choice of backbones and our modifications to the architecture, loss function, data augmentation and training make the method robust, efficient and fast. We also built a public benchmark dataset of diverse real-life 1D barcodes for reliable evaluation, which we collected, annotated and checked carefully. Experimental results demonstrate the method's efficiency on the barcode problem: it outperforms industrial tools in both detection and decoding rates while running at real-time fps at a VGA-similar resolution. As expected, it also performs well on the license-plate recognition task (on the AOLP dataset), significantly outperforming the current state-of-the-art method in both recognition rate and inference time.
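
The idea of folding multi-digit recognition into a one-stage detector can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the per-cell output layout (objectness, four box values, then one block of class scores per digit slot), the names `decode_cell` and `NUM_CLASSES`, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

NUM_CLASSES = 10  # assumed: each digit slot predicts one of the digits 0-9

Box = Tuple[float, float, float, float]


def decode_cell(pred: List[float], num_digits: int,
                threshold: float = 0.5) -> Optional[Tuple[Box, str]]:
    """Decode one grid-cell prediction of a hypothetical unified head.

    Assumed layout of `pred`:
      [objectness, cx, cy, w, h, <num_digits blocks of NUM_CLASSES scores>]
    Returns (box, digit_string), or None if the cell holds no object.
    """
    objectness = pred[0]
    if objectness < threshold:
        return None  # no object detected in this cell

    box = (pred[1], pred[2], pred[3], pred[4])

    digits = []
    for slot in range(num_digits):
        start = 5 + slot * NUM_CLASSES
        scores = pred[start:start + NUM_CLASSES]
        # Pick the highest-scoring class for this digit slot.
        best = max(range(NUM_CLASSES), key=lambda c: scores[c])
        digits.append(str(best))

    return box, "".join(digits)
```

In this sketch the box and the digit string come from a single prediction vector, so no second feature-extraction pass over a cropped region is needed, which is the kind of duplication the abstract says the unified method removes.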

updated: Sun Jun 27 2021 08:52:06 GMT+0000 (UTC)

published: Mon Feb 15 2021 05:47:40 GMT+0000 (UTC)

arXiv
