Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition

Rayson Laroca; Valter Estevam; Alceu S. Britto Jr.; Rodrigo Minetto; David Menotti

This work draws attention to the large fraction of near-duplicates in the training and test sets of datasets widely adopted in License Plate Recognition (LPR) research. These duplicates refer to images that, although different, show the same license plate. Our experiments, conducted on the two most popular datasets in the field, show a substantial decrease in recognition rate when six well-known models are trained and tested under fair splits, that is, in the absence of duplicates in the training and test sets. Moreover, in one of the datasets, the ranking of models changed considerably when they were trained and tested under duplicate-free splits. These findings suggest that such duplicates have significantly biased the evaluation and development of deep learning-based models for LPR. The list of near-duplicates we have found and proposals for fair splits are publicly available for further research at https://raysonlaroca.github.io/supp/lpr-train-on-test/
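
The idea of a fair split can be illustrated with a small sketch. The Python below is not the authors' procedure. It assumes each image comes with a ground-truth plate string and, as a simplification, treats two images as near-duplicates whenever those strings match. The function names, the `test_fraction` parameter, and the toy samples are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def find_cross_split_duplicates(train, test):
    """Return test samples whose plate text also appears in the training set."""
    train_plates = {plate for _, plate in train}
    return [(img, plate) for img, plate in test if plate in train_plates]


def duplicate_free_split(samples, test_fraction=0.4, seed=0):
    """Split (image_id, plate_text) pairs so no plate text lands in both sets."""
    # Group all images that show the same plate (assumed near-duplicates).
    groups = defaultdict(list)
    for img, plate in samples:
        groups[plate].append((img, plate))

    # Shuffle the plate identities, not the individual images.
    plates = sorted(groups)
    random.Random(seed).shuffle(plates)

    target = test_fraction * len(samples)
    train, test = [], []
    for plate in plates:
        # A whole group goes to exactly one side, so the split is only
        # approximately test_fraction in size.
        bucket = test if len(test) < target else train
        bucket.extend(groups[plate])
    return train, test


# Toy data: hypothetical image ids and plate strings.
samples = [
    ("img_001", "ABC1234"),
    ("img_002", "ABC1234"),
    ("img_003", "XYZ9876"),
    ("img_004", "XYZ9876"),
    ("img_005", "JKL5555"),
]

# A naive image-level split leaks plates from training into testing.
naive_train, naive_test = samples[::2], samples[1::2]
leaked = find_cross_split_duplicates(naive_train, naive_test)

# A plate-level split keeps every plate on one side only.
fair_train, fair_test = duplicate_free_split(samples)
```

Splitting at the level of plate identities rather than individual images is what removes the overlap. The cost is that the resulting set sizes only approximate the requested fraction, since groups are never broken up.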

updated: Mon Apr 10 2023 15:24:29 GMT+0000 (UTC)

published: Mon Apr 10 2023 15:24:29 GMT+0000 (UTC)

arXiv
