Synthetic dataset generation for object-to-model deep learning in industrial applications

Matthew Z. Wong; Kiyohito Kunii; Max Baylis; Wai Hong Ong; Pavel Kroupa; Swen Koller

The availability of large image datasets has been a crucial factor in the success of deep learning-based classification and detection methods. While datasets for everyday objects are widely available, data for specific industrial use cases (e.g. identifying packaged products in a warehouse) remains scarce. In such cases, datasets have to be created from scratch, which places a significant bottleneck on the deployment of deep learning techniques in industrial applications. We present work carried out in collaboration with a leading UK online supermarket, with the aim of creating a computer vision system capable of detecting and identifying unique supermarket products in a warehouse setting. To this end, we demonstrate a framework for using synthetic data to create an end-to-end deep learning pipeline, beginning with real-world objects and culminating in a trained model. Our method is based on generating a synthetic dataset from 3D models obtained by applying photogrammetry techniques to real-world objects. Using 100k synthetic images generated from 60 real images per class, we trained an InceptionV3 convolutional neural network (CNN), which achieved a classification accuracy of 95.8% on a separately acquired test set of real supermarket product images. The image generation process supports automatic pixel annotation, which eliminates the prohibitively expensive manual annotation typically required for detection tasks. Based on this readily available data, a one-stage RetinaNet detector was trained on the synthetic, annotated images to produce a detector that can accurately localize and classify the specimen products in real time.
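
To illustrate why automatic pixel annotation removes the need for manual labelling, the following is a minimal sketch rather than the authors' pipeline. It assumes the rendered 3D model is available as an RGBA array whose alpha channel marks object pixels, and that the render fits entirely inside the background image. The function name `composite_with_annotation`, its parameters, and the 0.5 alpha threshold are all hypothetical choices made for this example.

```python
import numpy as np


def composite_with_annotation(render_rgba, background_rgb, top, left):
    """Paste an RGBA render onto a background and derive its annotations.

    Hypothetical helper: assumes uint8 arrays and that the render fits
    inside the background at the given (top, left) offset.
    Returns the composited image, a boolean pixel mask, and a bounding
    box (x_min, y_min, x_max, y_max), or None if no pixel is covered.
    """
    h, w = render_rgba.shape[:2]
    image = background_rgb.astype(np.float32).copy()

    # Alpha in [0, 1], kept as a trailing axis so it broadcasts over RGB.
    alpha = render_rgba[..., 3:4].astype(np.float32) / 255.0

    # Standard alpha blending, written into a view of the output image.
    region = image[top:top + h, left:left + w]
    region[:] = alpha * render_rgba[..., :3] + (1.0 - alpha) * region

    # The pixel mask comes for free from the render's alpha channel.
    mask = np.zeros(background_rgb.shape[:2], dtype=bool)
    mask[top:top + h, left:left + w] = alpha[..., 0] > 0.5

    # A detection box is the tight extent of the mask.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return image.astype(np.uint8), mask, None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return image.astype(np.uint8), mask, box
```

Because the object's position in each synthetic image is known at generation time, both the mask and the box are by-products of compositing, which is the property the abstract relies on when training the detector without hand-drawn labels.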

updated: Tue Sep 24 2019 14:58:07 GMT+0000 (UTC)

published: Tue Sep 24 2019 14:58:07 GMT+0000 (UTC)

arXiv
