Detecting soccer balls with reduced neural networks: a comparison of multiple architectures under constrained hardware scenarios

Douglas De Rizzo Meneghetti; Thiago Pedro Donadon Homem; Jonas Henrique Renolfi de Oliveira; Isaac Jesus da Silva; Danilo Hernani Perico; Reinaldo Augusto da Costa Bianchi

Object detection techniques that achieve state-of-the-art accuracy employ convolutional neural networks implemented to perform optimally on graphics processing units (GPUs). Some hardware systems, such as mobile robots, operate under constrained hardware conditions but still benefit from object detection capabilities. Multiple network models have been proposed that achieve comparable accuracy with reduced architectures and leaner operations. Motivated by the need to create an object detection system for a soccer team of mobile robots, this work provides a comparative study of recent neural network proposals targeted at constrained hardware environments, applied to the specific task of soccer ball detection. We train multiple open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures, as well as YOLOv3, TinyYOLOv3, YOLOv4 and TinyYOLOv4, on an annotated image data set captured using a mobile robot. We then report their mean average precision (mAP) on a test data set and their inference times on videos of different resolutions, under constrained and unconstrained hardware configurations. Results show that MobileNetV3 models offer a good trade-off between mAP and inference time in constrained scenarios only, while MobileNetV2 models with high width multipliers are appropriate for server-side inference. YOLO models in their official implementations are not suitable for inference on CPUs.
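The abstract reports mean average precision without stating the evaluation protocol. As a rough illustration only, the sketch below computes average precision for a single "ball" class, in which case mAP reduces to AP. The IoU threshold of 0.5, the all-point interpolation, and every name in the code (`iou`, `average_precision`, the data layout) are assumptions made for this example, not the authors' evaluation code.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[str, float, Box]],  # (image_id, confidence, box)
    ground_truth: Dict[str, List[Box]],        # image_id -> annotated boxes
    iou_threshold: float = 0.5,                # assumed threshold
) -> float:
    """Single-class AP as the area under the interpolated precision-recall curve."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0

    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags: List[bool] = []

    # Visit detections from most to least confident.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        # A detection is a true positive only if it hits an unmatched annotation.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[img][best_idx]:
            matched[img][best_idx] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)

    # Cumulative precision and recall at each rank.
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(tp_flags, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / total_gt)

    # Make precision non-increasing in recall (interpolation envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, with two annotated images and three detections, one of which is a false positive ranked between two correct hits, `average_precision` returns 5/6. A lower-confidence false positive ranked above a correct detection reduces the score, which is why AP reflects both localization and confidence ranking.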

updated: Sun Feb 21 2021 12:15:09 GMT+0000 (UTC)

published: Mon Sep 28 2020 23:26:25 GMT+0000 (UTC)

arXiv
