Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models

Sandro Costa Magalhães; Filipe Neves Santos; Pedro Machado; António Paulo Moreira; Jorge Dias

Purpose: Visual perception enables robots to perceive their environment. Visual data is processed with computer vision algorithms that are usually time-expensive and require powerful devices to run in real time, which is unfeasible for open-field robots with limited energy. This work benchmarks heterogeneous platforms for real-time object detection across three architectures: embedded GPUs, or Graphical Processing Units (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2); TPUs, or Tensor Processing Units (such as the Coral Dev Board TPU); and DPUs, or Deep Learning Processor Units (such as those in the AMD-Xilinx ZCU104 Development Board and the AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used a RetinaNet ResNet-50 fine-tuned on the natural VineSet dataset. The trained model was then converted and compiled into target-specific hardware formats to improve execution efficiency. Conclusions and Results: The platforms were assessed in terms of detection performance (the evaluation metrics) and efficiency (inference time). The GPUs were the slowest devices, running at 3 FPS to 5 FPS, and the Field Programmable Gate Arrays (FPGAs) were the fastest, running at 14 FPS to 25 FPS. The TPU's inference speed was unremarkable, similar to that of the NVIDIA Jetson TX2. The TPU and GPUs were the most power-efficient, consuming about 5 W. Differences in the evaluation metrics across devices were negligible: all devices reached an F1 of about 70 % and a mean Average Precision (mAP) of about 60 %.
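The quantities reported in the abstract (FPS derived from inference time, F1, and throughput relative to power draw) can be illustrated with a minimal Python sketch. This is not the authors' benchmarking code: the function names `frames_per_second`, `f1_score`, and `fps_per_watt` are hypothetical, and the numbers in the usage lines are placeholder inputs chosen for illustration, not measurements from the paper.

```python
from statistics import mean


def frames_per_second(latencies_s):
    """Mean throughput in FPS from per-image inference latencies in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    return 1.0 / mean(latencies_s)


def f1_score(true_positives, false_positives, false_negatives):
    """F1 as the harmonic mean of precision and recall, from detection counts."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def fps_per_watt(fps, power_w):
    """Throughput normalised by power draw (an illustrative efficiency ratio)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return fps / power_w


# Placeholder inputs, not values measured in the paper.
example_fps = frames_per_second([0.04, 0.04, 0.04])
example_f1 = f1_score(true_positives=7, false_positives=3, false_negatives=3)
example_efficiency = fps_per_watt(example_fps, power_w=5.0)
```

Normalising FPS by power is one possible way to compare devices when both speed and energy matter; the abstract itself reports the two figures separately.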

updated: Mon Nov 21 2022 17:02:33 GMT+0000 (UTC)

published: Mon Nov 21 2022 17:02:33 GMT+0000 (UTC)

arXiv
