Developing a Compressed Object Detection Model based on YOLOv4 for Deployment on Embedded GPU Platform of Autonomous System

Issac Sim; Ju-Hyung Lim; Young-Wan Jang; JiHwan You; SeonTaek Oh; Young-Keun Kim


The latest CNN-based object detection models are quite accurate but require a high-performance GPU to run in real time. They are still heavy in terms of memory size and speed for an embedded system with limited memory space. Since object detection for an autonomous system runs on an embedded processor, it is preferable to compress the detection network as much as possible while preserving detection accuracy. There are several popular lightweight detection models, but their accuracy is too low for safe driving applications. Therefore, this paper proposes a new object detection model, referred to as YOffleNet, which is compressed at a high ratio while minimizing the accuracy loss for real-time, safe driving applications on an autonomous system. The backbone network architecture is based on YOLOv4, but we could compress the network greatly by replacing the computationally heavy CSP DenseNet with the lighter modules of ShuffleNet. Experiments with the KITTI dataset showed that the proposed YOffleNet is compressed 4.7 times relative to YOLOv4-s, which could achieve as fast as 46 FPS on an embedded GPU system (NVIDIA Jetson AGX Xavier). Despite the high compression ratio, the accuracy drops only slightly to 85.8% mAP, just 2.6% lower than YOLOv4-s. Thus, the proposed network shows high potential for deployment on the embedded systems of autonomous systems for real-time, accurate object detection applications.
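To make the ShuffleNet substitution more concrete, the sketch below shows two ingredients that make ShuffleNet-style modules light: the channel shuffle operation, and the weight count of a ShuffleNet v1 unit (1x1 group convolution, depthwise 3x3, 1x1 group convolution) compared with a ResNet-style bottleneck. This is an illustration under stated assumptions, not the authors' YOffleNet code: the function names, the channel sizes `c`, `m`, and the group count `g` are hypothetical, and YOffleNet may use a different ShuffleNet variant.

```python
# Minimal sketch illustrating why ShuffleNet-style modules are lighter than
# conventional bottlenecks. Formulas follow the ShuffleNet v1 unit; the
# helper names and layer sizes below are hypothetical, not YOffleNet's.

def channel_shuffle(channels, groups):
    """Reorder channel indices so each output group mixes all input groups.

    Equivalent to reshape(groups, n) -> transpose -> flatten.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def bottleneck_params(c, m):
    """Weights of a ResNet bottleneck: 1x1 (c->m), 3x3 (m->m), 1x1 (m->c)."""
    return c * m + 9 * m * m + m * c


def shuffle_unit_params(c, m, g):
    """Weights of a ShuffleNet v1 unit: 1x1 group conv (c->m), depthwise
    3x3 on m channels, 1x1 group conv (m->c). Biases and batch norm ignored."""
    return c * m // g + 9 * m + m * c // g


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
    c, m, g = 256, 64, 4  # hypothetical layer sizes
    print(bottleneck_params(c, m))       # 69632
    print(shuffle_unit_params(c, m, g))  # 8768
```

With these hypothetical sizes, the ShuffleNet-style unit needs roughly 8 times fewer weights than the bottleneck, because grouping divides the 1x1 costs by `g` and the depthwise 3x3 scales with `m` rather than `m * m`. The channel shuffle restores cross-group information flow that group convolutions would otherwise block.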

updated: Sun Aug 01 2021 08:19:51 GMT+0000 (UTC)

published: Sun Aug 01 2021 08:19:51 GMT+0000 (UTC)

arXiv
