Real-time semantic segmentation on FPGAs for autonomous vehicles with hls4ml

Nicolò Ghielmetti; Vladimir Loncar; Maurizio Pierini; Marcel Roed; Sioni Summers; Thea Aarrestad; Christoffer Petersson; Hampus Linander; Jennifer Ngadiuba; Kelvin Lin; Philip Harris

In this paper, we investigate how field-programmable gate arrays (FPGAs) can serve as hardware accelerators for real-time semantic segmentation tasks relevant to autonomous driving. Considering compressed versions of the ENet convolutional neural network architecture, we demonstrate a fully on-chip deployment with a latency of 4.9 ms per image, using less than 30% of the available resources on a Xilinx ZCU102 evaluation board. The latency is reduced to 3 ms per image when the batch size is increased to ten, corresponding to the use case where the autonomous vehicle receives inputs from multiple cameras simultaneously. We show that, through aggressive filter reduction, heterogeneous quantization-aware training, and an optimized implementation of convolutional layers, the power consumption and resource utilization can be significantly reduced while maintaining accuracy on the Cityscapes dataset.
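To make the two latency figures easier to compare, the sketch below converts them into batch completion time and throughput. It assumes that "3 ms per image" at batch size ten is an amortized figure, so one batch of ten camera images finishes in roughly 30 ms; the helper names `batch_time_ms` and `throughput_fps` are illustrative only and not part of hls4ml or the authors' code.

```python
# Back-of-the-envelope throughput from the per-image latencies quoted in the abstract.
# Assumption: the per-image latency at batch size 10 is amortized, i.e. one batch of
# 10 images completes in about 10 * 3 ms = 30 ms.


def batch_time_ms(per_image_ms: float, batch_size: int) -> float:
    """Total wall-clock time for one batch, given an amortized per-image latency."""
    return per_image_ms * batch_size


def throughput_fps(per_image_ms: float) -> float:
    """Images processed per second for a given amortized per-image latency."""
    return 1000.0 / per_image_ms


configs = [
    {"batch": 1, "per_image_ms": 4.9},   # single image, as quoted in the abstract
    {"batch": 10, "per_image_ms": 3.0},  # ten simultaneous camera inputs
]

for cfg in configs:
    total = batch_time_ms(cfg["per_image_ms"], cfg["batch"])
    fps = throughput_fps(cfg["per_image_ms"])
    print(f"batch={cfg['batch']:2d}  batch time={total:5.1f} ms  throughput={fps:6.1f} img/s")
```

Under this assumption, batching raises throughput from about 204 to about 333 images per second, at the cost of each individual image waiting for its whole batch (about 30 ms) to complete.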

updated: Mon May 16 2022 13:55:16 GMT+0000 (UTC)

published: Mon May 16 2022 13:55:16 GMT+0000 (UTC)

arXiv