2D-Empowered 3D Object Detection on the Edge

Jingzong Li; Yik Hong Cai; Libin Liu; Yu Mao; Chun Jason Xue; Hong Xu

3D object detection plays a pivotal role in a wide range of applications, most notably autonomous driving and robotics. These applications are commonly deployed on edge devices so that they can interact with the environment promptly, and they often require near-real-time responses. With limited computation power, it is challenging to execute 3D detection on the edge using highly complex neural networks. Common approaches such as offloading to the cloud incur latency overheads because of the large amount of 3D point cloud data that must be transmitted. To resolve the tension between wimpy edge devices and compute-intensive inference workloads, we explore the possibility of transforming fast 2D detection results to extrapolate 3D bounding boxes. To this end, we present Moby, a novel system that demonstrates the feasibility and potential of our approach. Our main contributions are two-fold. First, we design a 2D-to-3D transformation pipeline that takes as input the point cloud data from LiDAR and the 2D bounding boxes from camera images, captured at exactly the same time, and generates 3D bounding boxes efficiently and accurately based on the detection results of previous frames, without running 3D detectors. Second, we design a frame offloading scheduler that dynamically launches a 3D detection when the error of the 2D-to-3D transformation accumulates to a certain level, so that subsequent transformations can draw upon the latest 3D detection results with better accuracy. Extensive evaluation on an NVIDIA Jetson TX2 with the autonomous driving dataset KITTI and real-world 4G/LTE traces shows that Moby reduces end-to-end latency by up to 91.9% with a mild accuracy drop compared to baselines. Further, Moby shows excellent energy efficiency, reducing power consumption and memory footprint by up to 75.7% and 48.1%, respectively.
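
The two mechanisms summarized above can be illustrated with a minimal Python sketch. It is not Moby's implementation: the abstract specifies neither the transformation nor the error estimate, and Moby draws on previous-frame 3D detections, which this sketch ignores. Assumptions made here for illustration only: LiDAR points have already been transformed into camera coordinates (on KITTI this would require the calibration's LiDAR-to-camera extrinsics and rectification), the camera follows a plain pinhole model with intrinsics fx, fy, cx, cy, a crude axis-aligned min/max fit stands in for 3D box estimation, and the scheduler receives a per-frame error estimate from some unspecified source and compares its running sum against a chosen threshold. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types: points are assumed to be in camera coordinates (z points forward).
Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project(point: Point3D, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point onto the image plane (assumed camera model)."""
    x, y, z = point
    return fx * x / z + cx, fy * y / z + cy


def points_in_box(points: List[Point3D], box: Box2D,
                  fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Frustum crop: keep points in front of the camera whose projection lies inside the 2D box."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points:
        if p[2] <= 0:  # behind the camera, so the point cannot project into the image
            continue
        u, v = project(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept


def axis_aligned_box(points: List[Point3D]) -> Tuple[Point3D, Point3D]:
    """Crude stand-in for 3D box estimation: per-axis min/max corners of the cropped points."""
    if not points:
        raise ValueError("no points inside the 2D box")
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs


@dataclass
class OffloadScheduler:
    """Hypothetical error-triggered scheduler: request a full 3D detection once the
    accumulated (estimated) 2D-to-3D transformation error reaches a threshold."""
    threshold: float
    accumulated: float = 0.0

    def step(self, estimated_error: float) -> bool:
        """Return True if this frame should be offloaded for full 3D detection."""
        self.accumulated += estimated_error
        if self.accumulated >= self.threshold:
            self.accumulated = 0.0  # fresh 3D results are assumed to reset the drift
            return True
        return False
```

For example, with `threshold=1.0` and per-frame error estimates of 0.25, 0.25, 0.25, 0.25, 0.5, the scheduler offloads only the fourth frame, after which the accumulated error restarts from zero. A plain min/max fit over frustum points would also absorb background points that fall behind the object in the same 2D box, which is one reason a practical pipeline needs more than this geometric crop.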

updated: Sat Feb 18 2023 03:42:31 GMT+0000 (UTC)

published: Sat Feb 18 2023 03:42:31 GMT+0000 (UTC)

arXiv