High-temporal-resolution event-based vehicle detection and tracking

Zaid El-Shair; Samir Rawashdeh

Event-based vision has grown rapidly in recent years, driven by its unique characteristics: high temporal resolution (~1 µs), high dynamic range (>120 dB), and an output latency of only a few microseconds. This work further explores a hybrid, multi-modal approach to object detection and tracking that complements state-of-the-art frame-based detectors with hand-crafted event-based methods to improve overall tracking performance with minimal computational overhead. The methods presented include an event-based bounding box (BB) refinement that improves the precision of the resulting BBs, and a continuous event-based object detection method that recovers missed detections and generates inter-frame detections, enabling a high-temporal-resolution tracking output. The advantages of these methods are quantitatively verified by an ablation study using the higher order tracking accuracy (HOTA) metric. Results show significant performance gains: at the baseline frame rate of 24 Hz, HOTA improves from 56.6% using only frames to 64.1% and 64.9% for the event-based and edge-based mask configurations combined with the two proposed methods. Likewise, at the high-temporal-resolution tracking rate of 384 Hz, incorporating these methods with the same configurations improves HOTA from 52.5% to 63.1% and from 51.3% to 60.2%. Finally, a validation experiment is conducted to analyze real-world single-object tracking performance using high-speed LiDAR. Empirical evidence shows that our approaches provide significant advantages over using frame-based object detectors alone, both at the baseline frame rate of 24 Hz and at higher tracking rates of up to 500 Hz.
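
To make the idea of event-based BB refinement more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a frame-based detector supplies a box in pixel coordinates and that recent events are available as (x, y) pixel positions; the function name `refine_box` and the `margin` and `min_events` parameters are hypothetical and chosen only for illustration. The sketch simply tightens the box to the extent of the events found near it and falls back to the original box when too few events are present.

```python
from typing import Iterable, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def refine_box(box: Box,
               events: Iterable[Tuple[int, int]],
               margin: int = 5,
               min_events: int = 10) -> Box:
    """Fit `box` to the extent of the events lying within `margin` pixels of it.

    Hypothetical helper for illustration only: `margin` and `min_events`
    are assumed parameters, not values taken from the paper.
    """
    x0, y0, x1, y1 = box
    # Keep only events inside the box expanded by `margin` on every side.
    nearby = [(x, y) for x, y in events
              if x0 - margin <= x <= x1 + margin
              and y0 - margin <= y <= y1 + margin]
    if len(nearby) < min_events:
        # Too little event activity (e.g. a static object): keep the frame-based box.
        return box
    xs = [x for x, _ in nearby]
    ys = [y for _, y in nearby]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    detector_box = (10, 10, 50, 50)
    # Synthetic events clustered inside the detector box, plus one distant outlier.
    events = [(x, y) for x in range(15, 46, 5) for y in range(12, 41, 4)]
    events.append((200, 200))
    print(refine_box(detector_box, events))
```

A practical system would likely also need to filter noise events and restrict the events to a short time window around the frame timestamp; both are omitted here for brevity.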

updated: Thu Dec 29 2022 12:45:06 GMT+0000 (UTC)

published: Thu Dec 29 2022 12:45:06 GMT+0000 (UTC)

arXiv
