MonoDETRNext: Next-Generation Accurate and Efficient Monocular 3D Object Detector

Pan Liao; Feng Yang; Di Wu; Wenhui Zhao; Jinwen Yu

Monocular 3D object detection has broad application potential across many fields. DETR-type models have performed remarkably well in other areas, but considerable room for improvement remains in monocular 3D detection, particularly for the existing DETR-based method, MonoDETR. After addressing the query initialization issues in MonoDETR, we explored several strategies for improving performance, such as incorporating a more efficient encoder and using a more powerful depth estimator. The result is MonoDETRNext, a model that comes in two variants distinguished by their depth estimator: MonoDETRNext-E, which prioritizes speed, and MonoDETRNext-A, which prioritizes accuracy. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research. An extensive evaluation shows that the model outperforms existing solutions. Notably, on the KITTI test benchmark, MonoDETRNext-A improves the AP_3D metric by 3.52% over MonoDETR, and MonoDETRNext-E improves it by 2.35%. In addition, the computational efficiency of MonoDETRNext-E slightly exceeds that of its predecessor.

updated: Wed Nov 27 2024 08:23:24 GMT+0000 (UTC)

published: Fri May 24 2024 03:22:55 GMT+0000 (UTC)

arXiv

