MonoDETRNext: Next-Generation Accurate and Efficient Monocular 3D Object Detector

Authors: Pan Liao, Feng Yang, Di Wu, Wenhui Zhao, Jinwen Yu

Monocular 3D object detection has broad application potential across many fields. DETR-type models have shown remarkable performance in other areas, but considerable room for improvement remains in monocular 3D detection, particularly for the existing DETR-based method, MonoDETR. After addressing the query initialization issues in MonoDETR, we explored several performance enhancement strategies, such as incorporating a more efficient encoder and using a more powerful depth estimator. The result is MonoDETRNext, a model that comes in two variants based on the choice of depth estimator: MonoDETRNext-E, which prioritizes speed, and MonoDETRNext-A, which focuses on accuracy. We conducted an exhaustive evaluation demonstrating the model's superior performance against existing solutions. Notably, MonoDETRNext-A achieved a 3.52$\%$ improvement in the $AP_{3D}$ metric on the KITTI test benchmark over MonoDETR, while MonoDETRNext-E achieved a 2.35$\%$ improvement. Additionally, the computational efficiency of MonoDETRNext-E slightly exceeds that of its predecessor. We posit that MonoDETRNext establishes a new benchmark in monocular 3D object detection and opens avenues for future research.

Subject: Computer Vision and Pattern Recognition

Published: 2024-05-24 03:22:55 UTC

2405.15176