Balancing Performance and Efficiency in Zero-shot Robotic Navigation
We present an optimization study of Vision-Language Frontier Maps (VLFM) applied to the Object Goal Navigation task in robotics. We evaluate the efficiency and performance of various vision-language models, object detectors, segmentation models, and multi-modal comprehension and Visual Question Answering modules. Using the val-mini and val splits of the Habitat-Matterport 3D dataset, we conduct experiments on a desktop with limited VRAM. We propose a solution that improves the success rate over the VLFM BLIP-2 baseline (+1.55%) without a substantial loss in success-weighted path length, while requiring 2.3 times less video memory. Our findings provide insight into balancing model performance against computational efficiency and suggest effective deployment strategies for resource-limited environments.
updated: Wed Jun 05 2024 07:31:05 GMT+0000 (UTC)
published: Wed Jun 05 2024 07:31:05 GMT+0000 (UTC)
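The abstract reports two metrics, success rate and success-weighted path length (SPL). As a rough illustration of how they are commonly computed for Object Goal Navigation, the sketch below uses the standard SPL definition, in which each successful episode contributes the ratio of the shortest path length to the longer of the shortest path and the path the agent actually took. The `Episode` class, its field names, and the sample values are hypothetical and are not taken from the paper or from the VLFM code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One navigation episode (hypothetical container, for illustration only)."""
    success: bool          # did the agent stop at the goal object?
    shortest_path: float   # shortest start-to-goal path length, assumed > 0
    agent_path: float      # length of the path the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(ep.success for ep in episodes) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success-weighted path length, averaged over all episodes.

    Failed episodes contribute 0; successful ones contribute
    shortest_path / max(agent_path, shortest_path).
    """
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for ep in episodes:
        if ep.success:
            total += ep.shortest_path / max(ep.agent_path, ep.shortest_path)
    return total / len(episodes)


# Made-up sample episodes, not results from the paper.
episodes = [
    Episode(success=True, shortest_path=5.0, agent_path=10.0),
    Episode(success=True, shortest_path=4.0, agent_path=4.0),
    Episode(success=False, shortest_path=3.0, agent_path=6.0),
    Episode(success=True, shortest_path=2.0, agent_path=8.0),
]
print(success_rate(episodes))  # 0.75
print(spl(episodes))           # 0.4375
```

Because SPL penalizes detours only on successful episodes, a method can raise its success rate while its SPL stays flat or drops slightly, which is the trade-off the abstract describes.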
