OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks

Sophia Sirko-Galouchenko; Alexandre Boulch; Spyros Gidaris; Andrei Bursuc; Antonin Vobecky; Patrick Pérez; Renaud Marlet

We introduce a self-supervised pretraining method, called OcFeat, for camera-only Bird's-Eye-View (BEV) segmentation networks. With OccFeat, we pretrain a BEV network via occupancy prediction and feature distillation tasks. Occupancy prediction provides a 3D geometric understanding of the scene to the model. However, the geometry learned is class-agnostic. Hence, we add semantic information to the model in the 3D space through distillation from a self-supervised pretrained image foundation model. Models pretrained with our method exhibit improved BEV semantic segmentation performance, particularly in low-data scenarios. Moreover, empirical results affirm the efficacy of integrating feature distillation with 3D occupancy prediction in our pretraining approach.

updated: Wed May 15 2024 09:16:03 GMT+0000 (UTC)

published: Mon Apr 22 2024 09:43:03 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト