We present NOLBO, a variational method for estimating an observation model of multiple 3D objects from a single 2D image. Previous probabilistic approaches to instance-level understanding mainly consider single-object images rather than a single shot containing multiple objects; relations between objects and the scene as a whole fall outside their scope. The objectness of each observation is also rarely incorporated into their models. We therefore propose a method to approximate the Bayesian observation model for scene-level 3D multi-object understanding. Using a variational auto-encoder (VAE), we estimate latent variables from the entire scene; these follow tractable distributions and simultaneously encode full 3D shape and pose. Our observation models can readily be adopted for probabilistic inference, such as object-oriented data association and probabilistic simultaneous localization and mapping (SLAM), by replacing object-oriented features with the latent variables.
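
For reference, the generic objective optimized by a VAE is the evidence lower bound (ELBO). The form below is the standard textbook bound and is not necessarily the exact objective used by NOLBO; the symbols $x$ (the input image), $z$ (the latent variables), $q_\phi$ (the encoder), $p_\theta$ (the decoder) and $p(z)$ (the prior) are notation assumed here for illustration.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
$$

When the prior $p(z)$ and the approximate posterior $q_\phi(z \mid x)$ are chosen from a tractable family, for example diagonal Gaussians, the KL term has a closed form, which is one reason latent variables of this kind are convenient inputs for downstream probabilistic inference.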