Nighttime stereo depth estimation is still challenging, as assumptions associated with daytime lighting conditions do not hold any longer. Nighttime is not only about low-light and dense noise, but also about glow/glare, flares, non-uniform distribution of light, etc. One of the possible solutions is to train a network on night stereo images in a fully supervised manner. However, to obtain proper disparity ground-truths that are dense, independent from glare/glow, and have sufficiently far depth ranges is extremely intractable. To address the problem, we introduce a network joining day/night translation and stereo. In training the network, our method does not require ground-truth disparities of the night images, or paired day/night images. We utilize a translation network that can render realistic night stereo images from day stereo images. We then train a stereo network on the rendered night stereo images using the available disparity supervision from the corresponding day stereo images, and simultaneously also train the day/night translation network. We handle the fake depth problem, which occurs due to the unsupervised/unpaired translation, for light effects (e.g., glow/glare) and uninformative regions (e.g., low-light and saturated regions), by adding structure-preservation and weighted-smoothness constraints. Our experiments show that our method outperforms the baseline methods on night images.