This paper proposes a novel approach to stereo visual odometry without stereo matching. It is particularly robust in scenes of repetitive high-frequency textures. Referred to as DSVO (Direct Stereo Visual Odometry), it operates directly on pixel intensities, without any explicit feature matching, and is thus efficient and more accurate than the state-of-the-art stereo-matching-based methods. It applies a semi-direct monocular visual odometry running on one camera of the stereo pair, tracking the camera pose and mapping the environment simultaneously; the other camera is used to optimize the scale of monocular visual odometry. We evaluate DSVO in a number of challenging scenes to evaluate its performance and present comparisons with the state-of-the-art stereo visual odometry algorithms.