The machine vision systems have been playing a significant role in visual monitoring systems. With the help of stereovision and machine learning, it will be able to mimic human-like visual system and behaviour towards the environment. In this paper, we present a stereo vision based 3-DOF robot which will be used to monitor places from remote using cloud server and internet devices. The 3-DOF robot will transmit human-like head movements, i.e., yaw, pitch, roll and produce 3D stereoscopic video and stream it in Real-time. This video stream is sent to the user through any generic internet devices with VR box support, i.e., smartphones giving the user a First-person real-time 3D experience and transfers the head motion of the user to the robot also in Real-time. The robot will also be able to track moving objects and faces as a target using deep neural networks which enables it to be a standalone monitoring robot. The user will be able to choose specific subjects to monitor in a space. The stereovision enables us to track the depth information of different objects detected and will be used to track human interest objects with its distances and sent to the cloud. A full working prototype is developed which showcases the capabilities of a monitoring system based on stereo vision, robotics, and machine learning.