Enhancing roadway safety has become an essential computer vision focus area for Intelligent Transportation Systems (ITS). As a part of ITS, Vehicle Trajectory Prediction (VTP) aims to forecast a vehicle's future positions based on its past and current movements. VTP is a pivotal element for road safety, aiding in applications such as traffic management, accident prevention, work-zone safety, and energy optimization. While most works in this field focus on autonomous driving, with the growing number of surveillance cameras, another sub-field emerges for surveillance VTP with its own set of challenges. In this paper, we introduce VT-Former, a novel transformer-based VTP approach for highway safety and surveillance. In addition to utilizing transformers to capture long-range temporal patterns, a new Graph Attentive Tokenization (GAT) module has been proposed to capture intricate social interactions among vehicles. This study seeks to explore both the advantages and the limitations inherent in combining transformer architecture with graphs for VTP. Our investigation, conducted across three benchmark datasets from diverse surveillance viewpoints, showcases the State-of-the-Art (SotA) or comparable performance of VT-Former in predicting vehicle trajectories. This study underscores the potential of VT-Former and its architecture, opening new avenues for future research and exploration.