In this paper, we outline our approach to identifying ophthalmic biomarkers in Optical Coherence Tomography (OCT) images from the OLIVES dataset, which were obtained from a diverse range of patients. Using robust augmentations and 5-fold cross-validation, we trained two vision-transformer-based models, MaxViT and EVA-02, and ensembled them at inference time. We find that MaxViT's use of convolution layers followed by strided attention is better suited to detecting local features, while EVA-02's use of standard self-attention and knowledge distillation is better suited to detecting global features. Our solution won the champion title of the IEEE SPS Video and Image Processing (VIP) Cup 2023, achieving a patient-wise F1 score of 0.814 in the first phase and 0.8527 in the second and final phase of the competition, scoring 3.8% higher than the next best solution.