Ophthalmic Biomarker Detection Using Ensembled Vision Transformers and Knowledge Distillation

H. A. Z. Sameen Shahgir; Khondker Salman Sayeed; Tanjeem Azwad Zaman; Md. Asif Haider; Sheikh Saifur Rahman Jony; M. Sohel Rahman

In this paper, we outline our approach to identifying ophthalmic biomarkers in Optical Coherence Tomography (OCT) images from the OLIVES dataset, which were obtained from a diverse range of patients. Using robust augmentations and 5-fold cross-validation, we trained two vision transformer-based models, MaxViT and EVA-02, and ensembled them at inference time. We find that MaxViT's use of convolution layers followed by strided attention is better suited to local feature detection, while EVA-02's use of a standard attention mechanism and knowledge distillation is better suited to detecting global features. Our solution earned us the champion title of the IEEE SPS Video and Image Processing (VIP) Cup 2023, achieving a patient-wise F1 score of 0.814 in the first phase and 0.8527 in the second and final phase of the competition, scoring 3.8% higher than the next best solution.
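As a rough illustration of what ensembling two models at inference time can look like, the sketch below averages per-biomarker probabilities and thresholds the result. The abstract does not state the combination rule, so the unweighted mean, the 0.5 threshold, the function names, and the example probabilities are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean


def ensemble_probabilities(model_probs):
    """Average per-biomarker probabilities across models (or folds).

    model_probs: list of lists; each inner list holds one probability
    per biomarker, as produced by one model for a single OCT scan.
    """
    n_labels = len(model_probs[0])
    if any(len(p) != n_labels for p in model_probs):
        raise ValueError("all models must score the same number of biomarkers")
    return [mean(p[i] for p in model_probs) for i in range(n_labels)]


def predict_biomarkers(model_probs, threshold=0.5):
    """Turn averaged probabilities into binary present/absent labels."""
    # The 0.5 threshold is an assumption; a real system would likely tune it.
    return [int(p >= threshold) for p in ensemble_probabilities(model_probs)]


# Hypothetical outputs for one scan and three biomarkers.
maxvit_probs = [0.9, 0.2, 0.6]
eva02_probs = [0.7, 0.4, 0.3]
print(predict_biomarkers([maxvit_probs, eva02_probs]))  # prints [1, 0, 0]
```

The same function can also average the outputs of the five cross-validation folds of a single model, since it only requires that every entry scores the same set of biomarkers.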

updated: Sat Nov 23 2024 17:28:04 GMT+0000 (UTC)

published: Sat Oct 21 2023 13:27:07 GMT+0000 (UTC)

arXiv
