Training a real-time gesture recognition model heavily relies on annotated data. However, manual data annotation is costly and demands substantial human effort. In order to address this challenge, we propose a novel annotation model that can automatically annotate gesture classes and identify their temporal ranges. Our ablation study demonstrates that our annotation model design surpasses the baseline in terms of both gesture classification accuracy (3-4% improvement) and localization accuracy (71-75% improvement). We believe that this annotation model has immense potential to improve the training of downstream gesture recognition models using unlabeled datasets.