The planning of digital orthodontic treatment requires providing tooth alignment, which not only consumes a lot of time and labor to determine manually but also relays clinical experiences heavily. In this work, we proposed a lightweight tooth alignment neural network based on Swin-transformer. We first re-organized 3D point clouds based on virtual arch lines and converted them into order-sorted multi-channel textures, which improves the accuracy and efficiency simultaneously. We then designed two new occlusal loss functions that quantitatively evaluate the occlusal relationship between the upper and lower jaws. They are important clinical constraints, first introduced to the best of our knowledge, and lead to cutting-edge prediction accuracy. To train our network, we collected a large digital orthodontic dataset that has 591 clinical cases, including various complex clinical cases. This dataset will benefit the community after its release since there is no open dataset so far. Furthermore, we also proposed two new orthodontic dataset augmentation methods considering tooth spatial distribution and occlusion. We evaluated our method with this dataset and extensive experiments, including comparisons with STAT methods and ablation studies, and demonstrate the high prediction accuracy of our method.