To improve the performance of classical generative adversarial network (GAN), Wasserstein generative adversarial networks (W-GAN) was developed as a Kantorovich dual formulation of the optimal transport (OT) problem using Wasserstein-1 distance. However, it was not clear how cycleGAN-type generative models can be derived from the optimal transport theory. Here we show that a novel cycleGAN architecture can be derived as a Kantorovich dual OT formulation if a penalized least square (PLS) cost with deep learning-based inverse path penalty is used as a transportation cost. One of the most important advantages of this formulation is that depending on the knowledge of the forward problem, distinct variations of cycleGAN architecture can be derived: for example, one with two pairs of generators and discriminators, and the other with only a single pair of generator and discriminator. Even for the two generator cases, we show that the structural knowledge of the forward operator can lead to a simpler generator architecture which significantly simplifies the neural network training. The new cycleGAN formulation, what we call the OT-cycleGAN, have been applied for various biomedical imaging problems, such as accelerated magnetic resonance imaging (MRI), super-resolution microscopy, and low-dose x-ray computed tomography (CT). Experimental results confirm the efficacy and flexibility of the theory.