This paper compares the performance of a neural network that takes the DCT (Discrete Cosine Transform) of an image patch as its input with that of LeNet for classifying MNIST handwritten digits. The basis functions underlying the DCT bear a passing resemblance to some of the learned basis functions of the Visual Transformer but are an order of magnitude faster to apply.
updated: Fri Nov 04 2022 11:56:00 GMT+0000 (UTC)
published: Fri Nov 04 2022 11:56:00 GMT+0000 (UTC)
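
As a rough illustration of the kind of input the abstract describes, the sketch below computes an orthonormal 2D DCT-II of a small image patch in pure Python and keeps the low-frequency coefficients as a feature vector. The function names (`dct_1d`, `dct_2d`, `dct_features`), the 8x8 patch size, and the choice to keep a 4x4 low-frequency block are assumptions made for this example; the abstract does not state how the paper selects or scales coefficients.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        # Project the signal onto the k-th cosine basis function.
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        # Orthonormal scaling: the k = 0 (DC) term is scaled differently.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d(patch):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in patch]
    cols = [dct_1d(list(c)) for c in zip(*rows)]  # zip(*rows) yields columns
    return [list(r) for r in zip(*cols)]          # transpose back


def dct_features(patch, keep=4):
    """Flatten the keep x keep low-frequency block (hypothetical choice)."""
    coeffs = dct_2d(patch)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


if __name__ == "__main__":
    # A constant 8x8 patch puts all of its energy in the DC coefficient.
    patch = [[1.0] * 8 for _ in range(8)]
    features = dct_features(patch, keep=4)
    print(len(features), round(features[0], 6))
```

A feature vector like this could then be fed to a small fully connected classifier. Because the basis is fixed rather than learned, the transform itself has no parameters to train.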