Multispectral (MS) snapshot cameras equipped with a MS filter array (MSFA), capture multiple spectral bands in a single shot, resulting in a raw mosaic image where each pixel holds only one channel value. The fully-defined MS image is estimated from the raw one through demosaicing, which inevitably introduces spatio-spectral artifacts. Moreover, training on fully-defined MS images can be computationally intensive, particularly with deep neural networks (DNNs), and may result in features lacking discrimination power due to suboptimal learning of spatio-spectral interactions. Furthermore, outdoor MS image acquisition occurs under varying lighting conditions, leading to illumination-dependent features. This paper presents an original approach to learn discriminant and illumination-robust features directly from raw images. It involves: raw spectral constancy to mitigate the impact of illumination, MSFA-preserving transformations suited for raw image augmentation to train DNNs on diverse raw textures, and raw-mixing to capture discriminant spatio-spectral interactions in raw images. Experiments on MS image classification show that our approach outperforms both handcrafted and recent deep learning-based methods, while also requiring significantly less computational effort. The source code is available at https://github.com/AnisAmziane/RawTexture.