Spatiotemporal prediction aims to generate future sequences by paradigms learned from historical contexts. It is essential in numerous domains, such as traffic flow prediction and weather forecasting. Recently, research in this field has been predominantly driven by deep neural networks based on autoencoder architectures. However, existing methods commonly adopt autoencoder architectures with identical receptive field sizes. To address this issue, we propose an Asymmetric Receptive Field Autoencoder (ARFA) model, which introduces corresponding sizes of receptive field modules tailored to the distinct functionalities of the encoder and decoder. In the encoder, we present a large kernel module for global spatiotemporal feature extraction. In the decoder, we develop a small kernel module for local spatiotemporal information reconstruction. Experimental results demonstrate that ARFA consistently achieves state-of-the-art performance on popular datasets. Additionally, we construct the RainBench, a large-scale radar echo dataset for precipitation prediction, to address the scarcity of meteorological data in the domain.