Recent years have witnessed the remarkable progress of applying deep learning models in video person re-identification (Re-ID). A key factor for video person Re-ID is to effectively construct discriminative and robust video feature representations for many complicated situations. Part-based approaches employ spatial and temporal attention to extract representative local features. While correlations between parts are ignored in the previous methods, to leverage the relations of different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables the contextual interactions between relevant regional features. Specifically, we exploit the pose alignment connection and the feature affinity connection to construct an adaptive structure-aware adjacency graph, which models the intrinsic relations between graph nodes. We perform feature propagation on the adjacency graph to refine regional features iteratively, and the neighbor nodes' information is taken into account for part feature representation. To learn compact and discriminative representations, we further propose a novel temporal resolution-aware regularization, which enforces the consistency among different temporal resolutions for the same identities. We conduct extensive evaluations on four benchmarks, i.e. iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID, experimental results achieve the competitive performance which demonstrates the effectiveness of our proposed method. The code is available at https://github.com/weleen/AGRL.pytorch.