Object re-identification (ReID) in large camera networks has many challenges. First, the similar appearances of objects degrade ReID performances. This challenge cannot be addressed by existing appearance-based ReID methods. Second, most ReID studies are performed in laboratory settings and do not consider ReID problems in real-world scenarios. To overcome these challenges, we introduce a novel ReID framework that leverages a spatial-temporal fusion network and causal identity matching (CIM). The framework estimates camera network topology using the proposed adaptive Parzen window and combines appearance features with spatial-temporal cue within the Fusion Network. It achieved outstanding performance across several datasets, including VeRi776, Vehicle-3I, and Market-1501, achieving up to 99.70% rank-1 accuracy and 95.5% mAP. Furthermore, the proposed CIM approach, which dynamically assigns gallery sets based on the camera network topology, further improved ReID accuracy and robustness in real-world settings, evidenced by a 94.95% mAP and 95.19% F1 score on the Vehicle-3I dataset. The experimental results support the effectiveness of incorporating spatial-temporal information and CIM for real-world ReID scenarios regardless of the data domain (e.g., vehicle, person).