Kinship, a soft biometric detectable in media, is fundamental to a myriad of use cases. Despite the difficulty of detecting kinship, annual data challenges using still images have consistently improved performance and attracted new researchers. Systems now reach performance levels unforeseeable a decade ago and are closing in on levels acceptable for deployment in practice. As with other biometric tasks, we expect systems to benefit from additional modalities. We hypothesize that adding modalities to Families In the Wild (FIW), which contains only still images, will improve performance. Thus, to narrow the gap between research and reality and to enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions). Specifically, we introduce the first publicly available multi-task MM kinship dataset. To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data, requiring minimal human input and no financial cost. The proposed MM corpus allows the problem statements to be formulated as more realistic template-based protocols. We show significant improvements on all benchmarks with the added modalities. The results also highlight edge cases that point to areas of improvement for future research. FIW MM provides the data required to increase the potential of automated systems to detect kinship in MM. It also allows experts from diverse fields to collaborate in novel ways.