Multimodal Metadata Assignment for Cultural Heritage Artifacts

Luis Rei; Dunja Mladenić; Mareike Dorozynski; Franz Rottensteiner; Thomas Schleider; Raphaël Troncy; Jorge Sebastián Lozano; Mar Gaitán Salvatella

We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are image, text, and tabular data. The image classifier is based on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa). Both are trained as multitask classifiers and use the focal loss to handle class imbalance. Tabular data and late fusion are handled by gradient tree boosting. We also show how we leveraged specific data models and a taxonomy in a knowledge graph to create the dataset and to store classification results. All individual classifiers accurately predict missing properties of the digitized silk artifacts, with the multimodal approach providing the best results.
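The focal loss mentioned in the abstract down-weights examples the model already classifies confidently, so rare classes contribute relatively more to training. The sketch below is only an illustration in plain Python, not the authors' implementation: it uses the standard form FL(p_t) = -(1 - p_t)^gamma * log(p_t), and the function names, the task names ("material", "technique"), the default gamma of 2.0, and the choice to skip tasks with a missing label are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p_t is the predicted probability of the true class. With gamma = 0
    this reduces to ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def multitask_focal_loss(task_logits, task_targets, gamma=2.0):
    """Average focal loss over the tasks that have a label.

    Assumption for this sketch: a target of None marks a missing
    property, and that task is skipped for the example.
    """
    losses = [
        focal_loss(task_logits[task], target, gamma)
        for task, target in task_targets.items()
        if target is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical record: the "material" label is known, "technique" is missing.
example_loss = multitask_focal_loss(
    {"material": [2.0, 0.5, -1.0], "technique": [0.1, 0.3]},
    {"material": 0, "technique": None},
)
```

In a late fusion setup like the one described, the class probabilities from each modality's classifier would then be passed as input features to the gradient tree boosting model; that step is not shown here.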

updated: Sat Jun 01 2024 12:41:03 GMT+0000 (UTC)

published: Sat Jun 01 2024 12:41:03 GMT+0000 (UTC)

arXiv

