LLVMs4Protest: Harnessing the Power of Large Language and Vision Models for Deciphering Protests in the News

Yongjun Zhang

Large language and vision models have transformed how social movement scholars identify protests and extract key protest attributes from multi-modal data such as texts, images, and videos. This article documents how we fine-tuned two large pretrained transformer models, Longformer and Swin Transformer V2, to infer potential protests in news articles using textual and imagery data. First, the Longformer model was fine-tuned using the Dynamics of Collective Action (DoCA) Corpus. We matched New York Times articles with the DoCA database to obtain a training dataset for downstream tasks. Second, the Swin Transformer V2 model was trained on the UCLA-protest imagery data. The UCLA-protest project provides labeled imagery data with attributes such as protest, violence, and sign. Both fine-tuned models will be available via https://github.com/Joshzyj/llvms4protest. We release this short technical report for social movement scholars who are interested in using LLVMs to infer protests in textual and imagery data.
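The labeling step of the text pipeline, in which NYT articles matched to DoCA become training examples, can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that each article and each DoCA record can be reduced to a shared match key (here, hypothetically, publication date and page), and it treats every unmatched article as a non-protest example, which is also an assumption. The names `Article`, `MatchKey`, and `build_training_set` are hypothetical. A real matching procedure would likely need finer-grained fields, because a single newspaper page can carry several articles.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical match key: (publication date, page). The report does not state
# which fields were used to link NYT articles to DoCA records.
MatchKey = Tuple[date, str]


@dataclass(frozen=True)
class Article:
    """Minimal stand-in for one NYT article (hypothetical fields)."""
    pub_date: date
    page: str
    text: str

    def key(self) -> MatchKey:
        # Reduce the article to the same key shape used for DoCA records.
        return (self.pub_date, self.page)


def build_training_set(
    articles: Iterable[Article], doca_keys: Iterable[MatchKey]
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs: 1 if the article matches a DoCA record, else 0.

    Unmatched articles are assumed to be negatives (label 0).
    """
    protest_keys = set(doca_keys)  # set lookup keeps matching O(1) per article
    return [(a.text, int(a.key() in protest_keys)) for a in articles]
```

The resulting `(text, label)` pairs could then be tokenized and passed to a sequence-classification head on top of Longformer, for example Hugging Face's `LongformerForSequenceClassification`. This is one possible setup and not necessarily the configuration used in the report.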

updated: Thu Nov 30 2023 04:17:30 GMT+0000 (UTC)

published: Thu Nov 30 2023 04:17:30 GMT+0000 (UTC)

arXiv
