Multimodal Short Video Rumor Detection System Based on Contrastive Learning

Yuxing Yang; Junhao Zhao; Siyi Wang; Xiangyu Min; Pengchao Wang; Haizhou Wang

As short video platforms have become prominent channels for news dissemination, major platforms in China have gradually become fertile ground for the spread of fake news. Distinguishing short video rumors is challenging, however, because videos carry a large amount of information and share many features, which makes them appear homogeneous. To counter the spread of short video rumors, our research group proposes a methodology that combines multimodal feature fusion with the integration of external knowledge, taking into account the merits and drawbacks of each algorithm. The proposed detection approach comprises three steps: (1) creating a comprehensive dataset of multiple features extracted from short videos; (2) developing a multimodal rumor detection model, in which the Temporal Segment Networks (TSN) video coding model extracts video features, Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) extract textual features, and the BERT model then fuses the textual and video features; (3) distinguishing rumors through contrastive learning, where we acquire external knowledge by crawling relevant sources and use a vector database to incorporate this knowledge into the classification output. Our research is driven by practical considerations, and the knowledge derived from this study will be of significant value in practical scenarios such as short video rumor identification and public opinion management.
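The abstract does not specify the contrastive objective or the vector database, so the following is only a minimal sketch under stated assumptions: an InfoNCE-style loss over cosine similarities stands in for the contrastive learning step, and a brute-force nearest-neighbor search over an in-memory list stands in for the vector database lookup. The function names (`cosine`, `info_nce`, `top_k`), the temperature value, and the toy three-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed form, not the paper's)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]  # equals -log softmax of the positive pair


def top_k(query, store, k=1):
    """Brute-force stand-in for a vector database: k most similar entries."""
    scored = [(label, cosine(query, vec)) for label, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy embeddings (hypothetical): a video vector, its matching text, and two
# unrelated texts used as negatives.
video = [0.9, 0.1, 0.0]
matching_text = [0.8, 0.2, 0.1]
unrelated = [[0.0, 1.0, 0.0], [0.1, 0.0, 1.0]]

loss_aligned = info_nce(video, matching_text, unrelated)
loss_misaligned = info_nce(video, unrelated[0], [matching_text, unrelated[1]])

# Toy external-knowledge store (hypothetical labels and vectors).
knowledge = [
    ("debunked claim", [1.0, 0.0, 0.0]),
    ("verified report", [0.0, 1.0, 0.0]),
    ("unrelated item", [0.0, 0.0, 1.0]),
]
nearest = top_k(video, knowledge, k=1)
```

In this toy setup the loss is lower when the positive pair is the genuinely matching video and text than when an unrelated text is treated as the positive, which is the behavior a contrastive objective is meant to reward. A real system would replace the brute-force search with an approximate nearest-neighbor index and use learned embeddings.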

updated: Wed May 17 2023 13:12:27 GMT+0000 (UTC)

published: Mon Apr 17 2023 16:07:00 GMT+0000 (UTC)

arXiv
