Hateful Memes Detection via Complementary Visual and Linguistic Networks

Weibo Zhang; Guihua Liu; Zhuohua Li; Fuqing Zhu

Hateful memes are widespread on social media and convey negative information. The main challenge in hateful meme detection is that the expressed meaning cannot be fully recognized from a single modality. To further integrate information across modalities, we investigate a candidate solution based on complementary visual and linguistic networks for the Hateful Memes Challenge 2020, which allows more comprehensive multi-modal information to be explored. Both contextual-level and sensitive object-level information are considered in the visual and linguistic embeddings to model complex multi-modal scenarios. Specifically, a pre-trained classifier and an object detector are used to obtain contextual features and regions of interest (RoIs) from the input, followed by fusion with position representations to form the visual embedding. The linguistic embedding is composed of three components: the sentence word embedding, the position embedding, and the corresponding Spacy embedding (Sembedding), which is a symbol represented by a vocabulary extracted by Spacy. Both the visual and linguistic embeddings are fed into the designed Complementary Visual and Linguistic (CVL) networks to produce the hateful meme prediction. Experimental results on the Hateful Memes Challenge Dataset demonstrate that CVL provides decent performance, achieving 78.48% AUROC and 72.95% accuracy. Code is available at https://github.com/webYFDT/hateful.
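As a rough illustration of the three-component linguistic embedding described in the abstract, the sketch below combines a word embedding, a position embedding, and a Spacy-derived symbol embedding for each token. It is not taken from the authors' code. Several things here are assumptions: that the components are summed element-wise (as in BERT-style models), that the Spacy symbol resembles a part-of-speech tag, and every name, vocabulary, and dimension (`EMBED_DIM`, `make_table`, `linguistic_embedding`, `WORD_VOCAB`, `TAG_VOCAB`). The tags are written by hand rather than produced by Spacy, and random tables stand in for learned weights.

```python
import random

EMBED_DIM = 8  # hypothetical embedding width, chosen only for the demo


def make_table(num_rows, dim, seed):
    """Build a random lookup table standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def linguistic_embedding(word_ids, tag_ids, word_table, pos_table, tag_table):
    """Return one vector per token: word + position + tag embedding.

    Element-wise summation is an assumption; the abstract only says the
    linguistic embedding is "composed of" these three components.
    """
    if len(word_ids) != len(tag_ids):
        raise ValueError("word_ids and tag_ids must have the same length")
    rows = []
    for position, (w, t) in enumerate(zip(word_ids, tag_ids)):
        rows.append([
            a + b + c
            for a, b, c in zip(word_table[w], pos_table[position], tag_table[t])
        ])
    return rows


# Toy vocabularies (hypothetical); index 0 is the fallback for unknown symbols.
WORD_VOCAB = {"[UNK]": 0, "look": 1, "at": 2, "this": 3}
TAG_VOCAB = {"X": 0, "VERB": 1, "ADP": 2, "PRON": 3}

word_table = make_table(len(WORD_VOCAB), EMBED_DIM, seed=0)
pos_table = make_table(16, EMBED_DIM, seed=1)  # supports up to 16 tokens
tag_table = make_table(len(TAG_VOCAB), EMBED_DIM, seed=2)

# Meme caption tokens with hand-written tags; in practice the tags would
# come from a Spacy pipeline rather than being typed in.
tokens = ["look", "at", "this"]
tags = ["VERB", "ADP", "PRON"]

word_ids = [WORD_VOCAB.get(tok, 0) for tok in tokens]
tag_ids = [TAG_VOCAB.get(tag, 0) for tag in tags]

embedded = linguistic_embedding(word_ids, tag_ids, word_table, pos_table, tag_table)
print(len(embedded), len(embedded[0]))
```

Summation keeps the embedding width fixed regardless of how many components are combined; concatenation would be an equally plausible reading of the abstract and would instead triple the width.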

updated: Wed Dec 09 2020 11:11:09 GMT+0000 (UTC)

published: Wed Dec 09 2020 11:11:09 GMT+0000 (UTC)

arXiv
