An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Wenxuan Wang; Jingyuan Huang; Jen-tse Huang; Chang Chen; Jiazhen Gu; Pinjia He; Michael R. Lyu

The exponential growth of social media platforms has revolutionized communication and content dissemination in human society. Nevertheless, these platforms are increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts in developing and deploying textual and image content moderation methods, malicious users can evade moderation by embedding text into images, for example by posting screenshots of the text, usually with some added interference. We find that the performance of modern content moderation software against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study of 5,000 real-world pieces of toxic content collected from 4 popular social media applications: Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual content, OASIS can generate image test cases that preserve the toxicity yet are likely to bypass moderation. In the evaluation, we employ OASIS to test five commercial textual content moderation services from well-known companies (i.e., Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud, and Tencent Cloud), as well as a state-of-the-art moderation research model. The results show that OASIS achieves error finding rates of up to 100%. Moreover, retraining the models with test cases generated by OASIS improves the robustness of the moderation model without performance degradation.
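The abstract does not spell out the test oracle, so the following is only a minimal sketch of how a metamorphic relation of this kind could be checked, under stated assumptions. It assumes that (a) an image test case derived from a toxic seed text is still toxic, so moderation software that flags the seed should also flag the image, and (b) the error finding rate is the fraction of generated test cases that violate this relation, counted only for seeds the software already flags in text form. `ImageTestCase`, `find_errors`, and the toy generator and moderators are hypothetical names for illustration; they are not OASIS's API, and the 21 transform rules are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ImageTestCase:
    """Hypothetical stand-in for an image rendered from a toxic seed text."""
    seed_text: str
    rule: str       # name of the (illustrative) transform rule applied
    payload: bytes  # rendered image bytes, treated as opaque here


def find_errors(
    seeds: Iterable[str],
    generate: Callable[[str], List[ImageTestCase]],
    moderate_text: Callable[[str], bool],
    moderate_image: Callable[[ImageTestCase], bool],
) -> Tuple[List[ImageTestCase], float]:
    """Return the relation-violating test cases and the assumed error finding rate."""
    errors: List[ImageTestCase] = []
    total = 0
    for seed in seeds:
        # Assumption: only seeds already flagged as toxic in text form are used,
        # so a later miss can be attributed to the text-to-image transformation.
        if not moderate_text(seed):
            continue
        for case in generate(seed):
            total += 1
            # Metamorphic relation: toxicity is preserved, so the image should
            # still be flagged; a miss counts as an error.
            if not moderate_image(case):
                errors.append(case)
    rate = len(errors) / total if total else 0.0
    return errors, rate


# Toy usage with fake components (no real moderation service is called).
def toy_generate(seed: str) -> List[ImageTestCase]:
    return [
        ImageTestCase(seed, "plain_screenshot", seed.encode()),
        # Reversing the bytes crudely mimics interference that defeats the moderator.
        ImageTestCase(seed, "noisy_screenshot", seed[::-1].encode()),
    ]


def toy_moderate_text(text: str) -> bool:
    return "badword" in text


def toy_moderate_image(case: ImageTestCase) -> bool:
    return b"badword" in case.payload


errors, rate = find_errors(
    ["you badword", "hello"], toy_generate, toy_moderate_text, toy_moderate_image
)
print(rate)  # 0.5: one of the two test cases for the flagged seed is missed
```

Filtering seeds by the text-level verdict first is a design choice of this sketch: it keeps the rate from counting seeds the software would have missed anyway, so that only transformation-induced misses are reported.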

updated: Fri Aug 18 2023 20:33:06 GMT+0000 (UTC)

published: Fri Aug 18 2023 20:33:06 GMT+0000 (UTC)

arXiv
