Learning the Legibility of Visual Text Perturbations

Dev Seth; Rickard Stureborg; Danish Pruthi; Bhuwan Dhingra

Many adversarial attacks in NLP perturb inputs to produce visually similar strings ('ergo' → 'ϵrgo') that remain legible to humans but degrade model performance. Although preserving legibility is a necessary condition for text perturbation, little work has been done to characterize it systematically; instead, legibility is typically enforced loosely, via intuitions about the nature and extent of perturbations. In particular, it is unclear to what extent inputs can be perturbed while preserving legibility, or how to quantify the legibility of a perturbed string. In this work, we address this gap by learning models that predict the legibility of a perturbed string and rank candidate perturbations by their legibility. To do so, we collect and release LEGIT, a human-annotated dataset of legibility judgments for visually perturbed text. Using this dataset, we build both text- and vision-based models that achieve up to 0.91 F1 score in predicting whether an input is legible, and an accuracy of 0.86 in predicting which of two given perturbations is more legible. Additionally, we find that legible perturbations from the LEGIT dataset are more effective at lowering the performance of NLP models than the best-known attack strategies, suggesting that current models may be vulnerable to a broad range of perturbations beyond what existing visual attacks capture. Data, code, and models are available at https://github.com/dvsth/learning-legibility-2023.
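
To make the kind of perturbation described above concrete, here is a minimal Python sketch of a visual perturbation based on look-alike character substitution. The `HOMOGLYPHS` table and the `perturb` function are hypothetical names introduced for illustration only; they are not the perturbation procedure used to build LEGIT, which the abstract does not describe.

```python
import random

# Hypothetical look-alike table, for illustration only.
# This is NOT the character mapping used in the LEGIT dataset.
HOMOGLYPHS = {
    "e": "ϵ",  # Greek lunate epsilon
    "o": "ο",  # Greek small omicron
    "a": "а",  # Cyrillic small a
    "i": "і",  # Cyrillic small Byelorussian-Ukrainian i
}


def perturb(word: str, rate: float, seed: int = 0) -> str:
    """Replace each mappable character with a look-alike with probability `rate`."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in word
    )


if __name__ == "__main__":
    # rate=0.0 leaves the word untouched; rate=1.0 swaps every mappable character.
    print(perturb("ergo", rate=0.0))
    print(perturb("ergo", rate=1.0))
```

Raising `rate` perturbs more characters; how far it can be raised before the string stops being legible is the kind of question a learned legibility model is meant to answer.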

updated: Fri Mar 10 2023 19:54:39 GMT+0000 (UTC)

published: Thu Mar 09 2023 07:22:07 GMT+0000 (UTC)

arXiv
