Pretrained Transformers Do not Always Improve Robustness

Swaroop Mishra; Bhavdeep Singh Sachdeva; Chitta Baral

Pretrained Transformers (PT) have been shown to improve Out of Distribution (OOD) robustness over traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNN) powered by Word2Vec and GloVe embeddings. How does the robustness comparison hold in a real-world setting where part of the dataset can be noisy? Do PT also provide more robust representations than traditional models on exposure to noisy data? We perform a comparative study on 10 models and find empirical evidence that PT provide less robust representations than traditional models on exposure to noisy data. We investigate further and augment PT with an adversarial filtering (AF) mechanism that has been shown to improve OOD generalization. However, an increase in generalization does not necessarily increase robustness, as we find that noisy data fools the AF method powered by PT.
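The abstract names adversarial filtering (AF) but does not spell out its mechanics. As a minimal sketch, assuming an AFLite-style procedure (the lightweight AF variant of Le Bras et al., 2020) run over precomputed embeddings, the code below scores each example by how often linear probes trained on random splits predict it correctly when it is held out, and repeatedly drops the most predictable ones. The noise model (symmetric label flipping), the NumPy logistic-regression probe, and every function name and hyperparameter (inject_label_noise, train_logreg, aflite, cutoff, k, target_size) are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def inject_label_noise(y, fraction, num_classes, rng):
    """Flip the labels of a random `fraction` of examples to a different class.

    Assumption: symmetric label flipping is one way to simulate a partly noisy
    dataset; the abstract does not state the paper's noise procedure.
    """
    y_noisy = y.copy()
    n_noisy = int(round(fraction * len(y)))
    idx = rng.choice(len(y), size=n_noisy, replace=False)
    # A non-zero offset modulo num_classes guarantees every chosen label changes.
    offsets = rng.integers(1, num_classes, size=n_noisy)
    y_noisy[idx] = (y[idx] + offsets) % num_classes
    return y_noisy, idx


def train_logreg(X, y, num_classes, epochs=200, lr=0.5):
    """Hypothetical linear probe: multinomial logistic regression, full-batch GD."""
    W = np.zeros((X.shape[1], num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(X)  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def aflite(X, y, num_classes, n_partitions=16, train_frac=0.5,
           cutoff=0.75, k=10, target_size=None, rng=None):
    """AFLite-style filtering over precomputed embeddings X (sketch).

    Each round: fit probes on random splits, score every example by the fraction
    of held-out rounds in which it was predicted correctly, then drop up to k of
    the highest-scoring examples whose score exceeds `cutoff`.
    Returns the indices of the retained examples.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    if target_size is None:
        target_size = len(y) // 2
    keep = np.arange(len(y))
    while len(keep) > target_size:
        correct = np.zeros(len(keep))
        counted = np.zeros(len(keep))
        for _ in range(n_partitions):
            perm = rng.permutation(len(keep))
            n_train = int(train_frac * len(keep))
            tr, te = perm[:n_train], perm[n_train:]
            W, b = train_logreg(X[keep[tr]], y[keep[tr]], num_classes)
            pred = np.argmax(X[keep[te]] @ W + b, axis=1)
            correct[te] += pred == y[keep[te]]
            counted[te] += 1
        # Predictability score; examples never held out get 0.
        score = np.divide(correct, counted, out=np.zeros_like(correct),
                          where=counted > 0)
        candidates = np.argsort(-score)[:k]
        candidates = candidates[score[candidates] > cutoff]
        candidates = candidates[: len(keep) - target_size]  # do not overshoot
        if len(candidates) == 0:
            break  # nothing is predictable enough; stop early
        keep = np.delete(keep, candidates)
    return keep


# Example: 20% label noise on toy features that still encode the clean class.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=200)
y_noisy, noisy_idx = inject_label_noise(y, 0.2, 3, rng)
X = np.hstack([3.0 * np.eye(3)[y], rng.normal(size=(200, 4))])
kept = aflite(X, y_noisy, 3, n_partitions=4, k=20, target_size=120)
print(len(kept), np.isin(noisy_idx, kept).mean())
```

In this toy setup a flipped label is, by construction, hard for a probe to predict because the features still point to the clean class, so mislabeled examples score low and survive filtering while clean, easy examples are removed. This is only one hedged intuition for how noisy data could mislead AF, not a reproduction of the paper's experiments.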

updated: Fri Oct 14 2022 09:30:36 GMT+0000 (UTC)

published: Fri Oct 14 2022 09:30:36 GMT+0000 (UTC)

arXiv