An Investigation of Critical Issues in Bias Mitigation Techniques

Robik Shrestha; Kushal Kafle; Christopher Kanan

A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not clear how effective these methods are. This is because study protocols differ among papers, systems are tested on datasets that fail to test many forms of bias, and systems have access to hidden knowledge or are tuned specifically to the test set. To address this, we introduce an improved evaluation protocol, sensible metrics, and a new dataset, which enables us to ask and answer critical questions about bias mitigation algorithms. We evaluate seven state-of-the-art algorithms using the same network architecture and hyperparameter selection policy across three benchmark datasets. We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources. We use Biased MNIST and a visual question answering (VQA) benchmark to assess robustness to hidden biases. Rather than only tuning to the test set distribution, we study robustness across different tuning distributions, which is critical because for many applications the test distribution may not be known during development. We find that algorithms exploit hidden biases, are unable to scale to multiple forms of bias, and are highly sensitive to the choice of tuning set. Based on our findings, we implore the community to adopt more rigorous assessment of future bias mitigation methods. All data, code, and results are publicly available at: https://github.com/erobic/bias-mitigators.
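The abstract does not spell out how Biased MNIST injects its bias sources or which metrics are used. The sketch below is therefore only a hedged illustration of the underlying idea, not the authors' construction or metric. It makes several assumptions, and every name in it is hypothetical: `BIAS_ATTRIBUTES`, `sample_biased_example`, `shortcut_predictor` and `accuracy_report`. Each spurious attribute takes the label-aligned value with probability `p_bias` and a uniformly random value otherwise. Groups are defined as `(label, attribute value)` pairs, and the evaluation reports overall, mean per-group and worst-group accuracy. A "model" that reads only one spurious attribute looks strong on overall accuracy but fails on minority groups.

```python
import random
from collections import defaultdict

NUM_CLASSES = 10
# Hypothetical spurious attributes, loosely in the spirit of a multi-bias dataset.
BIAS_ATTRIBUTES = ["digit_color", "background_color", "texture"]


def sample_biased_example(p_bias, rng):
    """Draw one (label, attributes) pair. Each spurious attribute takes the
    label-aligned value with probability p_bias, else a uniformly random value
    (assumed sampling scheme, not necessarily the paper's)."""
    label = rng.randrange(NUM_CLASSES)
    attrs = {}
    for name in BIAS_ATTRIBUTES:
        if rng.random() < p_bias:
            attrs[name] = label  # aligned with the label (majority-group sample)
        else:
            attrs[name] = rng.randrange(NUM_CLASSES)  # may still match the label by chance
    return label, attrs


def shortcut_predictor(attrs):
    """A degenerate 'model' that ignores the digit and reads only its color."""
    return attrs["digit_color"]


def accuracy_report(examples, predict, group_attr):
    """Return (overall accuracy, mean per-group accuracy, worst-group accuracy),
    where a group is a (label, value of group_attr) pair seen in `examples`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for label, attrs in examples:
        group = (label, attrs[group_attr])
        total[group] += 1
        correct[group] += int(predict(attrs) == label)
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    mean_group = sum(per_group.values()) / len(per_group)
    return overall, mean_group, min(per_group.values())


if __name__ == "__main__":
    rng = random.Random(0)
    biased = [sample_biased_example(0.95, rng) for _ in range(20000)]
    balanced = [sample_biased_example(0.0, rng) for _ in range(20000)]
    print("biased split   (overall, group mean, worst group):",
          accuracy_report(biased, shortcut_predictor, "digit_color"))
    print("balanced split (overall, group mean, worst group):",
          accuracy_report(balanced, shortcut_predictor, "digit_color"))
```

Under these assumptions, the shortcut predictor's overall accuracy on the biased split stays high (roughly `p_bias` plus a small chance term). Its worst-group accuracy is zero, and its accuracy collapses to about chance on the balanced split. This gap is why aggregate accuracy on a biased test set says little about robustness.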

updated: Fri Oct 22 2021 19:56:52 GMT+0000 (UTC)

published: Thu Apr 01 2021 00:14:45 GMT+0000 (UTC)

arXiv
