Algorithmic encoding of protected characteristics and its implications on performance disparities

Ben Glocker; Charles Jones; Melanie Bernhardt; Stefan Winzeck


It has been rightly emphasized that the use of AI for clinical decision making could amplify health disparities. A machine learning model may pick up undesirable correlations, for example, between a patient's racial identity and clinical outcome. Such correlations are often present in (historical) data used for model development. A growing number of studies report biases in disease detection models. Beyond the scarcity of data from underserved populations, very little is known about how these biases are encoded and how one may reduce or even remove disparate performance. There are concerns that an algorithm may recognize patient characteristics such as biological sex or racial identity, and then directly or indirectly use this information when making predictions. But it remains unclear how we can establish whether such information is actually used. This article aims to shed some light on these issues by exploring methodology that allows intuitive inspection of the inner workings of machine learning models for image-based detection of disease. We also investigate how to address performance disparities and find automatic threshold selection to be an effective yet questionable technique, resulting in models with comparable true and false positive rates across subgroups. Our findings call for further research to better understand the underlying causes of performance disparities.
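
The abstract does not say how the automatic threshold selection is carried out, so the following is only a minimal sketch of one plausible reading: choose a separate decision threshold for each subgroup so that every subgroup operates at the same target false positive rate. The function names, the `target_fpr` parameter, and the quantile-based rule are assumptions made for illustration, not the authors' method.

```python
import numpy as np


def rates(scores, labels, threshold):
    """Return (TPR, FPR) when predicting positive for scores > threshold."""
    pred = scores > threshold
    tpr = float(np.mean(pred[labels == 1]))
    fpr = float(np.mean(pred[labels == 0]))
    return tpr, fpr


def subgroup_thresholds(scores, labels, groups, target_fpr=0.2):
    """Pick one threshold per subgroup (hypothetical rule, see lead-in).

    For each subgroup, the threshold is the (1 - target_fpr) quantile of the
    scores of that subgroup's negative cases, so roughly a fraction
    target_fpr of its negatives score above the threshold.
    """
    thresholds = {}
    for g in np.unique(groups).tolist():
        negatives = scores[(groups == g) & (labels == 0)]
        thresholds[g] = float(np.quantile(negatives, 1.0 - target_fpr))
    return thresholds


# Synthetic illustration only: subgroup "B" has systematically shifted scores,
# so a single global threshold would give the two subgroups different FPRs.
rng = np.random.default_rng(0)
n = 500
labels = np.concatenate([np.zeros(n, int), np.ones(n, int)] * 2)
groups = np.array(["A"] * (2 * n) + ["B"] * (2 * n))
scores = np.concatenate([
    rng.normal(0.0, 1.0, n), rng.normal(2.0, 1.0, n),   # subgroup A
    rng.normal(0.5, 1.0, n), rng.normal(2.5, 1.0, n),   # subgroup B (shifted)
])

per_group = subgroup_thresholds(scores, labels, groups, target_fpr=0.2)
for g, t in per_group.items():
    mask = groups == g
    tpr, fpr = rates(scores[mask], labels[mask], t)
    print(f"group {g}: threshold={t:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

This rule equalizes false positive rates by construction. True positive rates come out comparable only when the subgroups' score distributions differ mainly by a shift, as in this synthetic data. That the rates are matched without addressing why the score distributions differ is consistent with the abstract calling the technique effective yet questionable.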

updated: Wed Dec 22 2021 08:56:05 GMT+0000 (UTC)

published: Wed Oct 27 2021 20:30:57 GMT+0000 (UTC)

arXiv

