Evaluating the Fairness of Deep Learning Uncertainty Estimates in Medical Image Analysis

Raghav Mehta; Changjian Shui; Tal Arbel

Although deep learning (DL) models have shown great success in many medical image analysis tasks, deployment of the resulting models into real clinical contexts requires: (1) that they exhibit robustness and fairness across different sub-populations, and (2) that the confidence in DL model predictions be accurately expressed in the form of uncertainties. Unfortunately, recent studies have shown significant biases in DL models across demographic subgroups (e.g., race, sex, age) in the context of medical image analysis, indicating a lack of fairness in the models. Although several methods have been proposed in the ML literature to mitigate a lack of fairness in DL models, they focus entirely on the absolute performance between groups without considering their effect on uncertainty estimation. In this work, we present the first exploration of the effect of popular fairness models on overcoming biases across subgroups in medical image analysis in terms of bottom-line performance, and their effects on uncertainty quantification. We perform extensive experiments on three different clinically relevant tasks: (i) skin lesion classification, (ii) brain tumour segmentation, and (iii) Alzheimer's disease clinical score regression. Our results indicate that popular ML methods, such as data-balancing and distributionally robust optimization, succeed in mitigating fairness issues in terms of model performance for some of the tasks. However, this can come at the cost of poor uncertainty estimates associated with the model predictions. This tradeoff must be mitigated if fairness models are to be adopted in medical image analysis.
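
To make the two quantities in the abstract concrete, the following is a minimal sketch, assuming a classification task, of how performance and uncertainty might be compared across demographic subgroups. The function names, the toy data, the use of predictive entropy as the uncertainty measure, and the max-minus-min gap are all illustrative assumptions, not the metrics or code used in the paper.

```python
import math
from collections import defaultdict


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def subgroup_report(probs, labels, groups):
    """Accuracy and mean predictive entropy for each subgroup (hypothetical helper)."""
    buckets = defaultdict(list)
    for p, y, g in zip(probs, labels, groups):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax class
        buckets[g].append((predicted == y, predictive_entropy(p)))
    report = {}
    for g, rows in buckets.items():
        n = len(rows)
        report[g] = {
            "accuracy": sum(correct for correct, _ in rows) / n,
            "mean_entropy": sum(h for _, h in rows) / n,
        }
    return report


def gap(report, key):
    """Largest difference between any two subgroups for one statistic."""
    values = [stats[key] for stats in report.values()]
    return max(values) - min(values)


# Toy data (made up for illustration): softmax outputs, true labels,
# and a subgroup attribute for four samples.
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 0, 1, 1]
groups = ["A", "A", "B", "B"]

report = subgroup_report(probs, labels, groups)
print(report)
print("accuracy gap:", gap(report, "accuracy"))
print("entropy gap:", gap(report, "mean_entropy"))
```

A fairness method that shrinks the accuracy gap while leaving the uncertainty less informative for some subgroup would be one instance of the tradeoff the abstract describes; a fuller evaluation would also check whether the uncertainties are informative within each subgroup, not only compare their averages.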

updated: Mon Mar 06 2023 16:01:30 GMT+0000 (UTC)

published: Mon Mar 06 2023 16:01:30 GMT+0000 (UTC)

arXiv
