Simple and Efficient Confidence Score for Grading Whole Slide Images

Mélanie Lubrano; Yaëlle Bellahsen-Harrar; Rutger Fick; Cécile Badoual; Thomas Walter

Grading precancerous lesions on whole slide images is a challenging task: the continuous space of morphological phenotypes often makes clear-cut decisions between grades difficult, leading to low inter- and intra-rater agreement. A growing number of Artificial Intelligence (AI) algorithms are being developed to help pathologists perform and standardize their diagnoses. However, these models can render predictions without considering the ambiguity of the classes and can fail without notice, which prevents their wider acceptance in a clinical context. In this paper, we propose a new score to measure the confidence of AI models in grading tasks. Our confidence score is specifically adapted to ordinal output variables, is versatile, and requires no extra training, no additional inference, and no particular architecture changes. Comparison with other popular techniques such as Monte Carlo Dropout and deep ensembles shows that our method provides state-of-the-art results while being simpler, more versatile, and less computationally intensive. The score is also easily interpretable and consistent with the real-life hesitations of pathologists. We show that the score accurately identifies mispredicted slides and that accuracy for high-confidence decisions is significantly higher than for low-confidence decisions (gap in AUC of 17.1% on the test set). We believe that the proposed confidence score could be leveraged by pathologists directly in their workflow and assist them in difficult tasks such as grading precancerous lesions.
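
The comparison between high-confidence and low-confidence decisions can be illustrated with a minimal sketch. It does not reproduce the confidence score proposed in the paper, whose definition is not given in this abstract; it only assumes that some per-slide confidence value in [0, 1] is already available. The function names, the 0.5 threshold, and the use of plain accuracy (the abstract reports a gap in AUC) are all hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of slides whose predicted grade equals the reference grade."""
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty group")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confidence_gap(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    confidence: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[float, float, float]:
    """Split slides by confidence and compare accuracy in the two groups.

    Returns (high_confidence_accuracy, low_confidence_accuracy, gap).
    """
    rows = list(zip(y_true, y_pred, confidence))
    high = [(t, p) for t, p, c in rows if c >= threshold]
    low = [(t, p) for t, p, c in rows if c < threshold]
    high_acc = accuracy([t for t, _ in high], [p for _, p in high])
    low_acc = accuracy([t for t, _ in low], [p for _, p in low])
    return high_acc, low_acc, high_acc - low_acc


if __name__ == "__main__":
    # Toy data: ordinal grades 0..3 for eight slides (made up for illustration).
    grades_true = [0, 1, 2, 3, 1, 2, 0, 3]
    grades_pred = [0, 1, 2, 3, 2, 1, 0, 3]
    conf = [0.9, 0.8, 0.95, 0.7, 0.3, 0.4, 0.85, 0.2]
    high_acc, low_acc, gap = confidence_gap(grades_true, grades_pred, conf)
    print(f"high={high_acc:.3f} low={low_acc:.3f} gap={gap:.3f}")
```

A useful confidence score should yield a clearly positive gap: slides flagged as low-confidence are the ones a pathologist would want to review first.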

updated: Wed Mar 08 2023 14:15:43 GMT+0000 (UTC)

published: Wed Mar 08 2023 14:15:43 GMT+0000 (UTC)

arXiv
