Assessing the (Un)Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging

Nishanth Arun; Nathan Gaw; Praveer Singh; Ken Chang; Mehak Aggarwal; Bryan Chen; Katharina Hoebel; Sharut Gupta; Jay Patel; Mishka Gidwani; Julius Adebayo; Matthew D. Li; Jayashree Kalpathy-Cramer

Saliency maps have become a widely used method to make deep learning models more interpretable by providing post-hoc explanations of classifiers through identification of the most pertinent areas of the input medical image. They are increasingly being used in medical imaging to provide clinically plausible explanations for the decisions the neural network makes. However, the utility and robustness of these visualization maps have not yet been rigorously examined in the context of medical imaging. We posit that trustworthiness in this context requires 1) localization utility, 2) sensitivity to model weight randomization, 3) repeatability, and 4) reproducibility. Using the localization information available in two large public radiology datasets, we quantify the performance of eight commonly used saliency map approaches on the above criteria using the area under the precision-recall curve (AUPRC) and the structural similarity index (SSIM), comparing their performance to various baseline measures. Using our framework to quantify the trustworthiness of saliency maps, we show that all eight saliency map techniques fail at least one of the criteria and are, in most cases, less trustworthy when compared to the baselines. We suggest that their usage in the high-risk domain of medical imaging warrants additional scrutiny and recommend that detection or segmentation models be used if localization is the desired output of the network. Additionally, to promote reproducibility of our findings, we provide the code we used for all tests performed in this work at this link: https://github.com/QTIM-Lab/Assessing-Saliency-Maps.
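As a rough illustration of the two metrics named in the abstract, the following minimal Python sketch computes an AUPRC-style score for a saliency map against a binary ground-truth mask and a simplified SSIM between two saliency maps. It is not the authors' implementation (see the linked repository for that). The function names `auprc` and `global_ssim` are hypothetical, the AUPRC is approximated by average precision with ties broken arbitrarily, and the SSIM is computed over the whole map as a single window instead of the usual sliding window. In practice, library routines such as scikit-learn's `average_precision_score` and scikit-image's `structural_similarity` would be the more likely choice.

```python
from statistics import fmean


def auprc(saliency, mask):
    """Average precision of a saliency map against a binary mask.

    Both arguments are flat sequences with one entry per pixel.
    Assumption: ties in saliency are broken by pixel order.
    """
    total_pos = sum(mask)
    if total_pos == 0:
        raise ValueError("mask contains no positive pixels")
    # Rank pixel indices from most to least salient.
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if mask[i]:
            hits += 1
            ap += hits / rank  # precision at the rank where this positive is recalled
    return ap / total_pos


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two flat maps (a simplification)."""
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the standard SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) * (a - mx) for a in x])
    vy = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


# Toy 2x2 example, flattened: the first two pixels are the abnormality.
mask = [1, 1, 0, 0]
good_map = [0.9, 0.8, 0.1, 0.2]  # highlights the abnormality
bad_map = [0.1, 0.2, 0.9, 0.8]   # highlights the background

localization_good = auprc(good_map, mask)
localization_bad = auprc(bad_map, mask)
similarity = global_ssim(good_map, bad_map)
```

In this toy setup, a higher `auprc` value indicates better localization utility, while a low `global_ssim` between maps from two separately trained models would suggest poor repeatability or reproducibility in the sense described above.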

updated: Thu Jul 15 2021 02:39:22 GMT+0000 (UTC)

published: Thu Aug 06 2020 17:11:19 GMT+0000 (UTC)

arXiv
