Deep Learning for Classification of Thyroid Nodules on Ultrasound: Validation on an Independent Dataset

Jingxi Weng; Benjamin Wildman-Tobriner; Mateusz Buda; Jichen Yang; Lisa M. Ho; Brian C. Allen; Wendy L. Ehieli; Chad M. Miller; Jikai Zhang; Maciej A. Mazurowski

Objectives: The purpose is to apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. Methods: A prior study presented an algorithm that detects thyroid nodules and then classifies them as malignant or benign using two ultrasound images. A multi-task deep convolutional neural network was trained on 1278 nodules and originally tested on 99 separate nodules. The results were comparable to those of radiologists. The algorithm was further tested on 378 nodules imaged with ultrasound machines of different manufacturers and product types than those used for the training cases. Four experienced radiologists were asked to evaluate the nodules for comparison with deep learning. Results: The area under the curve (AUC) of the deep learning algorithm and of the four radiologists was calculated with parametric, binormal estimation. For the deep learning algorithm, the AUC was 0.69 (95% CI: 0.64 - 0.75). The AUCs of the radiologists were 0.63 (95% CI: 0.59 - 0.67), 0.66 (95% CI: 0.61 - 0.71), 0.65 (95% CI: 0.60 - 0.70), and 0.63 (95% CI: 0.58 - 0.67). Conclusion: On the new testing dataset, the deep learning algorithm achieved performance similar to that of all four radiologists. The relative performance difference between the algorithm and the radiologists is not significantly affected by differences in ultrasound scanner.
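
The abstract reports AUCs obtained with parametric, binormal estimation but does not describe the fitting procedure. As a rough illustration only, the sketch below assumes the textbook binormal model: benign scores follow N(mu0, s0^2), malignant scores follow N(mu1, s1^2), and AUC = Phi(a / sqrt(1 + b^2)) with a = (mu1 - mu0) / s1 and b = s0 / s1. The parameters are estimated here by simple sample moments, which may well differ from the estimator the authors used. The function names, the synthetic scores, and the class sizes are all hypothetical and are not taken from the paper.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def binormal_auc(benign_scores, malignant_scores):
    """Moment-based binormal AUC (illustrative; not the paper's estimator)."""
    mu0, mu1 = mean(benign_scores), mean(malignant_scores)
    s0, s1 = stdev(benign_scores), stdev(malignant_scores)
    a = (mu1 - mu0) / s1  # standardized separation of the two classes
    b = s0 / s1           # ratio of the class standard deviations
    return NormalDist().cdf(a / math.sqrt(1.0 + b * b))


def empirical_auc(benign_scores, malignant_scores):
    """Nonparametric AUC: P(malignant score > benign score), ties count 1/2."""
    wins = sum(
        (m > b) + 0.5 * (m == b)
        for m in malignant_scores
        for b in benign_scores
    )
    return wins / (len(malignant_scores) * len(benign_scores))


# Synthetic scores for demonstration only (hypothetical, not study data).
rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(500)]
malignant = [rng.gauss(0.7, 1.0) for _ in range(500)]

print(f"binormal AUC:  {binormal_auc(benign, malignant):.3f}")
print(f"empirical AUC: {empirical_auc(benign, malignant):.3f}")
```

When the scores really are Gaussian within each class, the two estimates should agree closely. Reader studies with ordinal ratings typically fit the binormal parameters by maximum likelihood rather than by sample moments.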

updated: Thu May 04 2023 21:27:27 GMT+0000 (UTC)

published: Wed Jul 27 2022 19:45:41 GMT+0000 (UTC)

arXiv
