Deep Learning with robustness to missing data: A novel approach to the detection of COVID-19

Erdi Çallı; Keelin Murphy; Steef Kurstjens; Tijs Samson; Robert Herpers; Henk Smits; Matthieu Rutten; Bram van Ginneken

In the context of the current global pandemic and the limitations of the RT-PCR test, we propose a novel deep learning architecture, DFCN (Denoising Fully Connected Network). Since medical facilities around the world differ enormously in which laboratory tests and chest imaging are available, DFCN is designed to be robust to missing input data. An ablation study extensively evaluates the performance benefits of DFCN as well as its robustness to missing inputs. Data from 1088 patients with confirmed RT-PCR results are obtained from two independent medical facilities. The data include results from 27 laboratory tests and a chest X-ray scored by a deep learning model, giving 28 inputs in total. Training and test datasets are taken from different medical facilities. The data are made publicly available. The performance of DFCN in predicting the RT-PCR result is compared with 3 related architectures as well as a Random Forest baseline. All models are trained with varying levels of masked input data to encourage robustness to missing inputs. Missing data is simulated at test time by masking inputs randomly. DFCN outperforms all other models with statistical significance when evaluated on random subsets of the input data with 2 to 27 available inputs. When all 28 inputs are available, DFCN obtains an AUC of 0.924, higher than any other model. Furthermore, with two clinically meaningful subsets of parameters, consisting of just 6 and 7 inputs respectively, DFCN achieves higher AUCs than any other model, with values of 0.909 and 0.919.
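
The abstract describes simulating missing data by masking inputs randomly. The following is a minimal sketch of one way such masking could look, not the authors' implementation. The function name `mask_random_inputs`, the zero fill value, and the choice to return a boolean availability mask are all assumptions made for illustration; only the count of 28 inputs comes from the abstract.

```python
import numpy as np


def mask_random_inputs(x, n_available, rng, fill_value=0.0):
    """Keep `n_available` randomly chosen inputs per row and mask the rest.

    Hypothetical helper: the fill value and the returned availability mask
    are illustrative assumptions, not details taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n_samples, n_features = x.shape
    if not 0 <= n_available <= n_features:
        raise ValueError("n_available must be between 0 and the number of inputs")

    # Give every input a random score, then rank the scores within each row.
    # The inputs with the n_available lowest ranks stay visible.
    scores = rng.random((n_samples, n_features))
    ranks = scores.argsort(axis=1).argsort(axis=1)
    available = ranks < n_available

    # Replace masked inputs with the fill value; keep the others unchanged.
    masked = np.where(available, x, fill_value)
    return masked, available


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 28))  # 4 synthetic patients, 28 inputs each
masked, available = mask_random_inputs(x, n_available=6, rng=rng)
```

Each row keeps a different random subset of inputs, so repeating the call with `n_available` ranging from 2 to 27 would mimic the kind of random-subset evaluation the abstract mentions. Whether a model should also receive the availability mask as an input is a design choice this sketch leaves open.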

updated: Mon Aug 02 2021 09:59:57 GMT+0000 (UTC)

published: Thu Mar 25 2021 13:21:53 GMT+0000 (UTC)

arXiv
