Deep Learning for automatic head and neck lymph node level delineation

Thomas Weissmann; Yixing Huang; Stefan Fischer; Johannes Roesch; Sina Mansoorian; Horacio Ayala Gaona; Antoniu-Oreste Gostian; Markus Hecht; Sebastian Lettmaier; Lisa Deloch; Benjamin Frey; Udo S. Gaipl; Luitpold V. Distel; Andreas Maier; Heinrich Iro; Sabine Semrau; Christoph Bert; Rainer Fietkau; Florian Putz

Background: Deep learning-based head and neck lymph node level (HN_LNL) autodelineation is highly relevant to radiotherapy research and clinical treatment planning, but it remains understudied in the academic literature.

Methods: An expert-delineated cohort of 35 planning CTs was used to train an nnU-Net 3D-fullres/2D-ensemble model for autosegmentation of 20 different HN_LNL. Validation was performed on an independent test set (n=20). In a completely blinded evaluation, 3 clinical experts rated the quality of deep learning autosegmentations in a head-to-head comparison with expert-created contours. For a subgroup of 10 cases, intraobserver variability was compared to deep learning autosegmentation performance. The effect of autocontour consistency with CT slice plane orientation on geometric accuracy and expert rating was investigated.

Results: Mean blinded expert rating per level was significantly better for deep learning segmentations with CT slice plane adjustment than for expert-created contours (81.0 vs. 79.6, p<0.001), whereas deep learning segmentations without slice plane adjustment were rated significantly worse than expert-created contours (77.2 vs. 79.6, p<0.001). Geometric accuracy of deep learning segmentations was not significantly different from intraobserver variability (mean Dice per level, 0.78 vs. 0.77, p=0.064), and the variance in accuracy between levels was improved (p<0.001). The clinical significance of contour consistency with CT slice plane orientation was not captured by geometric accuracy metrics (Dice, 0.78 vs. 0.78, p=0.572).

Conclusions: We show that an nnU-Net 3D-fullres/2D-ensemble model can achieve highly accurate autodelineation of HN_LNL using only a limited training dataset, and that this approach is ideally suited for large-scale standardized autodelineation of HN_LNL in the research setting. Geometric accuracy metrics are only an imperfect surrogate for blinded expert rating.
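The abstract does not detail how the two nnU-Net configurations are combined or how the per-level Dice score is computed. The sketch below is illustrative only. It assumes that the ensemble averages the softmax probabilities of the 3D-fullres and 2D models voxel-wise before taking the argmax, which is how we understand nnU-Net's default ensembling to work. It also assumes that Dice is computed on one-vs-rest binary masks for each level label. The function names (`ensemble_labels`, `dice`, `dice_per_level`), the toy arrays, and the rule that scores two empty masks as 1.0 are hypothetical and are not taken from the paper.

```python
import numpy as np


def ensemble_labels(prob_a: np.ndarray, prob_b: np.ndarray) -> np.ndarray:
    """Average two softmax probability maps of shape (C, *spatial) and return
    the voxel-wise argmax label map of shape (*spatial).

    Assumption: label 0 is background and labels 1..C-1 are lymph node levels.
    """
    if prob_a.shape != prob_b.shape:
        raise ValueError("probability maps must have the same shape")
    mean_prob = (prob_a + prob_b) / 2.0
    return np.argmax(mean_prob, axis=0)


def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for two boolean masks."""
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: scored as 1.0 here by convention (an assumption).
        return 1.0
    return float(2.0 * intersection / total)


def dice_per_level(pred_labels: np.ndarray, ref_labels: np.ndarray, levels) -> dict:
    """Dice score for each level label, computed on one-vs-rest binary masks."""
    return {lvl: dice(pred_labels == lvl, ref_labels == lvl) for lvl in levels}


# Toy example: 3 classes (background + 2 hypothetical levels) over 4 voxels.
# Rows are voxels, columns are class probabilities; transposed to (C, voxels).
prob_3d = np.array([[0.8, 0.1, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.1, 0.6, 0.3],
                    [0.1, 0.1, 0.8]]).T
prob_2d = np.array([[0.6, 0.2, 0.2],
                    [0.2, 0.7, 0.1],
                    [0.1, 0.2, 0.7],
                    [0.2, 0.1, 0.7]]).T
ref = np.array([0, 1, 1, 2])  # hypothetical reference (expert) labels

pred = ensemble_labels(prob_3d, prob_2d)
scores = dice_per_level(pred, ref, levels=[1, 2])
mean_dice = float(np.mean(list(scores.values())))
print(pred.tolist(), scores, round(mean_dice, 3))
```

The sketch averages probabilities before taking the argmax instead of voting on hard labels. This lets a more confident model outweigh a less confident one at each voxel. At the third toy voxel, for example, the 2D model's 0.7 for level 2 overrides the 3D model's 0.6 for level 1.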

updated: Sun Aug 28 2022 13:58:54 GMT+0000 (UTC)

published: Sun Aug 28 2022 13:58:54 GMT+0000 (UTC)

arXiv
