A quality assurance framework for real-time monitoring of deep learning segmentation models in radiotherapy

Xiyao Jin; Yao Hao; Jessica Hilliard; Zhehao Zhang; Maria A. Thomas; Hua Li; Abhinav K. Jha; Geoffrey D. Hugo

To safely deploy deep learning models in the clinic, a quality assurance (QA) framework is needed for routine or continuous monitoring of input-domain shift and model performance without ground-truth contours. In this work, cardiac substructure segmentation was used as an example task to establish such a QA framework. A benchmark dataset consisting of computed tomography (CT) images and manual cardiac delineations of 241 patients was collected, covering one 'common' image domain and five 'uncommon' domains. Segmentation models were tested on the benchmark dataset for an initial evaluation of model capacity and limitations. An image domain shift detector was developed using a trained denoising autoencoder (DAE) and two hand-engineered features. A separate variational autoencoder (VAE) was trained to estimate the shape quality of the auto-segmentation results. Using features extracted from each image/segmentation pair as inputs, a regression model was trained to predict the per-patient segmentation accuracy, measured by the Dice similarity coefficient (DSC). The framework was tested across 19 segmentation models to evaluate its generalizability. The DSC predicted by the regression models achieved a mean absolute error (MAE) ranging from 0.036 to 0.046, with an average MAE of 0.041. When tested on the benchmark dataset, none of the segmentation models was significantly affected by the scanning parameters field of view (FOV), slice thickness, and reconstruction kernel. For input images with Poisson noise, the CNN-based segmentation models showed a decrease in DSC ranging from 0.07 to 0.41, while the transformer-based model was not significantly affected.
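
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how a Dice similarity coefficient and a mean absolute error between predicted and measured DSC values could be computed. It is not the authors' code: the function names `dice_coefficient` and `mean_absolute_error`, the flattened-mask representation, the empty-mask convention, and the toy values are all assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(predicted: Sequence[float],
                        actual: Sequence[float]) -> float:
    """Mean absolute difference between predicted and measured values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Toy masks (hypothetical): 2 overlapping voxels, 3 foreground voxels each.
    auto_mask = [1, 1, 0, 0, 1]
    manual_mask = [1, 0, 0, 1, 1]
    print(f"DSC = {dice_coefficient(auto_mask, manual_mask):.3f}")

    # Toy per-patient DSC values (hypothetical), regression output vs. measured.
    predicted_dsc = [0.80, 0.90]
    measured_dsc = [0.85, 0.86]
    print(f"MAE = {mean_absolute_error(predicted_dsc, measured_dsc):.3f}")
```

In this reading, the reported MAE of 0.041 would be the average gap between the DSC that the regression model predicts for a patient and the DSC measured against the manual contours.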

updated: Fri May 19 2023 14:51:05 GMT+0000 (UTC)

published: Fri May 19 2023 14:51:05 GMT+0000 (UTC)

arXiv
