Multi-class motion-based semantic segmentation for ureteroscopy and laser lithotripsy

Soumya Gupta; Sharib Ali; Louise Goldsmith; Ben Turney; Jens Rittscher

Kidney stones represent a considerable burden for public health-care systems. Ureteroscopy with laser lithotripsy has evolved into the most commonly used technique for treating kidney stones. Automated segmentation of kidney stones and the laser fiber is an important first step toward any automated quantitative analysis of the stones, particularly stone-size estimation, which helps the surgeon decide whether a stone requires further fragmentation. Factors such as turbid fluid inside the cavity, specularities, motion blur due to kidney movements and camera motion, bleeding, and stone debris degrade the quality of vision within the kidney and lead to extended operative times. To the best of our knowledge, this is the first attempt at multi-class segmentation in ureteroscopy and laser lithotripsy data. We propose an end-to-end CNN-based framework for the segmentation of stones and laser fiber. The proposed approach uses two sub-networks: HybResUNet, a version of residual U-Net that uses residual connections in the encoder path of U-Net, and DVFNet, which generates DVF predictions that are then used to prune the prediction maps. We also present ablation studies that combine dilated convolutions, recurrent and residual connections, ASPP, and attention gates. We propose a compound loss function that improves segmentation performance, and we provide an ablation study to determine the optimal data augmentation strategy. Our qualitative and quantitative results show that the proposed method outperforms state-of-the-art methods such as UNet and DeepLabv3+, with improvements of 5.2% and 15.93%, respectively, in the combined mean of DSC and JI on our in vivo test dataset. We also show that the proposed model generalizes better to a new clinical dataset, with mean improvements of 25.4%, 20%, and 11% over UNet, HybResUNet, and DeepLabv3+, respectively, on the same metric.
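The results above are reported in terms of DSC (Dice similarity coefficient) and JI (Jaccard index). As a minimal sketch of how these two overlap scores can be computed per class from a predicted and a ground-truth label map, consider the following. The function name `dice_and_jaccard`, the label encoding (0 = background, 1 = stone, 2 = laser fiber), the toy label maps, and the convention of returning 1.0 when a class is absent from both maps are assumptions made for illustration and are not taken from the paper.

```python
from itertools import chain


def dice_and_jaccard(pred, target, class_id):
    """Return (DSC, JI) for one class, given two 2-D label maps (lists of rows)."""
    # Binarize both maps for the requested class.
    p = [v == class_id for v in chain.from_iterable(pred)]
    t = [v == class_id for v in chain.from_iterable(target)]
    if len(p) != len(t):
        raise ValueError("label maps must have the same number of pixels")

    intersection = sum(a and b for a, b in zip(p, t))
    pred_area, target_area = sum(p), sum(t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Assumed convention: class absent in both maps counts as perfect agreement.
        return 1.0, 1.0

    dsc = 2 * intersection / (pred_area + target_area)
    ji = intersection / union
    return dsc, ji


# Hypothetical 3x4 label maps: 0 = background, 1 = stone, 2 = laser fiber.
target = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
]
pred = [
    [0, 1, 1, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 2],
]

for class_id, name in [(1, "stone"), (2, "laser fiber")]:
    dsc, ji = dice_and_jaccard(pred, target, class_id)
    print(f"{name}: DSC={dsc:.3f}, JI={ji:.3f}")
```

For a single class the two scores are related by DSC = 2·JI / (1 + JI), so DSC is never lower than JI for the same pair of masks.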

updated: Fri Apr 02 2021 22:47:21 GMT+0000 (UTC)

published: Fri Apr 02 2021 22:47:21 GMT+0000 (UTC)

arXiv
