A geometry-aware deep network for depth estimation in monocular endoscopy

Yongming Yang; Shuwei Shao; Tao Yang; Peng Wang; Zhuo Yang; Chengdong Wu; Hao Liu

Monocular depth estimation is critical for endoscopists to perform spatial perception and 3D navigation of surgical sites. However, most existing methods ignore geometric structural consistency, which inevitably degrades performance and distorts the 3D reconstruction. To address this issue, we introduce a gradient loss that penalizes ambiguous edge fluctuations around stepped edge structures and a normal loss that explicitly expresses sensitivity to the frequently occurring small structures, and we propose a geometric consistency loss that spreads spatial information across the sample grids to constrain the global geometric anatomical structures. In addition, we develop a synthetic RGB-Depth dataset that captures anatomical structures under reflections and illumination variations. The proposed method is extensively validated across different datasets and clinical images and achieves mean RMSE values of 0.066 (stomach), 0.029 (small intestine), and 0.139 (colon) on the EndoSLAM dataset. To assess generalizability, the method is also evaluated on the ColonDepth dataset, where it achieves mean RMSE values of 12.604 (T1-L1), 9.930 (T2-L2), and 13.893 (T3-L3). The experimental results show that our method outperforms previous state-of-the-art methods and generates more consistent depth maps and reasonable anatomical structures. The quality of the intraoperative 3D structure perception that the proposed method obtains from endoscopic videos meets the accuracy requirements of video-CT registration algorithms for endoscopic navigation. The dataset and the source code will be available at https://github.com/YYM-SIA/LINGMI-MR.
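The abstract does not give the loss formulas, so the following is only a minimal sketch of the general idea behind a gradient loss and the RMSE metric, not the authors' implementation. It assumes an L1 penalty on forward finite differences of the depth map; the function names `rmse`, `gradients`, and `gradient_loss` and the toy depth maps are hypothetical and chosen for illustration.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error between two equally sized 2D depth maps (lists of rows)."""
    sq = [(p - g) ** 2 for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    return math.sqrt(sum(sq) / len(sq))


def gradients(depth):
    """Forward finite differences along x (columns) and y (rows)."""
    dx = [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in depth]
    dy = [[depth[i + 1][j] - depth[i][j] for j in range(len(depth[0]))]
          for i in range(len(depth) - 1)]
    return dx, dy


def gradient_loss(pred, gt):
    """Mean absolute difference between predicted and reference depth gradients.

    Assumption: an L1 form; the paper's exact formulation may differ.
    """
    pdx, pdy = gradients(pred)
    gdx, gdy = gradients(gt)
    terms = [abs(p - g)
             for pm, gm in ((pdx, gdx), (pdy, gdy))
             for pr, gr in zip(pm, gm)
             for p, g in zip(pr, gr)]
    return sum(terms) / len(terms)


# Toy reference depth map with a sharp step edge between columns 1 and 2.
gt = [[1.0, 1.0, 2.0, 2.0],
      [1.0, 1.0, 2.0, 2.0]]

# Prediction that blurs the step edge.
blurry = [[1.0, 1.25, 1.75, 2.0],
          [1.0, 1.25, 1.75, 2.0]]

# Prediction with a perfectly sharp edge but a constant depth offset,
# chosen so that its RMSE equals that of the blurry prediction.
offset = rmse(blurry, gt)
shifted = [[v + offset for v in row] for row in gt]

print("RMSE blurry:", rmse(blurry, gt), "shifted:", rmse(shifted, gt))
print("Gradient loss blurry:", gradient_loss(blurry, gt),
      "shifted:", gradient_loss(shifted, gt))
```

In this toy case both predictions have the same RMSE, but only the blurred one incurs a gradient penalty. This illustrates why an edge-aware term can complement a purely pointwise error when sharp structural boundaries matter.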

updated: Thu Apr 20 2023 11:59:32 GMT+0000 (UTC)

published: Thu Apr 20 2023 11:59:32 GMT+0000 (UTC)

arXiv
