EndoDepthL: Lightweight Endoscopic Monocular Depth Estimation with CNN-Transformer

Yangke Li

In this study, we address key challenges in the accuracy and effectiveness of depth estimation for endoscopic imaging, with particular emphasis on real-time inference and the impact of light reflections. We propose EndoDepthL, a novel lightweight solution that integrates Convolutional Neural Networks (CNNs) and Transformers to predict multi-scale depth maps. Our approach optimizes the network architecture and incorporates multi-scale dilated convolution and a multi-channel attention mechanism. We also introduce a statistical confidence boundary mask to minimize the impact of reflective areas. To better evaluate monocular depth estimation in endoscopic imaging, we propose a novel complexity evaluation metric that considers network parameter size, floating-point operations, and inference frames per second. We comprehensively evaluate the proposed method and compare it with existing baseline solutions. The results demonstrate that EndoDepthL maintains depth estimation accuracy with a lightweight structure.
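The abstract does not define the statistical confidence boundary mask, so the following is only a minimal sketch of one plausible reading: pixels brighter than an upper bound of mean plus k standard deviations are treated as suspected specular reflections and left out of a per-pixel error. The function names, the parameter `k`, the bound itself, and the use of an absolute-error term are all assumptions made for illustration, not the authors' method.

```python
from statistics import mean, pstdev


def confidence_bound_mask(gray, k=2.0):
    """Return a 0/1 mask: 1 keeps a pixel, 0 marks a suspected reflection.

    gray: 2D list of intensities for one frame.
    k: hypothetical width of the upper bound, in standard deviations
       (an assumption; the abstract gives no formula).
    """
    pixels = [v for row in gray for v in row]
    upper = mean(pixels) + k * pstdev(pixels)
    return [[1 if v <= upper else 0 for v in row] for row in gray]


def masked_l1(pred, target, mask):
    """Mean absolute error over the pixels the mask keeps."""
    kept = [
        abs(p - t)
        for pred_row, target_row, mask_row in zip(pred, target, mask)
        for p, t, m in zip(pred_row, target_row, mask_row)
        if m
    ]
    return sum(kept) / len(kept) if kept else 0.0


# Toy 3x3 frame with a single saturated pixel standing in for a highlight.
frame = [[10, 12, 11], [13, 250, 12], [11, 10, 12]]
mask = confidence_bound_mask(frame, k=2.0)
```

In this toy frame only the saturated centre pixel exceeds the bound, so it is the only one excluded from the error term.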

updated: Wed Aug 16 2023 17:39:15 GMT+0000 (UTC)

published: Fri Aug 04 2023 21:38:29 GMT+0000 (UTC)

arXiv
