Multi-Target Landmark Detection with Incomplete Images via Reinforcement Learning and Shape Prior

Kaiwen Wan; Lei Li; Dengqiang Jia; Shangqi Gao; Wei Qian; Yingzhi Wu; Huandong Lin; Xiongzheng Mu; Xin Gao; Sijia Wang; Fuping Wu; Xiahai Zhuang

Medical images are generally acquired with a limited field-of-view (FOV), which can lead to incomplete regions of interest (ROI) and thus poses a great challenge for medical image analysis. This is particularly evident in learning-based multi-target landmark detection, where algorithms can be misled into learning primarily the variation of the background caused by the varying FOV, and so fail to detect the targets. Because they learn a navigation policy instead of predicting targets directly, reinforcement learning (RL)-based methods have the potential to tackle this challenge efficiently. Inspired by this, in this work we propose a multi-agent RL framework for simultaneous multi-target landmark detection. This framework aims to learn from incomplete and/or complete images to form an implicit knowledge of global structure, which is consolidated during the training stage for the detection of targets from either complete or incomplete test images. To further explicitly exploit the global structural information from incomplete images, we propose to embed a shape model into the RL process. With this prior knowledge, the proposed RL model can not only localize dozens of targets simultaneously, but also work effectively and robustly in the presence of incomplete images. We validated the applicability and efficacy of the proposed method on various multi-target detection tasks with incomplete images from practical clinics, using body dual-energy X-ray absorptiometry (DXA), cardiac MRI and head CT datasets. Results showed that our method could predict the whole set of landmarks with incomplete training images of up to 80% missing proportion (average distance error 2.29 cm on body DXA), and could detect unseen landmarks in regions with missing image information outside the FOV of target images (average distance error 6.84 mm on 3D half-head CT).
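
The idea of learning a navigation policy rather than regressing targets directly can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the action set, reward, or network, so the one-pixel moves in `ACTIONS`, the distance-reduction reward in `step`, and the `greedy_rollout` stand-in for a learned policy are all assumptions, borrowed from a formulation that is common in RL-based landmark detection.

```python
import numpy as np

# Hypothetical action set: one-pixel moves along each axis of a 2D image.
ACTIONS = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])


def step(position, action, target, image_shape):
    """Move one agent and return (new_position, reward).

    The reward is the reduction in Euclidean distance to the target.
    This is an assumed, commonly used choice, not taken from the abstract.
    """
    proposed = position + ACTIONS[action]
    # Keep the agent inside the image. A landmark outside the FOV cannot be
    # reached this way, which is the case a shape prior is meant to address.
    new_position = np.clip(proposed, 0, np.array(image_shape) - 1)
    reward = np.linalg.norm(position - target) - np.linalg.norm(new_position - target)
    return new_position, float(reward)


def greedy_rollout(start, target, image_shape, max_steps=200):
    """Stand-in for a learned policy: always take the highest-reward action."""
    position = np.array(start)
    target = np.array(target)
    for _ in range(max_steps):
        rewards = [step(position, a, target, image_shape)[1]
                   for a in range(len(ACTIONS))]
        best = int(np.argmax(rewards))
        if rewards[best] <= 0:  # no move gets closer, so stop here
            break
        position, _ = step(position, best, target, image_shape)
    return position
```

In a multi-agent setting, one such agent would be run per landmark; how the agents share information and how the shape model constrains them is described in the paper itself and is not reproduced here.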

updated: Fri Jan 13 2023 05:20:07 GMT+0000 (UTC)

published: Fri Jan 13 2023 05:20:07 GMT+0000 (UTC)

arXiv
