Feature-Conditioned Cascaded Video Diffusion Models for Precise Echocardiogram Synthesis

Hadrien Reynaud; Mengyun Qiao; Mischa Dombrowski; Thomas Day; Reza Razavi; Alberto Gomez; Paul Leeson; Bernhard Kainz

Image synthesis is expected to provide value for the translation of machine learning methods into clinical practice. Fundamental problems like model robustness, domain transfer, causal modelling, and operator training become approachable through synthetic data. In particular, heavily operator-dependent modalities such as ultrasound imaging require robust frameworks for image and video generation. So far, video generation has only been possible by providing input data that is as rich as the output data, e.g., an image sequence plus conditioning in, video out. However, clinical documentation is usually scarce and only single images are reported and stored, so retrospective patient-specific analysis and the generation of rich training data are impossible with current approaches. In this paper, we extend elucidated diffusion models for video modelling to generate plausible video sequences from single images and arbitrary conditioning with clinical parameters. We explore this idea within the context of echocardiograms by looking into the variation of the Left Ventricle Ejection Fraction, the most essential clinical metric gained from these examinations. We use the publicly available EchoNet-Dynamic dataset for all our experiments. Our image-to-sequence approach achieves an R^2 score of 93%, which is 38 points higher than recently proposed sequence-to-sequence generation methods. Code and models will be available at: https://github.com/HReynaud/EchoDiffusion.
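As a rough illustration of the two quantities named in the abstract, the sketch below computes an ejection fraction from end-diastolic and end-systolic volumes using the standard clinical definition, and an R^2 score between two lists of ejection fractions. It is not taken from the EchoDiffusion repository: the function names, the example numbers, and the assumption that R^2 is measured between the ejection fraction requested as conditioning and the one estimated from the generated video are all hypothetical.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left ventricle ejection fraction in percent.

    Standard definition: (EDV - ESV) / EDV * 100, where EDV and ESV are
    the end-diastolic and end-systolic volumes (here in millilitres).
    """
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def r2_score(targets, estimates):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(targets) != len(estimates) or len(targets) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - e) ** 2 for t, e in zip(targets, estimates))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    if ss_tot == 0:
        raise ValueError("targets must not all be identical")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical values: requested ejection fractions versus the ones
    # estimated from generated videos. These are NOT results from the paper.
    requested = [35.0, 50.0, 65.0]
    estimated = [37.0, 49.0, 63.0]
    print(ejection_fraction(120.0, 48.0))
    print(r2_score(requested, estimated))
```

Under this reading, an R^2 close to 1 would mean the generated videos reproduce the requested ejection fraction closely, while a value near 0 would mean the conditioning has little effect on the output.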

updated: Thu Mar 23 2023 09:17:22 GMT+0000 (UTC)

published: Wed Mar 22 2023 15:26:22 GMT+0000 (UTC)

arXiv
