Cyclical Self-Supervision for Semi-Supervised Ejection Fraction Prediction from Echocardiogram Videos

Weihang Dai; Xiaomeng Li; Xinpeng Ding; Kwang-Ting Cheng

Left-ventricular ejection fraction (LVEF) is an important indicator of heart failure. Existing methods for estimating LVEF from video require large amounts of annotated data to achieve high performance; for example, 10,030 labeled echocardiogram videos are used to achieve a mean absolute error (MAE) of 4.10. However, labeling these videos is time-consuming and limits potential downstream applications to other heart diseases. This paper presents the first semi-supervised approach for LVEF prediction. Unlike general video prediction tasks, LVEF prediction is specifically related to changes in the left ventricle (LV) in echocardiogram videos. By incorporating knowledge learned from predicting LV segmentations into LVEF regression, we can provide additional context to the model for better predictions. To this end, we propose a novel Cyclical Self-Supervision (CSS) method for learning video-based LV segmentation, motivated by the observation that the heartbeat is a cyclical process with temporal repetition. Prediction masks from our segmentation model can then be used as additional input for LVEF regression to provide spatial context for the LV region. We also introduce teacher-student distillation to transfer the information from LV segmentation masks into an end-to-end LVEF regression model that requires only video inputs. Results show that our method outperforms alternative semi-supervised methods and achieves an MAE of 4.17 using half the number of labels, which is competitive with state-of-the-art supervised performance. Validation on an external dataset also shows improved generalization ability from using our method.
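
For readers unfamiliar with the two quantities the abstract reports, the following is a minimal sketch of how LVEF and MAE are conventionally defined. It is not taken from the paper: the function names `ejection_fraction` and `mean_absolute_error` and the sample numbers are hypothetical, and the LVEF formula is the standard clinical definition from end-diastolic volume (EDV) and end-systolic volume (ESV), not the paper's regression model.

```python
from typing import Sequence


def ejection_fraction(edv_ml: float, esv_ml: float) -> float:
    """Standard clinical definition: LVEF (%) = (EDV - ESV) / EDV * 100.

    edv_ml and esv_ml are the end-diastolic and end-systolic LV volumes.
    (Hypothetical helper for illustration; not the paper's model.)
    """
    if edv_ml <= 0:
        raise ValueError("EDV must be positive")
    return (edv_ml - esv_ml) / edv_ml * 100.0


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and reference LVEF values."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up example values, in LVEF percentage points.
reference = [ejection_fraction(120.0, 50.0), 55.0]
predictions = [60.0, 52.0]
print(mean_absolute_error(predictions, reference))
```

Under this definition, an MAE of 4.17 means the predicted LVEF differs from the reference label by about 4.17 percentage points on average.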

updated: Thu Oct 20 2022 14:23:40 GMT+0000 (UTC)

published: Thu Oct 20 2022 14:23:40 GMT+0000 (UTC)

arXiv
