CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection

Jie Liu; Yixiao Zhang; Jie-Neng Chen; Junfei Xiao; Yongyi Lu; Bennett A. Landman; Yixuan Yuan; Alan Yuille; Yucheng Tang; Zongwei Zhou

A growing number of public datasets have had a marked impact on automated organ segmentation and tumor detection. However, because each dataset is small and only partially labeled, and because diverse tumor types have received limited investigation, the resulting models are often restricted to segmenting specific organs or tumors, ignore the semantics of anatomical structures, and cannot be extended to novel domains. To address these issues, we propose the CLIP-Driven Universal Model, which incorporates text embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models. This CLIP-based label encoding captures anatomical relationships, enabling the model to learn a structured feature embedding and segment 25 organs and 6 types of tumors. The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training, and is then evaluated on 6,162 external CT scans from 3 additional datasets. We rank first on the Medical Segmentation Decathlon (MSD) public leaderboard and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Additionally, the Universal Model is computationally more efficient (6x faster) than dataset-specific models, generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks.
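The abstract does not spell out how the label embeddings enter the segmentation model or how partially labeled datasets are trained together. The sketch below shows one plausible reading, not the authors' implementation: each class embedding is projected to a per-class kernel that scores every voxel independently, and the loss is averaged only over classes that a given dataset annotates. The random `label_emb` array is a placeholder for real CLIP text embeddings, and the names `segment_with_label_embeddings`, `masked_bce`, and `proj` are hypothetical.

```python
import numpy as np

def segment_with_label_embeddings(features, label_emb, proj):
    """Score every voxel against every class.

    features:  (V, C) per-voxel features from some image encoder (assumed).
    label_emb: (K, D) one embedding per class; stands in for CLIP text embeddings.
    proj:      (D, C) hypothetical learned projection from text space to feature space.
    Returns (V, K) independent per-class probabilities (sigmoid, not softmax).
    """
    class_kernels = label_emb @ proj        # (K, C): one kernel per class
    logits = features @ class_kernels.T     # (V, K)
    return 1.0 / (1.0 + np.exp(-logits))

def masked_bce(probs, targets, labeled):
    """Binary cross-entropy averaged only over classes this dataset annotates.

    labeled: (K,) boolean mask; False marks classes with no annotation.
    """
    eps = 1e-7
    p = np.clip(probs, eps, 1.0 - eps)
    bce = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))  # (V, K)
    return bce[:, labeled].mean()

# Toy data: 8 voxels, 4 feature channels, 16-dim embeddings, 3 classes.
rng = np.random.default_rng(0)
V, C, D, K = 8, 4, 16, 3
features = rng.normal(size=(V, C))
label_emb = rng.normal(size=(K, D))         # placeholder, NOT real CLIP output
proj = rng.normal(size=(D, C)) * 0.1
probs = segment_with_label_embeddings(features, label_emb, proj)

targets = rng.integers(0, 2, size=(V, K)).astype(float)
labeled = np.array([True, False, True])     # class 1 is unannotated in this dataset
loss = masked_bce(probs, targets, labeled)
```

Using independent sigmoid outputs rather than a softmax lets overlapping classes (for example an organ and a tumor inside it) both be active at a voxel, and the boolean mask keeps unannotated classes from being penalized as background.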

updated: Thu Aug 17 2023 15:37:32 GMT+0000 (UTC)

published: Mon Jan 02 2023 18:07:44 GMT+0000 (UTC)

arXiv
