Towards a Visual-Language Foundation Model for Computational Pathology

Ming Y. Lu; Bowen Chen; Drew F. K. Williamson; Richard J. Chen; Ivy Liang; Tong Ding; Guillaume Jaume; Igor Odintsov; Andrew Zhang; Long Phi Le; Georg Gerber; Anil V Parwani; Faisal Mahmood

The accelerated adoption of digital pathology and advances in deep learning have enabled the development of powerful models for various pathology tasks across a diverse array of diseases and patient cohorts. However, model training is often difficult due to label scarcity in the medical domain and the model's usage is limited by the specific task and disease for which it is trained. Additionally, most models in histopathology leverage only image data, a stark contrast to how humans teach each other and reason about histopathologic entities. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and notably over 1.17 million image-caption pairs via task-agnostic pretraining. Evaluated on a suite of 13 diverse benchmarks, CONCH can be transferred to a wide range of downstream tasks involving either or both histopathology images and text, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval. CONCH represents a substantial leap over concurrent visual-language pretrained systems for histopathology, with the potential to directly facilitate a wide array of machine learning-based workflows requiring minimal or no further supervised fine-tuning.
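As a rough illustration of what contrastive pretraining on image-caption pairs involves, the sketch below computes a generic CLIP-style symmetric contrastive loss over a small batch of paired embeddings. The abstract does not specify CONCH's actual objective, architecture, or hyperparameters, so the function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for this example and should not be read as the authors' method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic image-text contrastive loss (hypothetical helper, not CONCH's code).

    image_embs[i] and text_embs[i] are assumed to come from the same
    image-caption pair; every other combination in the batch is a negative.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and caption j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = diagonal_cross_entropy(logits)
    text_to_image = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy 2-D embeddings (made up for illustration only).
aligned_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
mismatched_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                             [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < mismatched_loss)
```

Minimizing a loss of this kind pulls each image embedding toward its own caption and away from the other captions in the batch. A shared embedding space of that sort is what makes zero-shot classification and cross-modal retrieval possible in general, though how CONCH achieves this is described in the paper itself rather than in the abstract.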

updated: Tue Jul 25 2023 17:56:38 GMT+0000 (UTC)

published: Mon Jul 24 2023 16:13:43 GMT+0000 (UTC)

arXiv
