PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Weixiong Lin; Ziheng Zhao; Xiaoman Zhang; Chaoyi Wu; Ya Zhang; Yanfeng Wang; Weidi Xie

Foundation models trained on large-scale datasets have recently surged in computer vision (CV) and natural language processing (NLP). In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset of 1.6M image-caption pairs collected from PubMedCentral's OpenAccess subset, which is 8 times larger than previous datasets. PMC-OA covers diverse modalities and diseases, with the majority of image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. By pretraining a CLIP-style model on PMC-OA, our model, named PMC-CLIP, achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and medical VQA, e.g., +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
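
The abstract names two technical ingredients: a "CLIP-style" contrastive objective over image-caption pairs and the R@10 retrieval metric. The sketch below shows the generic versions of both, assuming the standard symmetric InfoNCE loss used by CLIP and the usual definition of Recall@K. It is not the PMC-CLIP authors' code; the toy embeddings, the helper names, and the temperature of 0.07 (borrowed from the original CLIP initialization) are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_matrix(images: List[Vector], texts: List[Vector]) -> List[List[float]]:
    """Cosine similarity between every image embedding (row) and every caption embedding (column)."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy_diagonal(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row k's correct class is column k (its matched pair)."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def clip_loss(images: List[Vector], texts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss: image-to-text and text-to-image cross-entropy, averaged.

    Assumption: pair k in the batch is (images[k], texts[k]); all other pairs are negatives.
    """
    sims = similarity_matrix(images, texts)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


def recall_at_k(sims: List[List[float]], k: int) -> float:
    """Fraction of queries (rows) whose matched item (same index) ranks in the top-k columns."""
    hits = 0
    for q, row in enumerate(sims):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return hits / len(sims)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for three image-caption pairs.
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    texts = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.7]]
    print("loss (aligned):", clip_loss(images, texts))
    print("loss (shuffled):", clip_loss(images, list(reversed(texts))))
    print("R@1:", recall_at_k(similarity_matrix(images, texts), k=1))
```

With the toy vectors above, each image is most similar to its own caption, so R@1 is 1.0 and the aligned loss is lower than the loss after shuffling the captions. A reported R@10 is the same metric with `k=10` computed over a full retrieval test set.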

updated: Mon Mar 13 2023 16:13:16 GMT+0000 (UTC)

published: Mon Mar 13 2023 16:13:16 GMT+0000 (UTC)

arXiv