Lizard: A Large-Scale Dataset for Colonic Nuclear Instance Segmentation and Classification

Simon Graham; Mostafa Jahanifar; Ayesha Azam; Mohammed Nimir; Yee-Wah Tsang; Katherine Dodd; Emily Hero; Harvir Sahota; Atisha Tank; Ksenija Benes; Noorul Wahab; Fayyaz Minhas; Shan E Ahmed Raza; Hesham El Daly; Kishore Gopalakrishnan; David Snead; Nasir Rajpoot

The development of deep segmentation models for computational pathology (CPath) can help foster the investigation of interpretable morphological biomarkers. Yet, there is a major bottleneck in the success of such approaches because supervised deep learning models require an abundance of accurately labelled data. This issue is exacerbated in the field of CPath because the generation of detailed annotations usually demands the input of a pathologist to be able to distinguish between different tissue constructs and nuclei. Manually labelling nuclei may not be a feasible approach for collecting large-scale annotated datasets, especially when a single image region can contain thousands of different cells. However, solely relying on automatic generation of annotations will limit the accuracy and reliability of ground truth. Therefore, to help overcome the above challenges, we propose a multi-stage annotation pipeline to enable the collection of large-scale datasets for histology image analysis, with pathologist-in-the-loop refinement steps. Using this pipeline, we generate the largest known nuclear instance segmentation and classification dataset, containing nearly half a million labelled nuclei in H&E stained colon tissue. We have released the dataset and encourage the research community to utilise it to drive forward the development of downstream cell-based models in CPath.

updated: Mon Nov 29 2021 11:16:00 GMT+0000 (UTC)

published: Wed Aug 25 2021 11:58:52 GMT+0000 (UTC)

arXiv
