Chest ImaGenome Dataset for Clinical Reasoning

Joy T. Wu; Nkechinyere N. Agu; Ismini Lourentzou; Arjun Sharma; Joseph A. Paguio; Jasper S. Yao; Edward C. Dee; William Mitchell; Satyananda Kashyap; Andrea Giovannini; Leo A. Celi; Mehdi Moradi

Despite the progress in automatic detection of radiologic findings from chest X-ray (CXR) images in recent years, a quantitative evaluation of the explainability of these models is hampered by the lack of locally labeled datasets for different findings. With the exception of a few expert-labeled small-scale datasets for specific findings, such as pneumonia and pneumothorax, most CXR deep learning models to date are trained on global "weak" labels extracted from text reports, or trained via a joint image and unstructured text learning strategy. Inspired by the Visual Genome effort in the computer vision community, we constructed the first Chest ImaGenome dataset with a scene graph data structure to describe 242,072 images. Local annotations are automatically produced using a joint rule-based natural language processing (NLP) and atlas-based bounding box detection pipeline. Through a radiologist-constructed CXR ontology, the annotations for each CXR are connected as an anatomy-centered scene graph, useful for image-level reasoning and multimodal fusion applications. Overall, we provide: i) 1,256 combinations of relation annotations between 29 CXR anatomical locations (objects with bounding box coordinates) and their attributes, structured as a scene graph per image, ii) over 670,000 localized comparison relations (improved, worsened, or no change) between the anatomical locations across sequential exams, and iii) a manually annotated gold standard scene graph dataset from 500 unique patients.
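
As a rough illustration of the anatomy-centered scene graph described above, the following Python sketch models anatomical locations as objects with bounding boxes and attributes, plus comparison relations to a prior exam. All class names, field names, the bounding box layout, and the example values are hypothetical and chosen for illustration only; they are not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Change labels named in the abstract for comparison relations.
ALLOWED_CHANGES = {"improved", "worsened", "no change"}


@dataclass
class AnatomicalObject:
    """One anatomical location in a CXR, localized by a bounding box."""
    name: str                        # illustrative location name
    bbox: Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) pixel layout
    attributes: List[str] = field(default_factory=list)  # findings at this location


@dataclass
class ComparisonRelation:
    """How one anatomical location changed relative to a prior exam."""
    location: str
    prior_image_id: str
    change: str

    def __post_init__(self) -> None:
        if self.change not in ALLOWED_CHANGES:
            raise ValueError(f"unknown change label: {self.change!r}")


@dataclass
class SceneGraph:
    """Per-image graph: anatomical objects, their attributes, and comparisons."""
    image_id: str
    objects: Dict[str, AnatomicalObject] = field(default_factory=dict)
    comparisons: List[ComparisonRelation] = field(default_factory=list)

    def add_object(self, obj: AnatomicalObject) -> None:
        self.objects[obj.name] = obj

    def locations_with(self, attribute: str) -> List[str]:
        """Return names of locations that carry the given attribute."""
        return [o.name for o in self.objects.values() if attribute in o.attributes]


# Hypothetical example: two locations, one finding, one comparison to a prior exam.
graph = SceneGraph(image_id="example-image-1")
graph.add_object(
    AnatomicalObject("right lower lung zone", (120, 300, 260, 440), ["opacity"])
)
graph.add_object(AnatomicalObject("cardiac silhouette", (200, 280, 380, 460)))
graph.comparisons.append(
    ComparisonRelation("right lower lung zone", "example-image-0", "worsened")
)

print(graph.locations_with("opacity"))
```

Keying objects by anatomical location keeps the graph anatomy-centered: a finding is always looked up through the region it is attached to, which is also where its bounding box lives.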

updated: Sat Jul 31 2021 20:10:30 GMT+0000 (UTC)

published: Sat Jul 31 2021 20:10:30 GMT+0000 (UTC)

arXiv
