Multi-Label Generalized Zero Shot Learning for the Classification of Disease in Chest Radiographs

Nasir Hayat; Hazem Lashen; Farah E. Shamout

Despite the success of deep neural networks in chest X-ray (CXR) diagnosis, supervised learning only allows the prediction of disease classes that were seen during training. At inference, these networks cannot predict an unseen disease class. Incorporating a new class requires the collection of labeled data, which is not a trivial task, especially for less frequently occurring diseases. As a result, it becomes infeasible to build a model that can diagnose all possible disease classes. Here, we propose a multi-label generalized zero-shot learning (CXR-ML-GZSL) network that can simultaneously predict multiple seen and unseen diseases in CXR images. Given an input image, CXR-ML-GZSL learns a visual representation guided by the input's corresponding semantics extracted from a rich medical text corpus. Towards this ambitious goal, we propose to map both visual and semantic modalities to a latent feature space using a novel learning objective. The objective ensures that (i) the most relevant labels for the query image are ranked higher than irrelevant labels, (ii) the network learns a visual representation that is aligned with its semantics in the latent feature space, and (iii) the mapped semantics preserve their original inter-class representation. The network is end-to-end trainable and does not require separately pre-training an offline feature extractor. Experiments on the NIH Chest X-ray dataset show that our network outperforms two strong baselines in terms of recall, precision, F1 score, and area under the receiver operating characteristic curve. Our code is publicly available at: https://github.com/nyuad-cai/CXR-ML-GZSL.git
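
The three properties of the objective listed in the abstract can be illustrated with a small sketch. This is not the authors' formulation (see their repository for that); it is one plausible reading, assuming a pairwise hinge loss for ranking, cosine distance for visual-semantic alignment, and a pairwise-similarity match for preserving inter-class structure. All function names, the margin, and the weights are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def ranking_loss(scores, labels, margin=1.0):
    # (i) Assumed form: hinge penalty whenever an irrelevant label
    # scores within `margin` of a relevant label.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    total = sum(max(0.0, margin - p + n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

def alignment_loss(visual, mapped, labels):
    # (ii) Assumed form: mean cosine distance between the image embedding
    # and the mapped embeddings of its positive labels.
    pos = [e for e, y in zip(mapped, labels) if y == 1]
    if not pos:
        return 0.0
    return sum(1.0 - cosine(visual, e) for e in pos) / len(pos)

def consistency_loss(original, mapped):
    # (iii) Assumed form: mean squared gap between pairwise class
    # similarities before and after mapping to the latent space.
    n = len(original)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    gaps = (
        (cosine(original[i], original[j]) - cosine(mapped[i], mapped[j])) ** 2
        for i, j in pairs
    )
    return sum(gaps) / len(pairs)

def total_loss(visual, original, mapped, labels,
               margin=1.0, w_align=1.0, w_cons=1.0):
    # Label scores are taken as dot products in the latent space (an assumption).
    scores = [dot(visual, e) for e in mapped]
    return (
        ranking_loss(scores, labels, margin)
        + w_align * alignment_loss(visual, mapped, labels)
        + w_cons * consistency_loss(original, mapped)
    )
```

In this sketch, `visual` is the latent image embedding, `original` holds the text-derived class embeddings, and `mapped` holds the same classes after projection into the latent space. Because scoring only needs a class embedding, an unseen disease can be scored at inference as long as a semantic embedding for it exists.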

updated: Wed Jul 14 2021 09:04:20 GMT+0000 (UTC)

published: Wed Jul 14 2021 09:04:20 GMT+0000 (UTC)

arXiv
