Semi-Supervised Siamese Network for Identifying Bad Data in Medical Imaging Datasets

Niamh Belton; Aonghus Lawlor; Kathleen M. Curran

Noisy data present in medical imaging datasets can often aid the development of robust models that are equipped to handle real-world data. However, if the bad data contains insufficient anatomical information, it can have a severe negative effect on the model's performance. We propose a novel methodology using a semi-supervised Siamese network to identify bad data. This method requires only a small pool of 'reference' medical images to be reviewed by a non-expert human to ensure the major anatomical structures are present in the Field of View. The model trains on this reference set and identifies bad data by using the Siamese network to compute the distance between the reference set and all other medical images in the dataset. This methodology achieves an Area Under the Curve (AUC) of 0.989 for identifying bad data. Code will be available at https://git.io/JYFuV.
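The distance-based identification step described in the abstract can be illustrated with a minimal sketch. It assumes each image has already been mapped to an embedding vector by one branch of a trained Siamese network; that network is not shown here. The function names, the use of the mean Euclidean distance to the reference set as the score, the toy 2-D embeddings, and the threshold value are all illustrative assumptions, not details taken from the paper.

```python
import math
from statistics import mean


def anomaly_scores(reference_embeddings, candidate_embeddings):
    """Score each candidate by its mean Euclidean distance to the reference set.

    Assumption: mean aggregation is used for illustration only; the abstract
    does not state how distances to the reference set are combined.
    """
    return [
        mean(math.dist(candidate, ref) for ref in reference_embeddings)
        for candidate in candidate_embeddings
    ]


def flag_bad_data(scores, threshold):
    """Flag a candidate as bad data when its score exceeds the threshold."""
    return [score > threshold for score in scores]


# Toy 2-D embeddings standing in for the output of a trained Siamese branch.
reference = [(0.0, 0.0), (0.0, 2.0)]
candidates = [(0.0, 1.0), (3.0, 4.0)]

scores = anomaly_scores(reference, candidates)
flags = flag_bad_data(scores, threshold=2.0)  # hypothetical threshold
```

In this toy case the first candidate sits between the two reference points and receives a low score, while the second lies far from both and is flagged. In practice the raw scores, rather than thresholded flags, would be the input to an AUC calculation like the one reported above.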

updated: Mon Aug 16 2021 14:52:20 GMT+0000 (UTC)

published: Mon Aug 16 2021 14:52:20 GMT+0000 (UTC)

arXiv
