Natural vs Balanced Distribution in Deep Learning on Whole Slide Images for Cancer Detection

Ismat Ara Reshma; Sylvain Cussat-Blanc; Radu Tudor Ionescu; Hervé Luga; Josiane Mothe

The class distribution of data is one of the factors that regulate the performance of machine learning models. However, few studies in the literature investigate the impact of different distributions, and for some domain-specific tasks there are none. In this paper, we analyze the impact of natural and balanced distributions of the training set on deep learning (DL) models applied to histological images, also known as whole slide images (WSIs). WSIs are considered the gold standard for cancer diagnosis. In recent years, researchers have turned their attention to DL models to automate and accelerate the diagnosis process. When training such DL models, it is common to filter out the non-regions-of-interest from the WSIs and adopt an artificial distribution (usually a balanced one). In our analysis, we show that keeping the WSI data in their usual distribution (which we call the natural distribution) for DL training produces fewer false positives (FPs), with a comparable number of false negatives (FNs), than the artificially obtained balanced distribution. We conduct an empirical comparative study with 10 random folds for each distribution, comparing the resulting average performance in terms of five different evaluation metrics. Experimental results favor the natural distribution over the balanced one across all the evaluation metrics.
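
To make the distinction between the two training distributions concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' pipeline: the abstract does not describe how patches are sampled, so the function names (`sample_natural`, `sample_balanced`), the 90/10 class split, and the use of undersampling to obtain balance are all assumptions made for illustration.

```python
import numpy as np


def sample_natural(labels, n, rng):
    """Draw n patch indices uniformly at random, so the class ratio
    in the sample follows whatever ratio the data happens to have."""
    return rng.choice(len(labels), size=n, replace=False)


def sample_balanced(labels, n, rng):
    """Draw n patch indices with an equal share per class
    (assumption: balance is obtained by undersampling each class)."""
    classes = np.unique(labels)
    per_class = n // len(classes)
    picks = [
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in classes
    ]
    return np.concatenate(picks)


rng = np.random.default_rng(0)

# Hypothetical patch labels: 0 = non-tumor, 1 = tumor (illustrative 90/10 split).
labels = np.array([0] * 900 + [1] * 100)

natural_idx = sample_natural(labels, 100, rng)
balanced_idx = sample_balanced(labels, 100, rng)

# Fraction of tumor patches in each training sample.
natural_pos_rate = labels[natural_idx].mean()
balanced_pos_rate = labels[balanced_idx].mean()
print(f"natural: {natural_pos_rate:.2f}, balanced: {balanced_pos_rate:.2f}")
```

Under these assumptions, the balanced sample contains tumor and non-tumor patches in equal numbers, while the natural sample stays close to the proportion found in the slides. Repeating the draw with different seeds would give the kind of random folds over which the abstract reports average performance.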

updated: Mon Dec 21 2020 21:18:49 GMT+0000 (UTC)

published: Mon Dec 21 2020 21:18:49 GMT+0000 (UTC)

arXiv
