Optimizing the Dice Score and Jaccard Index for Medical Image Segmentation: Theory & Practice

Jeroen Bertels; Tom Eelbode; Maxim Berman; Dirk Vandermeulen; Frederik Maes; Raf Bisschops; Matthew Blaschko

The Dice score and Jaccard index are commonly used metrics for the evaluation of segmentation tasks in medical imaging. Convolutional neural networks trained for image segmentation tasks are usually optimized for (weighted) cross-entropy. This introduces an adverse discrepancy between the learning optimization objective (the loss) and the end target metric. Recent works in computer vision have proposed soft surrogates to alleviate this discrepancy and directly optimize the desired metric, either through relaxations (soft-Dice, soft-Jaccard) or submodular optimization (Lovász-softmax). The aim of this study is two-fold. First, we investigate the theoretical differences in a risk minimization framework and question the existence of a weighted cross-entropy loss with weights theoretically optimized to surrogate Dice or Jaccard. Second, we empirically investigate the behavior of the aforementioned loss functions w.r.t. evaluation with Dice score and Jaccard index on five medical segmentation tasks. Through the application of relative approximation bounds, we show that all surrogates are equivalent up to a multiplicative factor, and that no optimal weighting of cross-entropy exists to approximate Dice or Jaccard measures. We validate these findings empirically and show that, while it is important to opt for one of the target metric surrogates rather than a cross-entropy-based loss, the choice of the surrogate does not make a statistical difference on a wide range of medical segmentation tasks.
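The relaxations mentioned in the abstract (soft-Dice, soft-Jaccard) can be illustrated with a minimal pure-Python sketch. It assumes binary segmentation, a flat list of per-pixel foreground probabilities, and a small smoothing constant `eps`; the function names, the list representation, and `eps` are illustrative assumptions rather than the authors' implementation, and the paper's exact formulation may differ.

```python
from typing import Sequence


def soft_dice_loss(probs: Sequence[float], target: Sequence[float],
                   eps: float = 1e-7) -> float:
    """Return 1 - soft Dice for per-pixel probabilities and a binary mask.

    Set intersection is relaxed to a sum of products, and set sizes
    to sums of probabilities. `eps` (an assumption) avoids 0/0 when
    both prediction and mask are empty.
    """
    intersection = sum(p * t for p, t in zip(probs, target))
    total = sum(probs) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def soft_jaccard_loss(probs: Sequence[float], target: Sequence[float],
                      eps: float = 1e-7) -> float:
    """Return 1 - soft Jaccard using the same relaxation as above."""
    intersection = sum(p * t for p, t in zip(probs, target))
    union = sum(probs) + sum(target) - intersection
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    # Hypothetical four-pixel example: two foreground, two background pixels.
    probs = [0.9, 0.8, 0.2, 0.1]
    target = [1.0, 1.0, 0.0, 0.0]
    print("soft Dice loss:   ", soft_dice_loss(probs, target))
    print("soft Jaccard loss:", soft_jaccard_loss(probs, target))
```

Ignoring `eps`, the two relaxed scores in this sketch satisfy D = 2J / (1 + J), the same identity that links the Dice score and the Jaccard index on hard segmentations.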

updated: Tue Nov 05 2019 09:42:25 GMT+0000 (UTC)

published: Tue Nov 05 2019 09:42:25 GMT+0000 (UTC)

arXiv

