A new sampling methodology for creating rich, heterogeneous, subsets of samples for training image segmentation algorithms

Matheus Viana da Silva; Natália de Carvalho Santos; Baptiste Lacoste; Cesar Henrique Comin

Creating a dataset for training supervised machine learning algorithms can be a demanding task. This is especially true for medical image segmentation, since the task usually requires one or more specialists for image annotation, and creating ground truth labels for just a single image can take up to several hours. In addition, it is paramount that the annotated samples represent well the different conditions that might affect the imaged tissue, as well as possible changes in the image acquisition process. This can only be achieved by considering samples that are typical in the dataset as well as atypical, or even outlier, samples. We introduce a new sampling methodology for selecting relevant images from a larger non-annotated dataset in a way that gives equal consideration to prototypical and atypical samples. The methodology involves generating a uniform grid from a feature space representing the samples, which is then used to randomly draw relevant images. The selected images provide a uniform cover of the original dataset, and thus define a heterogeneous set of images that can be annotated and used for training supervised segmentation algorithms. We provide a case example by creating a dataset containing a representative set of blood vessel microscopy images selected from a larger dataset containing thousands of images.
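The grid-based selection described in the abstract can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: it assumes each image has already been reduced to a low-dimensional feature vector, and the function name `grid_uniform_sample` and the parameter `cells_per_axis` are hypothetical. Each sample is assigned to a grid cell, and images are then drawn by visiting occupied cells in random order, so sparse regions of the feature space are represented as often as dense ones.

```python
import numpy as np

def grid_uniform_sample(features, n_select, cells_per_axis=10, seed=0):
    """Return indices of up to n_select samples, drawn evenly over occupied grid cells.

    Illustrative sketch only; the grid construction is an assumption,
    not the procedure from the paper.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)

    # Rescale every feature to [0, 1] so all axes share the same grid resolution.
    lo = features.min(axis=0)
    span = features.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    scaled = (features - lo) / span

    # Integer cell coordinates per sample; the maximum value falls in the last cell.
    cells = np.minimum((scaled * cells_per_axis).astype(int), cells_per_axis - 1)

    # Group sample indices by occupied cell, shuffled within each cell.
    _, cell_ids = np.unique(cells, axis=0, return_inverse=True)
    cell_ids = cell_ids.reshape(-1)
    pools = [list(rng.permutation(np.flatnonzero(cell_ids == c)))
             for c in range(cell_ids.max() + 1)]

    # Visit occupied cells in random order, taking one unused sample per visit.
    selected = []
    while len(selected) < n_select:
        progressed = False
        for c in rng.permutation(len(pools)):
            if pools[c] and len(selected) < n_select:
                selected.append(int(pools[c].pop()))
                progressed = True
        if not progressed:  # every sample has already been taken
            break
    return np.array(selected, dtype=int)

# Synthetic example: a dense cluster of typical samples plus a few atypical ones.
rng = np.random.default_rng(42)
typical = rng.normal(0.0, 0.01, size=(1000, 2))
atypical = rng.uniform(0.5, 1.0, size=(20, 2))
picked = grid_uniform_sample(np.vstack([typical, atypical]), n_select=10)
print("atypical samples selected:", int((picked >= 1000).sum()), "of", len(picked))
```

Plain random sampling would pick almost exclusively from the dense cluster here, since atypical samples make up only about 2% of the data. Drawing per cell instead lets the atypical region contribute a substantial share of the selection, which is the kind of heterogeneity the abstract argues for.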

updated: Wed Jan 11 2023 15:31:15 GMT+0000 (UTC)

published: Wed Jan 11 2023 15:31:15 GMT+0000 (UTC)

arXiv
