Technical outlier detection via convolutional variational autoencoder for the ADMANI breast mammogram dataset

Hui Li; Carlos A. Pena Solorzano; Susan Wei; Davis J. McCarthy


The ADMANI datasets (annotated digital mammograms and associated non-image datasets) from the Transforming Breast Cancer Screening with AI programme (BRAIx), run by BreastScreen Victoria in Australia, are multi-centre, large-scale, clinically curated, real-world databases. The datasets are expected to aid the development of clinically relevant artificial intelligence (AI) algorithms for breast cancer detection, early diagnosis, and other applications. To ensure high data quality, technical outliers must be removed before any downstream algorithm development. As a first step, we randomly select 30,000 individual mammograms and use a convolutional variational autoencoder (CVAE), a deep generative neural network, to detect outliers. The CVAE is expected to detect all types of outliers, although its detection performance varies by outlier type. Traditional image processing techniques such as erosion and pectoral muscle analysis can compensate for the CVAE's poor performance on certain outlier types. We identify seven types of technical outliers: implant, pacemaker, cardiac loop recorder, improper radiography, atypical lesion/calcification, incorrect exposure parameter, and improper placement. On the test set, the outlier recall rate is 61% when CVAE, erosion, and pectoral muscle analysis each select the top 1% of images, ranked in ascending or descending order by that method's image outlier score, and 83% when each selects the top 5% of images. This study offers an overview of technical outliers in the ADMANI dataset and suggests future directions to improve outlier detection effectiveness.
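The recall figures above combine three detectors, each flagging its own top fraction of images. A minimal sketch of how such a combined recall could be computed is given below. It is not the authors' code: it assumes the flagged sets are merged by union, that each method yields one scalar score per image, and that the ranking direction is known per method. The names `top_fraction_mask`, `union_recall`, and the score arrays are hypothetical.

```python
import numpy as np


def top_fraction_mask(scores, fraction, descending=True):
    """Flag the top `fraction` of images under one detection method.

    `descending=True` treats large scores as most outlying;
    `descending=False` treats small scores as most outlying.
    """
    scores = np.asarray(scores, dtype=float)
    # Always flag at least one image, rounding the count up.
    k = max(1, int(np.ceil(fraction * scores.size)))
    order = np.argsort(-scores if descending else scores, kind="stable")
    mask = np.zeros(scores.size, dtype=bool)
    mask[order[:k]] = True
    return mask


def union_recall(methods, is_outlier, fraction):
    """Recall of the union of images flagged by every method.

    `methods` maps a method name to a (scores, descending) pair.
    Merging by union is an assumption of this sketch.
    """
    is_outlier = np.asarray(is_outlier, dtype=bool)
    n_outliers = int(is_outlier.sum())
    if n_outliers == 0:
        raise ValueError("recall is undefined without labelled outliers")
    flagged = np.zeros(is_outlier.size, dtype=bool)
    for scores, descending in methods.values():
        flagged |= top_fraction_mask(scores, fraction, descending)
    return float((flagged & is_outlier).sum()) / n_outliers


if __name__ == "__main__":
    # Toy data only: 10 images, of which images 0, 1 and 2 are outliers.
    labels = [True, True, True] + [False] * 7
    toy_methods = {
        "cvae": ([9, 1, 1, 1, 1, 1, 1, 1, 1, 1], True),      # high = outlying
        "erosion": ([5, 0, 5, 5, 5, 5, 5, 5, 5, 5], False),  # low = outlying
    }
    print(union_recall(toy_methods, labels, fraction=0.1))
```

In the toy data each method flags one image (10% of 10), the two flagged images are both true outliers, and the third outlier is missed by both methods, so two of the three outliers are recovered. Raising `fraction` can only grow the flagged set, so recall can only rise, at the cost of more images to review.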

updated: Sat May 20 2023 03:08:42 GMT+0000 (UTC)

published: Sat May 20 2023 03:08:42 GMT+0000 (UTC)

arXiv

