Extraction of Text from Optic Nerve Optical Coherence Tomography Reports

Iyad Majid; Youchen Victor Zhang; Robert Chang; Sophia Y. Wang

Purpose: The purpose of this study was to develop and evaluate rule-based algorithms that improve the extraction of text data, including retinal nerve fiber layer (RNFL) values and other ganglion cell count (GCC) data, from Zeiss Cirrus optical coherence tomography (OCT) scan reports.

Methods: DICOM files containing encapsulated PDF reports with "RNFL" or "Ganglion Cell" in their document titles were identified from the clinical imaging repository of a single academic ophthalmic center. The PDF reports were converted into image files and processed with the PaddleOCR Python package for optical character recognition (OCR). Rule-based algorithms were then designed and iteratively optimized to improve the extraction of RNFL and GCC data. The algorithms were evaluated by manual review of a set of RNFL and GCC reports.

Results: The algorithms extracted data from both RNFL and GCC scans with high precision. Precision was slightly higher for the right eye in RNFL extraction (OD: 0.9803 vs. OS: 0.9046) and for the left eye in GCC extraction (OD: 0.9567 vs. OS: 0.9677). Some values were harder to extract, particularly clock hours 5 and 6 of RNFL thickness and signal strength for GCC.

Conclusions: A customized OCR algorithm can identify numeric results in OCT scan reports with high precision. Automated processing of PDF reports can greatly reduce the time needed to extract OCT results on a large scale.
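To illustrate what a rule-based extraction step and a per-eye precision check might look like, here is a minimal sketch. It does not call PaddleOCR: it assumes the OCR output for one report has already been flattened into a list of text lines, and the label strings, regex rules, field names (`avg_rnfl_thickness`, `signal_strength`) and the helpers `extract_fields` and `precision_per_eye` are hypothetical, not the study's actual rules. Precision is assumed here to mean correct extracted values divided by all non-missing extracted values, computed separately for OD and OS.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical label-anchored rules. The real Cirrus report text layout and the
# study's actual rules are assumptions, not reproduced from the paper.
FIELD_RULES: Dict[str, "re.Pattern[str]"] = {
    # e.g. "Average RNFL Thickness 92 µm 90 µm" -> OD=92, OS=90
    "avg_rnfl_thickness": re.compile(r"Average RNFL Thickness\D*(\d+)\D+(\d+)"),
    # e.g. "Signal Strength 8/10 7/10" -> OD=8, OS=7
    "signal_strength": re.compile(r"Signal Strength\D*(\d+)/10\D+(\d+)/10"),
}

Eyes = Tuple[Optional[int], Optional[int]]  # (OD, OS); None means "not extracted"


def extract_fields(ocr_lines: List[str]) -> Dict[str, Eyes]:
    """Apply each rule to the OCR text lines; the first matching line wins."""
    results: Dict[str, Eyes] = {}
    for field, rule in FIELD_RULES.items():
        results[field] = (None, None)  # stays missing unless a rule fires
        for line in ocr_lines:
            match = rule.search(line)
            if match:
                results[field] = (int(match.group(1)), int(match.group(2)))
                break
    return results


def precision_per_eye(extracted: List[Dict[str, Eyes]],
                      reviewed: List[Dict[str, Eyes]]) -> Tuple[float, float]:
    """Per-eye precision: correct extracted values / all non-missing extracted values."""
    scores = []
    for eye in (0, 1):  # 0 = OD (right eye), 1 = OS (left eye)
        attempted = correct = 0
        for ext, ref in zip(extracted, reviewed):
            for field, values in ext.items():
                if values[eye] is None:
                    continue  # nothing extracted, so it cannot count against precision
                attempted += 1
                if values[eye] == ref[field][eye]:
                    correct += 1
        scores.append(correct / attempted if attempted else 0.0)
    return scores[0], scores[1]


if __name__ == "__main__":
    report = ["Signal Strength 8/10 7/10", "Average RNFL Thickness 92 µm 90 µm"]
    fields = extract_fields(report)
    print(fields)  # {'avg_rnfl_thickness': (92, 90), 'signal_strength': (8, 7)}
    # Hypothetical manual review: the OS signal strength was actually 6.
    truth = {"avg_rnfl_thickness": (92, 90), "signal_strength": (8, 6)}
    print(precision_per_eye([fields], [truth]))  # (1.0, 0.5)
```

Anchoring each rule on a printed label keeps the rules easy to inspect and adjust one field at a time, which suits the iterative optimization described above; fields whose OCR text is split or misread (such as individual clock hours) would need additional, layout-specific rules.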

updated: Mon Aug 21 2023 15:34:32 GMT+0000 (UTC)

published: Mon Aug 21 2023 15:34:32 GMT+0000 (UTC)

arXiv