EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT

Stephen Parsons; C. Seth Parker; Christy Chapman; Mami Hayashida; W. Brent Seales


We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.
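The labeling idea in the abstract, where 2D ink labels from an aligned photograph are paired with 3D X-ray CT data, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hypothetical `pixel_to_voxel` lookup that stands in for the geometric framework linking 2D and 3D images, and it simplifies the sampling to a short column of voxels along the z axis. The function name, parameters, and toy data are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index into the CT volume
Pixel = Tuple[int, int]       # (row, col) index into the flattened 2D image


def build_training_pairs(
    volume: List[List[List[int]]],
    pixel_to_voxel: Dict[Pixel, Voxel],
    ink_labels: List[List[int]],
    half_depth: int = 1,
) -> List[Tuple[List[int], int]]:
    """Pair each labeled 2D pixel with a small column of CT intensities.

    Assumption: the papyrus surface normal is taken to be the z axis, so
    the sample is simply the voxels from z - half_depth to z + half_depth.
    Pixels whose column would leave the volume are skipped.
    """
    depth = len(volume)
    pairs = []
    for (row, col), (z, y, x) in sorted(pixel_to_voxel.items()):
        lo, hi = z - half_depth, z + half_depth
        if lo < 0 or hi >= depth:
            continue  # sampling window falls outside the scanned volume
        sample = [volume[k][y][x] for k in range(lo, hi + 1)]
        pairs.append((sample, ink_labels[row][col]))
    return pairs


# Toy 4x2x2 volume whose intensity encodes its own position.
volume = [[[100 * z + 10 * y + x for x in range(2)] for y in range(2)]
          for z in range(4)]

# Hypothetical 2D-to-3D mapping and binary ink labels (1 = ink, 0 = no ink).
pixel_to_voxel = {(0, 0): (1, 0, 0), (0, 1): (1, 0, 1),
                  (1, 0): (0, 1, 0), (1, 1): (2, 1, 1)}
ink_labels = [[1, 0], [0, 1]]

pairs = build_training_pairs(volume, pixel_to_voxel, ink_labels)
```

Each resulting pair holds a CT sample and the ink label of its aligned pixel, which is the form of input a supervised ink detection model would train on. The pixel mapped to `z = 0` is dropped because its sampling window extends past the edge of the volume.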

updated: Sat Apr 08 2023 16:14:46 GMT+0000 (UTC)

published: Tue Apr 04 2023 19:28:51 GMT+0000 (UTC)

arXiv
