Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

Tianyuan Yao; Chang Qu; Jun Long; Quan Liu; Ruining Deng; Yuanhan Tian; Jiachen Xu; Aadarsh Jha; Zuhayr Asad; Shunxing Bao; Mengyang Zhao; Agnes B. Fogo; Bennett A. Landman; Haichun Yang; Catie Chang; Yuankai Huo

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of large-scale image collections, even without annotations, for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, offer a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. To extract compound figures and separate them into individual images usable for downstream learning, we propose a simple compound figure separation (SimCFS) framework that uses a new loss function and hard case simulation and does not require the traditionally needed detection bounding box annotations. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. The proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. A self-supervised model pretrained with a contrastive learning algorithm on large-scale mined figures improved the accuracy of downstream image classification tasks. The source code of SimCFS is publicly available at https://github.com/hrlblab/ImageSeperation.
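The idea behind simulation-based training can be illustrated with a minimal sketch: if single images are pasted onto a blank canvas at known positions, the bounding box of every subfigure is known by construction, so no manual annotation is needed. The abstract does not describe the SimCFS simulator itself, so everything below is an assumption made for illustration: the function names `resize_nearest` and `simulate_compound_figure`, the fixed grid layout, the cell size, the gap width, and the white background are all hypothetical. The intra-class augmentation for hard cases mentioned in the abstract is not implemented here; one could approximate it by choosing all tiles from the same image class.

```python
import numpy as np


def resize_nearest(image, size):
    """Resize a 2-D grayscale image to (size, size) by nearest-neighbour sampling."""
    row_idx = np.arange(size) * image.shape[0] // size
    col_idx = np.arange(size) * image.shape[1] // size
    return image[row_idx[:, None], col_idx]


def simulate_compound_figure(images, rows, cols, cell=64, gap=4, background=255):
    """Tile single images into a grid and return the canvas plus one box per image.

    Boxes are (x_min, y_min, x_max, y_max) in pixel coordinates, exclusive
    on the max side. The layout is a hypothetical stand-in for a real simulator.
    """
    if len(images) != rows * cols:
        raise ValueError("need exactly rows * cols images")
    height = rows * cell + (rows + 1) * gap
    width = cols * cell + (cols + 1) * gap
    canvas = np.full((height, width), background, dtype=np.uint8)
    boxes = []
    for index, image in enumerate(images):
        r, c = divmod(index, cols)
        y0 = gap + r * (cell + gap)
        x0 = gap + c * (cell + gap)
        canvas[y0:y0 + cell, x0:x0 + cell] = resize_nearest(image, cell)
        boxes.append((x0, y0, x0 + cell, y0 + cell))
    return canvas, boxes


# Usage: four random "single images" of different sizes become one 2x2 figure.
rng = np.random.default_rng(0)
shapes = [(50, 80), (64, 64), (100, 30), (40, 40)]
singles = [rng.integers(0, 200, size=s, dtype=np.uint8) for s in shapes]
canvas, boxes = simulate_compound_figure(singles, rows=2, cols=2)
```

Each `(canvas, boxes)` pair could then serve as a training sample for an off-the-shelf object detector, with the boxes acting as free labels.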

updated: Tue Aug 30 2022 16:02:34 GMT+0000 (UTC)

published: Tue Aug 30 2022 16:02:34 GMT+0000 (UTC)

arXiv
