Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

Ruijie Zhang; Qiaozhe Zhang; Yingzhuang Liu; Hao Xin; Yan Liu; Xinggang Wang

A whole slide image (WSI) is a high-resolution scanned tissue image that is widely used in computer-assisted diagnosis (CAD). The extremely high resolution of WSIs and the limited availability of region-level annotations make it challenging to apply deep learning methods to WSI-based digital diagnosis. Multiple instance learning (MIL) is a powerful tool for addressing the weak annotation problem, while the Transformer has shown great success in visual tasks. Combining the two should provide new insights for deep-learning-based image diagnosis. However, because of the limitations of single-level MIL and the constraints that the attention mechanism places on sequence length, directly applying a Transformer to WSI-based MIL tasks is not practical. To tackle this issue, we propose a Multi-level MIL with Transformer (MMIL-Transformer) approach. By introducing a hierarchical structure to MIL, this approach enables efficient handling of MIL tasks that involve a large number of instances. To validate its effectiveness, we conducted a set of experiments on WSI classification tasks, where MMIL-Transformer demonstrates superior performance compared to existing state-of-the-art methods. Our proposed approach achieves a test AUC of 94.74% and a test accuracy of 93.41% on the CAMELYON16 dataset, and a test AUC of 99.04% and a test accuracy of 94.37% on the TCGA-NSCLC dataset. All code and pre-trained models are available at: https://github.com/hustvl/MMIL-Transformer
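
The sequence-length constraint mentioned in the abstract can be made concrete with a rough count of query-key pairs. The sketch below is not the authors' implementation: it assumes a generic two-level scheme in which instances are split into fixed-size groups, self-attention runs inside each group, and a second attention pass runs over one summary token per group. The function names and the group size are hypothetical and chosen only for illustration.

```python
def full_attention_pairs(n_instances: int) -> int:
    """Query-key pairs for one self-attention pass over all instances."""
    return n_instances * n_instances


def two_level_attention_pairs(n_instances: int, group_size: int) -> int:
    """Query-key pairs for an assumed two-level scheme.

    Level 1: self-attention inside each group of at most `group_size` instances.
    Level 2: self-attention over one summary token per group.
    """
    if n_instances < 0 or group_size <= 0:
        raise ValueError("n_instances must be >= 0 and group_size must be > 0")
    full_groups, remainder = divmod(n_instances, group_size)
    n_groups = full_groups + (1 if remainder else 0)
    within_groups = full_groups * group_size ** 2 + remainder ** 2
    across_groups = n_groups ** 2
    return within_groups + across_groups


if __name__ == "__main__":
    n = 10_000  # hypothetical number of patches in one slide
    print(full_attention_pairs(n))
    print(two_level_attention_pairs(n, group_size=100))
```

Under these assumptions the single-level cost grows quadratically with the number of instances, while the two-level cost is dominated by the much smaller per-group terms, which is the general motivation for a hierarchical MIL structure.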

updated: Thu Jun 08 2023 08:29:10 GMT+0000 (UTC)

published: Thu Jun 08 2023 08:29:10 GMT+0000 (UTC)

arXiv
