Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers

Firas Khader; Jakob Nikolas Kather; Tianyu Han; Sven Nebelung; Christiane Kuhl; Johannes Stegmaier; Daniel Truhn

Whole-slide imaging allows high-resolution images of histological specimens to be captured and digitized. Automated analysis of such images using deep learning models is therefore in high demand. The transformer architecture has been proposed as a candidate for effectively leveraging the high-resolution information: the whole-slide image is partitioned into smaller image patches, and feature tokens are extracted from these patches. However, while the conventional transformer can process a large set of input tokens simultaneously, its computational demand scales quadratically with the number of input tokens and thus quadratically with the number of image patches. To address this problem, we propose a novel cascaded cross-attention network (CCAN), based on the cross-attention mechanism, that scales linearly with the number of extracted patches. Our experiments demonstrate that this architecture is at least on par with, and even outperforms, other attention-based state-of-the-art methods on two public datasets. For lung cancer (TCGA NSCLC), our model reaches a mean area under the receiver operating characteristic curve (AUC) of 0.970 ± 0.008, and for renal cancer (TCGA RCC) it reaches a mean AUC of 0.985 ± 0.004. Furthermore, we show that our proposed model is efficient in low-data regimes, making it a promising approach for analyzing whole-slide images in resource-limited settings. To foster research in this direction, we make our code publicly available on GitHub: XXX.
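
The scaling argument in the abstract can be illustrated with a minimal sketch. It is not the authors' CCAN implementation: the abstract gives no architectural details, so the small fixed set of latent query vectors, the absence of learned projections, the single head, and all names below (`cross_attention`, `score_entries_self`, `score_entries_cross`) are assumptions made for illustration. The point is only that M latent queries attending over N patch tokens produce an M x N score matrix, which grows linearly in N when M is fixed, whereas self-attention over the same tokens produces an N x N matrix.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latents, tokens):
    """Single-head cross-attention without learned projections (a simplification).

    latents: M query vectors of dimension d (hypothetical latent array)
    tokens:  N patch feature vectors of dimension d
    Returns the M updated latents and the M x N attention matrix.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    # The score matrix has M * N entries: linear in N for a fixed M.
    scores = [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in tokens]
        for q in latents
    ]
    weights = [softmax(row) for row in scores]
    # Each output latent is a weighted average of the patch tokens.
    out = [
        [sum(w * tok[j] for w, tok in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return out, weights


def score_entries_self(n_tokens):
    """Number of attention scores in full self-attention over N tokens."""
    return n_tokens * n_tokens


def score_entries_cross(n_latents, n_tokens):
    """Number of attention scores when M latents attend over N tokens."""
    return n_latents * n_tokens


random.seed(0)
d, n_patches, n_latents = 8, 200, 4  # illustrative sizes, not from the paper
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_patches)]
latents = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_latents)]

compressed, attn = cross_attention(latents, tokens)
print(len(compressed), len(compressed[0]))
print(score_entries_self(n_patches), score_entries_cross(n_latents, n_patches))
```

Doubling the number of patches doubles the cross-attention score count but quadruples the self-attention score count. How CCAN cascades such stages is not described in the abstract and is not modeled here.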

updated: Thu May 11 2023 16:42:24 GMT+0000 (UTC)

published: Thu May 11 2023 16:42:24 GMT+0000 (UTC)

arXiv
