Towards Hierarchical Regional Transformer-based Multiple Instance Learning

Josef Cersovsky; Sadegh Mohammadi; Dagmar Kainmueller; Johannes Hoehne

The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer-inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high-attention regions during inference. Our approach significantly improves performance over the baseline on two histopathology datasets and points toward promising directions for further research.
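The following is a minimal sketch of regional self-attention aggregation with hierarchical stacking, as described in the abstract. It rests on several assumptions that do not come from the paper. Patch features are taken to lie on a regular 2-D grid and are split into non-overlapping square regions. Attention is single-head and uses identity query/key/value projections, where a real model would learn them. Each region is reduced to one token by mean pooling. All function names are hypothetical. This is not the authors' implementation, and it does not include their step of focusing on high-attention regions at inference.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> patch feature vector


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.
    Simplification: identity Q/K/V projections instead of learned weights."""
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output token is a convex combination of all tokens in the region.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def regional_aggregate(grid: Grid, region: int) -> Grid:
    """Hypothetical helper: split the grid into non-overlapping region x region
    windows, apply self-attention within each window, and mean-pool each window
    into one token of a coarser grid. Assumes both grid sides are divisible by `region`."""
    rows, cols = len(grid), len(grid[0])
    coarse: Grid = []
    for r0 in range(0, rows, region):
        coarse_row = []
        for c0 in range(0, cols, region):
            window = [grid[r][c]
                      for r in range(r0, r0 + region)
                      for c in range(c0, c0 + region)]
            coarse_row.append(mean_pool(self_attention(window)))
        coarse.append(coarse_row)
    return coarse


def hierarchical_slide_embedding(grid: Grid, region: int, levels: int) -> Vector:
    """Hypothetical helper: stack `levels` regional aggregations, then mean-pool
    the remaining tokens into one slide-level embedding."""
    for _ in range(levels):
        grid = regional_aggregate(grid, region)
    return mean_pool([tok for row in grid for tok in row])
```

For example, take a 4 x 4 grid of 2-dimensional patch features with `region=2` and `levels=2`. The first level reduces it to a 2 x 2 grid, and the second level reduces that to a single token, which is pooled into the slide embedding. In a full model, a classification head would map this embedding to the slide-level prediction. Each level attends only within local windows, so attention cost grows with region size rather than with the total number of patches. This is the property that makes regional attention attractive for gigapixel slides.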

updated: Mon Nov 20 2023 10:06:03 GMT+0000 (UTC)

published: Thu Aug 24 2023 08:19:15 GMT+0000 (UTC)

arXiv