Feature Re-calibration based Multiple Instance Learning for Whole Slide Image Classification

Philip Chikontwe; Soo Jeong Nam; Heounjeong Go; Meejeong Kim; Hyun Jung Sung; Sang Hyun Park

Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases; however, curating accurate labels is time-consuming, which limits the application of fully supervised methods. To address this, multiple instance learning (MIL) is a popular approach that poses classification as a weakly supervised learning task using slide-level labels only. While current MIL methods apply variants of the attention mechanism to re-weight instance features with stronger models, scant attention is paid to the properties of the data distribution. In this work, we propose to re-calibrate the distribution of a WSI bag (its instances) using the statistics of the max-instance (critical) feature. We assume that in binary MIL, positive bags have larger feature magnitudes than negative bags; thus, we can push the model to maximize the discrepancy between bags with a metric feature loss that models positive bags as out-of-distribution. To achieve this, unlike existing MIL methods that use single-batch training modes, we propose balanced-batch sampling so that the feature loss can use positive and negative (+/-) bags simultaneously. Further, we employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder. Experimental results on existing benchmark datasets show that our approach is effective and improves over state-of-the-art MIL methods.
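The abstract does not give formulas, so the following is only a minimal sketch of two ideas it names: balanced-batch sampling and a loss on max-instance feature magnitudes. Everything here is an assumption made for illustration: the function names (`balanced_batches`, `magnitude_margin_loss`), the use of the L2 norm as the "feature magnitude", and the hinge form with a `margin` parameter are not taken from the paper. The re-calibration step, PEM, and PSMA are not covered.

```python
import math
import random
from typing import Iterator, List, Sequence

Bag = List[List[float]]  # one bag = a list of instance feature vectors


def balanced_batches(labels: Sequence[int], pairs_per_batch: int,
                     seed: int = 0) -> Iterator[List[int]]:
    """Yield batches of bag indices with equal numbers of (+) and (-) bags.

    Hypothetical helper: the abstract only says positive and negative bags
    are used simultaneously; this pairing scheme is an assumption.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    n_batches = min(len(pos), len(neg)) // pairs_per_batch
    for b in range(n_batches):
        lo, hi = b * pairs_per_batch, (b + 1) * pairs_per_batch
        yield pos[lo:hi] + neg[lo:hi]


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in vec))


def bag_magnitude(bag: Bag) -> float:
    """Magnitude of the max-instance (critical) feature, here taken to be
    the instance with the largest L2 norm (an assumption)."""
    return max(l2_norm(inst) for inst in bag)


def magnitude_margin_loss(pos_bags: Sequence[Bag], neg_bags: Sequence[Bag],
                          margin: float = 1.0) -> float:
    """Assumed hinge-style loss: shrink negative-bag magnitudes toward zero
    and penalize positive bags whose magnitude falls below `margin`."""
    pos_term = sum(max(0.0, margin - bag_magnitude(b)) for b in pos_bags)
    neg_term = sum(bag_magnitude(b) for b in neg_bags)
    return pos_term / len(pos_bags) + neg_term / len(neg_bags)
```

A possible usage: draw index batches with `balanced_batches(labels, pairs_per_batch=1)`, split each batch into its positive and negative bags, and add `magnitude_margin_loss` to the usual bag-level classification loss. Requiring both bag types in every batch is what makes such a contrastive term computable at each step, which is presumably why single-bag training does not suffice.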

updated: Fri Jul 22 2022 01:25:53 GMT+0000 (UTC)

published: Wed Jun 22 2022 07:00:39 GMT+0000 (UTC)

arXiv
