TransMed: Transformers Advance Multi-modal Medical Image Classification

Yin Dai; Yifan Gao

Over the past decade, convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks such as disease classification, tumor segmentation, and lesion detection. CNNs are highly effective at extracting local image features. However, because the convolution operation is local, they cannot model long-range relationships well. Recently, transformers have been applied to computer vision and have achieved remarkable success on large-scale datasets. Compared with natural images, multi-modal medical images have explicit and important long-range dependencies, and effective multi-modal fusion strategies can greatly improve the performance of deep models. This motivates us to study transformer-based structures and apply them to multi-modal medical images. Existing transformer-based network architectures require large-scale datasets to perform well. However, medical imaging datasets are relatively small, which makes it difficult to apply pure transformers to medical image analysis. Therefore, we propose TransMed for multi-modal medical image classification. TransMed combines the advantages of CNNs and transformers to efficiently extract low-level image features and establish long-range dependencies between modalities. We evaluated our model on the challenging problem of preoperative diagnosis of parotid gland tumors, and the experimental results show the advantages of the proposed method. We argue that the combination of CNNs and transformers has tremendous potential in a large number of medical image analysis tasks. To the best of our knowledge, this is the first work to apply transformers to medical image classification.
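The abstract describes the hybrid idea only at a high level: convolution for local, low-level features and a transformer for long-range dependencies between modalities. The following is a toy, standard-library-only Python sketch of that general pattern. It is not TransMed's actual architecture, whose layer configuration is not given here. The function names (`conv2d_valid`, `patch_tokens`, `self_attention`, `hybrid_forward`, `demo`), the 6x6 inputs, the three output classes, and all random weights are hypothetical, and the sketch omits training, positional encodings, and multi-head attention.

```python
import math
import random


def conv2d_valid(image, kernel):
    """2D cross-correlation with 'valid' padding followed by ReLU.
    Each output value depends only on a small local window of the input."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(max(s, 0.0))  # ReLU non-linearity
        out.append(row)
    return out


def patch_tokens(feature_map, patch):
    """Cut a feature map into non-overlapping patch x patch blocks and
    flatten each block into one token vector."""
    tokens = []
    for i in range(0, len(feature_map) - patch + 1, patch):
        for j in range(0, len(feature_map[0]) - patch + 1, patch):
            tokens.append([feature_map[i + a][j + b]
                           for a in range(patch) for b in range(patch)])
    return tokens


def matvec(m, v):
    return [sum(r * x for r, x in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention: every token attends to
    every token, whatever its spatial position or modality."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


def hybrid_forward(modalities, kernel, patch, wq, wk, wv, w_cls):
    """Local conv features per modality -> one shared token sequence ->
    self-attention across all tokens -> mean pooling -> linear classifier."""
    tokens, owner = [], []
    for m_idx, image in enumerate(modalities):
        toks = patch_tokens(conv2d_valid(image, kernel), patch)
        tokens.extend(toks)
        owner.extend([m_idx] * len(toks))  # remember each token's modality
    attended, weights = self_attention(tokens, wq, wk, wv)
    dim = len(attended[0])
    pooled = [sum(t[c] for t in attended) / len(attended) for c in range(dim)]
    return softmax(matvec(w_cls, pooled)), weights, owner


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def demo(seed=0):
    rng = random.Random(seed)
    # Two hypothetical 6x6 "modalities" of the same region; random values only.
    mod_a, mod_b = rand_matrix(6, 6, rng), rand_matrix(6, 6, rng)
    kernel = rand_matrix(3, 3, rng)  # 6x6 input -> 4x4 feature map
    patch = 2                        # 4x4 map -> four 2x2 patches per modality
    dim = patch * patch              # token length
    return hybrid_forward([mod_a, mod_b], kernel, patch,
                          rand_matrix(dim, dim, rng), rand_matrix(dim, dim, rng),
                          rand_matrix(dim, dim, rng), rand_matrix(3, dim, rng))


if __name__ == "__main__":
    probs, weights, owner = demo()
    cross = sum(w for w, o in zip(weights[0], owner) if o == 1)
    print("class probabilities:", [round(p, 3) for p in probs])
    print("attention from token 0 (modality 0) to modality 1:", round(cross, 3))
```

Running it prints a probability vector over the three hypothetical classes and the total attention weight that the first modality-0 token places on modality-1 tokens. The contrast the abstract draws is visible in the code: each convolution output reads only a 3x3 window, while each row of the attention matrix spans all eight tokens from both modalities.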

updated: Wed Mar 10 2021 08:57:53 GMT+0000 (UTC)

published: Wed Mar 10 2021 08:57:53 GMT+0000 (UTC)

arXiv
