Pretrained ViTs Yield Versatile Representations For Medical Images

Christos Matsoukas; Johan Fredin Haslum; Magnus Söderberg; Kevin Smith

Convolutional Neural Networks (CNNs) have reigned for a decade as the de facto approach to automated medical image diagnosis, pushing the state of the art in classification, detection, and segmentation tasks. In recent years, vision transformers (ViTs) have emerged as a competitive alternative to CNNs, yielding impressive performance in the natural image domain while possessing several interesting properties that could prove beneficial for medical imaging tasks. In this work, we explore the benefits and drawbacks of transformer-based models for medical image classification. We conduct a series of experiments on several standard 2D medical image benchmark datasets and tasks. Our findings show that, while CNNs perform better when trained from scratch, off-the-shelf vision transformers perform on par with CNNs when pretrained on ImageNet, in both supervised and self-supervised settings, making them a viable alternative to CNNs.

updated: Mon Mar 13 2023 11:53:40 GMT+0000 (UTC)

published: Mon Mar 13 2023 11:53:40 GMT+0000 (UTC)

arXiv
