The Fully Convolutional Transformer for Medical Image Segmentation

Athanasios Tragakis; Chaitanya Kaul; Roderick Murray-Smith; Dirk Husmeier

We propose a novel transformer model capable of segmenting medical images of varying modalities. Challenges posed by the fine-grained nature of medical image analysis mean that the adaptation of the transformer to this domain is still at a nascent stage. The overwhelming success of the UNet lay in its ability to appreciate the fine-grained nature of the segmentation task, an ability which existing transformer-based models do not currently possess. To address this shortcoming, we propose the Fully Convolutional Transformer (FCT), which builds on the proven ability of Convolutional Neural Networks to learn effective image representations and combines it with the ability of Transformers to effectively capture long-term dependencies in their inputs. The FCT is the first fully convolutional Transformer model in the medical imaging literature. It processes its input in two stages: first, it learns to extract long-range semantic dependencies from the input image, and then it learns to capture hierarchical global attributes from the features. FCT is compact, accurate and robust. Our results show that it outperforms all existing transformer architectures by large margins across multiple medical image segmentation datasets of varying data modalities, without the need for any pre-training. FCT outperforms its immediate competitor on the ACDC dataset by 1.3%, on the Synapse dataset by 4.4%, on the Spleen dataset by 1.2% and on the ISIC 2017 dataset by 1.1% on the Dice metric, with up to five times fewer parameters. Our code, environments and models will be available via GitHub.
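
For readers unfamiliar with the Dice metric used in the comparisons above, the following is a minimal sketch of the standard Dice coefficient for two binary masks, 2|A∩B| / (|A| + |B|). It is an illustration only and is not taken from the authors' code: the function name `dice_coefficient`, the nested-list mask format, and the smoothing term `eps` are all assumptions made for this example.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice overlap between two binary masks.

    Both masks are 2D lists of 0/1 values (an assumed format chosen
    for illustration; real pipelines typically use arrays or tensors).
    """
    # Flatten both masks so pixels can be compared position by position.
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    # Total foreground pixels across the two masks.
    total = sum(pred_flat) + sum(target_flat)

    # eps avoids division by zero; two empty masks score 1.0.
    return (2.0 * intersection + eps) / (total + eps)


# Hypothetical 2x3 masks: 2 overlapping pixels, 3 foreground pixels each.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)  # close to 2*2 / (3+3)
```

A score of 1.0 means the predicted and reference masks coincide exactly, and a score near 0 means they barely overlap. For multi-class segmentation, a common convention is to compute this per class and average the results, though the abstract does not state which averaging scheme was used.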

updated: Wed Jun 01 2022 15:22:41 GMT+0000 (UTC)

published: Wed Jun 01 2022 15:22:41 GMT+0000 (UTC)

arXiv
