When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation

Ziyang Wang; Tianze Li; Jian-Qing Zheng; Baoru Huang

Because high-quality annotations are scarce in the medical imaging community, semi-supervised learning methods are highly valued for image semantic segmentation. This paper presents a consistency-aware, pseudo-label-based self-ensembling approach that exploits both the Vision Transformer (ViT) and the Convolutional Neural Network (CNN) in semi-supervised learning. The proposed framework consists of a feature-learning module, in which a ViT and a CNN enhance each other, and a guidance module that provides robust consistency-aware supervision. Within the feature-learning module, the CNN and ViT views each infer pseudo labels, recurrently and separately, and these labels are used to expand the data set so that each view benefits the other. A perturbation scheme is designed for the feature-learning module, and the guidance module is built by averaging network weights. In this way, the framework combines the feature-learning strengths of CNN and ViT, improves performance through dual-view co-training, and enables consistency-aware supervision in a semi-supervised manner. A topological exploration of all alternative supervision modes with CNN and ViT is validated in detail, demonstrating the most promising performance and the specific setting of our method on semi-supervised medical image segmentation tasks. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set across a variety of metrics. The code is publicly available.
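The two mechanisms named in the abstract, weight averaging for the guidance module and the exchange of pseudo labels between the CNN and ViT views, can be illustrated with a small sketch. This is not the authors' released code. It assumes that the weight averaging is an exponential moving average with a hypothetical `decay` parameter, that pseudo labels are hard argmax labels, and that each view is trained with cross-entropy against the other view's labels; the abstract does not confirm any of these details. All function names are hypothetical, and plain Python lists stand in for network weights and per-pixel class probabilities.

```python
import math
from typing import Dict, List

Probs = List[List[float]]  # per-pixel class probabilities, one inner list per pixel


def ema_update(guidance: Dict[str, float], learner: Dict[str, float],
               decay: float = 0.99) -> Dict[str, float]:
    """Move guidance weights toward learner weights (assumed EMA averaging)."""
    return {name: decay * guidance[name] + (1.0 - decay) * learner[name]
            for name in guidance}


def pseudo_labels(probs: Probs) -> List[int]:
    """Hard pseudo labels: the most probable class index for each pixel (assumption)."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def cross_entropy(probs: Probs, labels: List[int]) -> float:
    """Mean negative log-likelihood of the given labels."""
    eps = 1e-12  # avoids log(0)
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def cross_pseudo_loss(cnn_probs: Probs, vit_probs: Probs) -> float:
    """Each view is supervised by the pseudo labels inferred by the other view."""
    loss_cnn = cross_entropy(cnn_probs, pseudo_labels(vit_probs))
    loss_vit = cross_entropy(vit_probs, pseudo_labels(cnn_probs))
    return loss_cnn + loss_vit


if __name__ == "__main__":
    # Two pixels, three classes; the two views agree on both pixels here.
    cnn = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
    vit = [[0.2, 0.6, 0.2], [0.5, 0.4, 0.1]]
    print(pseudo_labels(cnn), pseudo_labels(vit))
    print(cross_pseudo_loss(cnn, vit))
    print(ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9))
```

In a real training loop the cross-view loss would be computed on unlabeled images, and the averaged weights would be refreshed after every optimizer step; how the paper weights and schedules these terms is not stated in the abstract.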

updated: Thu Feb 08 2024 22:55:52 GMT+0000 (UTC)

published: Fri Aug 12 2022 18:21:22 GMT+0000 (UTC)

arXiv
