Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

Chengxi Zeng; Xinyu Yang; David Smithard; Majid Mirmehdi; Alberto M Gambaruto; Tilo Burghardt

This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved major milestones in medical image segmentation tasks thanks to their strong semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data: the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving Dice coefficients of 0.8986 and 0.8186 on the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and the cross-dataset transferability of learned capabilities. Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.
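The Dice coefficient reported above measures the overlap between a predicted mask and a ground-truth mask. The abstract does not spell out the formula, so the following is only a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for flattened binary masks. The function name `dice_coefficient`, the smoothing term `eps` and the toy masks are illustrative assumptions, not taken from the authors' code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int],
                     eps: float = 1e-7) -> float:
    """Standard Dice overlap, 2|A ∩ B| / (|A| + |B|), for flattened 0/1 masks.

    `eps` is an assumed smoothing term so that two empty masks score 1.0
    instead of raising a division-by-zero error.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels that are foreground (1) in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total foreground pixels across both masks.
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


# Toy masks (made-up values, not data from the paper).
pred = [0, 1, 1, 0, 1, 0]
target = [0, 1, 0, 0, 1, 1]
print(f"{dice_coefficient(pred, target):.4f}")  # 0.6667
```

A score of 1.0 means the two masks coincide exactly and a score near 0 means they do not overlap. For a video, one plausible convention is to compute the score per frame and average, though the abstract does not state which convention the authors use.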

updated: Wed Feb 22 2023 12:09:39 GMT+0000 (UTC)

published: Wed Feb 22 2023 12:09:39 GMT+0000 (UTC)

arXiv
