Scaling Up 3D Kernels with Bayesian Frequency Re-parameterization for Medical Image Segmentation

Ho Hin Lee; Quan Liu; Shunxing Bao; Qi Yang; Xin Yu; Leon Y. Cai; Thomas Li; Yuankai Huo; Xenofon Koutsoukos; Bennett A. Landman

Inspired by vision transformers, depth-wise convolution has been revisited to provide a large Effective Receptive Field (ERF) using Large Kernel (LK) sizes for medical image segmentation. However, segmentation performance may saturate or even degrade as kernel sizes are scaled up (e.g., 21×21×21) in a Convolutional Neural Network (CNN). We hypothesize that convolution with LK sizes is limited in maintaining optimal convergence for locality learning. While Structural Re-parameterization (SR) enhances local convergence with small kernels in parallel, the optimal small-kernel branches may hinder computational efficiency during training. In this work, we propose RepUX-Net, a pure CNN architecture with a simple large kernel block design, which competes favorably with current state-of-the-art (SOTA) networks (e.g., 3D UX-Net, SwinUNETR) on 6 challenging public datasets. We derive an equivalency between kernel re-parameterization and the branch-wise variation in kernel convergence. Inspired by spatial frequency in the human visual system, we extend the variation in kernel convergence to an element-wise setting and model the spatial frequency as a Bayesian prior to re-parameterize convolutional weights during training. Specifically, a reciprocal function is leveraged to estimate a frequency-weighted value, which rescales the corresponding kernel element for stochastic gradient descent. In our experiments, RepUX-Net consistently outperforms 3D SOTA benchmarks in Dice score under internal validation (FLARE: 0.929 to 0.944), external validation (MSD: 0.901 to 0.932, KiTS: 0.815 to 0.847, LiTS: 0.933 to 0.949, TCIA: 0.736 to 0.779), and transfer learning (AMOS: 0.880 to 0.911) scenarios.
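
The element-wise rescaling described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract only states that a reciprocal function yields a frequency-weighted value per kernel element, so the specific form alpha / (alpha + d), the use of Euclidean distance d from the kernel center, and the names reciprocal_frequency_weights, rescale_kernel, and alpha are all assumptions made for this example.

```python
import numpy as np


def reciprocal_frequency_weights(kernel_size: int, alpha: float = 1.0) -> np.ndarray:
    """Return a (k, k, k) weight map that is 1 at the kernel center and decays
    as alpha / (alpha + d), where d is the Euclidean distance from the center.

    ASSUMPTION: this particular reciprocal form is illustrative only; the
    abstract does not give the exact function.
    """
    center = (kernel_size - 1) / 2.0
    coords = np.arange(kernel_size) - center
    zz, yy, xx = np.meshgrid(coords, coords, coords, indexing="ij")
    distance = np.sqrt(zz**2 + yy**2 + xx**2)
    return alpha / (alpha + distance)


def rescale_kernel(kernel: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Rescale each element of a depth-wise 3D kernel by its weight.

    `kernel` has shape (channels, 1, k, k, k); the weight map broadcasts over
    the two leading axes.
    """
    k = kernel.shape[-1]
    return kernel * reciprocal_frequency_weights(k, alpha)


# Hypothetical depth-wise kernel: 4 channels, 21x21x21 spatial extent.
rng = np.random.default_rng(0)
kernel = rng.standard_normal((4, 1, 21, 21, 21))
weights = reciprocal_frequency_weights(21, alpha=1.0)
scaled = rescale_kernel(kernel, alpha=1.0)
```

Under this assumed form, the center element keeps its full magnitude while elements far from the center are damped, so a gradient step moves central (local) elements more than peripheral ones. That is one way to read the abstract's "element-wise" variation in kernel convergence.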

updated: Tue Jun 06 2023 03:05:07 GMT+0000 (UTC)

published: Fri Mar 10 2023 08:38:34 GMT+0000 (UTC)

arXiv
