Guided Patch-Grouping Wavelet Transformer with Spatial Congruence for Ultra-High Resolution Segmentation

Deyi Ji; Feng Zhao; Hongtao Lu

Most existing ultra-high resolution (UHR) segmentation methods struggle to balance memory cost against local characterization accuracy. Our proposed Guided Patch-Grouping Wavelet Transformer (GPWFormer) takes both into account and achieves impressive performance. GPWFormer is a Transformer (T)-CNN (C) mutual learning framework, where T takes the whole UHR image as input and harvests both local details and fine-grained long-range contextual dependencies, while C takes a downsampled image as input to learn the category-wise deep context. For high inference speed and low computational complexity, T partitions the original UHR image into patches and groups them dynamically, then learns the low-level local details with a lightweight multi-head Wavelet Transformer (WFormer) network. Fine-grained long-range contextual dependencies are also captured during this process, since patches that are far apart in the spatial domain can be assigned to the same group. In addition, masks produced by C guide the patch grouping process, providing a heuristic decision. Congruence constraints between the two branches are also exploited to maintain spatial consistency among the patches. Overall, the multi-stage process is stacked in a pyramid fashion. Experiments show that GPWFormer outperforms existing methods with significant improvements on five benchmark datasets.
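The mask-guided patch grouping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the grouping rule, so the dominant-class rule used here is an assumption, and the function names `partition_into_patches` and `group_patches_by_mask` are hypothetical. The sketch only shows how a coarse mask from the CNN branch could place spatially distant patches in the same group.

```python
import numpy as np


def partition_into_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping (patch, patch, C) tiles.

    Tiles are returned in row-major grid order. H and W must be multiples
    of `patch` (an assumption made to keep the sketch short).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    tiles = image.reshape(gh, patch, gw, patch, c).swapaxes(1, 2)
    return tiles.reshape(gh * gw, patch, patch, c)


def group_patches_by_mask(mask, grid_shape, num_classes):
    """Assign each patch a group id from a coarse class mask.

    ASSUMPTION: the group id is the most frequent class inside the mask
    cell that covers the patch. Ties resolve to the lowest class index.
    """
    gh, gw = grid_shape
    mh, mw = mask.shape
    assert mh % gh == 0 and mw % gw == 0
    # One row per patch, holding the mask values that cover that patch.
    cells = mask.reshape(gh, mh // gh, gw, mw // gw).swapaxes(1, 2)
    cells = cells.reshape(gh * gw, -1)
    # Count how often each class occurs in each cell.
    counts = np.stack(
        [(cells == k).sum(axis=1) for k in range(num_classes)], axis=1
    )
    return counts.argmax(axis=1)


# Toy example: an 8x8 "UHR" image, 4x4 patches, and a 4x4 coarse mask.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
patches = partition_into_patches(image, patch=4)
coarse_mask = np.array(
    [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]
)
groups = group_patches_by_mask(coarse_mask, grid_shape=(2, 2), num_classes=2)
# Patches with the same group id would be processed together.
grouped = {int(g): patches[groups == g] for g in np.unique(groups)}
```

In this toy case the top-left and bottom-right patches fall into the same group even though they are not adjacent, which is the property the abstract relies on for capturing long-range dependencies.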

updated: Thu Jul 06 2023 02:54:16 GMT+0000 (UTC)

published: Mon Jul 03 2023 02:19:48 GMT+0000 (UTC)

arXiv
