Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images

Wuyang Chen; Ziyu Jiang; Zhangyang Wang; Kexin Cui; Xiaoning Qian

Segmentation of ultra-high resolution images is increasingly in demand, yet it poses significant challenges for algorithm efficiency, particularly given (GPU) memory limits. Current approaches either downsample an ultra-high resolution image or crop it into small patches for separate processing. Either way, the loss of local fine details or global contextual information limits segmentation accuracy. We propose collaborative Global-Local Networks (GLNet) to effectively preserve both global and local information in a highly memory-efficient manner. GLNet is composed of a global branch and a local branch, which take the downsampled entire image and its cropped local patches as their respective inputs. For segmentation, GLNet deeply fuses feature maps from the two branches, capturing both the high-resolution fine structures from zoomed-in local patches and the contextual dependency from the downsampled input. To further resolve the potential class imbalance problem between background and foreground regions, we present a coarse-to-fine variant of GLNet that is also memory-efficient. Extensive experiments and analyses have been performed on three real-world ultra-high resolution aerial and medical image datasets (resolution up to 30 million pixels). Using a single 1080Ti GPU and less than 2GB of memory, GLNet yields high-quality segmentation results and achieves much more competitive accuracy-memory usage trade-offs than state-of-the-art methods.
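The two inputs described in the abstract (a downsampled copy of the entire image for the global branch, and cropped full-resolution patches for the local branch) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `make_global_input` and `make_local_patches`, the nearest-neighbour downsampling, the non-overlapping crop layout, and the sizes used are all assumptions chosen for illustration, and the feature-map fusion between branches is omitted entirely.

```python
import numpy as np


def make_global_input(image, size):
    """Downsample the whole image to `size` (rows, cols).

    Uses nearest-neighbour index sampling, an assumption made here
    to keep the sketch dependency-free.
    """
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return image[rows][:, cols]


def make_local_patches(image, patch):
    """Crop the full-resolution image into `patch`-sized tiles.

    Returns a list of ((top, left), tile) pairs. Tiles that would run
    past the border are shifted back inside the image, so every tile
    has the same shape. Assumes the image is at least as large as one
    patch in both dimensions.
    """
    h, w = image.shape[:2]
    ph, pw = patch
    tops = sorted({min(t, h - ph) for t in range(0, h, ph)})
    lefts = sorted({min(l, w - pw) for l in range(0, w, pw)})
    return [((t, l), image[t:t + ph, l:l + pw]) for t in tops for l in lefts]


# Hypothetical sizes, far smaller than the datasets in the abstract.
image = np.zeros((1000, 1500, 3), dtype=np.uint8)
global_input = make_global_input(image, (250, 250))
local_patches = make_local_patches(image, (500, 500))
```

In this sketch each network input is bounded by the chosen global and patch sizes, not by the size of the original image, which is the general reason a global-local split can keep memory use low.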

updated: Wed Mar 03 2021 17:35:25 GMT+0000 (UTC)

published: Wed May 15 2019 18:22:06 GMT+0000 (UTC)

arXiv
