Token Sparsification for Faster Medical Image Segmentation

Lei Zhou; Huidong Liu; Joseph Bae; Junjun He; Dimitris Samaras; Prateek Prasanna

Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding -> token completion -> dense decoding (SCD) pipeline. We first empirically show that naively applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training, caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both the sparse output tokens and the pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show that SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality.
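The two ideas named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the Gaussian noise model, the Monte Carlo estimate, and the position-wise scatter used for assembly are all assumptions made for illustration. The first part shows how adding noise to the scores turns a hard topK choice into smooth per-token selection probabilities; the second part shows one plausible way to rebuild a full-length sequence from the tokens that were kept and the tokens that were pruned at earlier layers.

```python
import random


def topk_indices(scores, k):
    """Indices of the k largest scores, returned in positional order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def perturbed_topk_weights(scores, k, sigma=0.5, n_samples=200, seed=0):
    """Estimate each token's probability of landing in the top k under noise.

    Assumption: Gaussian perturbations and plain Monte Carlo averaging stand in
    for the "continuous perturbed score distribution" mentioned in the abstract.
    """
    rng = random.Random(seed)
    counts = [0] * len(scores)
    for _ in range(n_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for i in topk_indices(noisy, k):
            counts[i] += 1
    return [c / n_samples for c in counts]


def assemble_tokens(num_tokens, pruned_by_layer, kept_indices, kept_tokens):
    """Rebuild a full-length token sequence (hypothetical assembly rule).

    pruned_by_layer: one dict per pruning layer, mapping a token position to
    the value that token had when it was dropped.
    kept_indices / kept_tokens: positions and final-layer values of survivors.
    """
    full = [None] * num_tokens
    for pruned in pruned_by_layer:
        for idx, tok in pruned.items():
            full[idx] = tok
    for idx, tok in zip(kept_indices, kept_tokens):
        full[idx] = tok
    return full


# Toy example: five tokens, keep the two with the highest importance scores.
scores = [0.1, 2.0, -1.0, 1.5, 0.3]
keep = topk_indices(scores, 2)
weights = perturbed_topk_weights(scores, 2)

# Toy assembly: positions 0 and 2 were pruned at two different layers.
restored = assemble_tokens(4, [{0: "a0"}, {2: "c1"}], [1, 3], ["b2", "d2"])
```

In this sketch the weights always sum to k, because every noisy sample selects exactly k tokens. A dense decoder such as UNETR would then consume the restored full-length sequence; that step is omitted here.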

updated: Sat Mar 11 2023 23:59:13 GMT+0000 (UTC)

published: Sat Mar 11 2023 23:59:13 GMT+0000 (UTC)

arXiv
