MATIS: Masked-Attention Transformers for Surgical Instrument Segmentation

Nicolás Ayobi; Alejandra Pérez-Rondón; Santiago Rodríguez; Pablo Arbeláez

We propose Masked-Attention Transformers for Surgical Instrument Segmentation (MATIS), a two-stage, fully transformer-based method that leverages modern pixel-wise attention mechanisms for instrument segmentation. MATIS exploits the instance-level nature of the task by employing a masked attention module that generates and classifies a set of fine instrument region proposals. Our method incorporates long-term video-level information through video transformers to improve temporal consistency and enhance mask classification. We validate our approach on the two standard public benchmarks, Endovis 2017 and Endovis 2018. Our experiments demonstrate that MATIS' per-frame baseline outperforms previous state-of-the-art methods and that including our temporal consistency module further boosts our model's performance.
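As a rough illustration of the masked attention idea mentioned above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common formulation in which each region query attends only to the pixels inside the mask it predicted at the previous step; the abstract does not spell out the exact formulation. The function name `masked_attention`, its arguments, and the fallback to full attention for empty masks are all assumptions made for this example.

```python
import numpy as np

def masked_attention(queries, keys, values, prev_masks):
    """Cross-attention restricted to each query's previous foreground mask.

    queries:    (Q, D) one embedding per region proposal (hypothetical layout)
    keys:       (P, D) one embedding per pixel
    values:     (P, D) one embedding per pixel
    prev_masks: (Q, P) boolean, True where a pixel belongs to the query's mask
    Returns the attended features (Q, D) and the attention weights (Q, P).
    """
    dim = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(dim)  # scaled dot-product scores (Q, P)

    # Assumption: a query with an empty mask falls back to attending everywhere,
    # which avoids a softmax over an empty set.
    empty = ~prev_masks.any(axis=1, keepdims=True)
    allowed = prev_masks | empty

    # Pixels outside the mask get -inf so their softmax weight is exactly zero.
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```

In this sketch the mask acts purely as a constraint on where each query may look, so the attended features for a region proposal are built only from pixels already assigned to it.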

updated: Fri Jan 26 2024 04:47:20 GMT+0000 (UTC)

published: Thu Mar 16 2023 17:31:40 GMT+0000 (UTC)

arXiv
