SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective

An Wang; Mobarakol Islam; Mengya Xu; Yang Zhang; Hongliang Ren

Segment Anything Model (SAM) is a foundation model for semantic segmentation that shows excellent generalization capability with prompts. In this empirical study, we investigate the robustness and zero-shot generalizability of SAM in the domain of robotic surgery under four settings: (i) prompted vs. unprompted; (ii) bounding-box vs. point-based prompts; (iii) generalization under corruptions and perturbations at five severity levels; and (iv) a state-of-the-art supervised model vs. SAM. We conduct all experiments on two well-known robotic instrument segmentation datasets from the MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation reveals that although SAM shows remarkable zero-shot generalization with bounding-box prompts, it struggles to segment the whole instrument with point-based prompts and in the unprompted setting. Furthermore, our qualitative results show that the model either fails to predict parts of the instrument mask (e.g., jaws, wrist) or predicts parts of an instrument as different classes when instruments overlap within the same bounding box or under a point-based prompt. It is also unable to identify instruments in some complex surgical scenarios involving blood, reflection, blur, and shade. Additionally, SAM is not robust enough to maintain high performance under various forms of data corruption. We therefore argue that SAM is not ready for downstream surgical tasks without further domain-specific fine-tuning.
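To make the bounding-box vs. point-based comparison concrete, here is a minimal sketch of how the two prompt types could be derived from a ground-truth instrument mask and how a predicted mask could be scored. This is an illustration only, not the authors' pipeline: the abstract does not state how prompts were generated or which metric was used, so the tight-box rule, the centroid-nearest point rule, the IoU score, and the helper names `box_prompt`, `point_prompt`, and `iou` are all assumptions.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask as rows of 0/1 values


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Return (x, y) coordinates of all foreground pixels, row by row."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    return coords


def box_prompt(mask: Mask) -> Tuple[int, int, int, int]:
    """Assumed box rule: tightest (x_min, y_min, x_max, y_max) around the mask."""
    coords = _foreground(mask)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def point_prompt(mask: Mask) -> Tuple[int, int]:
    """Assumed point rule: the foreground pixel closest to the mask centroid."""
    coords = _foreground(mask)
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0
```

A single point carries far less spatial information than a box, which fits the reported pattern: a prediction that covers only part of the instrument (for example, the shaft but not the jaws) keeps a low IoU against the full ground-truth mask.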

updated: Fri Apr 28 2023 08:06:33 GMT+0000 (UTC)

published: Fri Apr 28 2023 08:06:33 GMT+0000 (UTC)

arXiv
