Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition

Fangqiu Yi; Tingting Jiang

Surgical phase recognition, which aims to predict the phase occurring at each frame of a surgical video, is of particular interest to computer-assisted surgery systems. Networks with multi-stage architectures have been widely applied to computer vision tasks with rich patterns: a predictor stage first outputs initial predictions, and an additional refinement stage operates on those predictions to refine them further. Existing work shows that surgical video content is well ordered and contains rich temporal patterns, which makes the multi-stage architecture well suited to surgical phase recognition. However, we observe that when the multi-stage architecture is simply applied to this task, end-to-end training causes the refinement ability to fall short of expectations. To address this problem, we propose a new non-end-to-end training strategy and explore different designs of multi-stage architecture for the surgical phase recognition task. In the non-end-to-end training strategy, the refinement stage is trained separately using two proposed types of disturbed sequences. We also evaluate three different choices of refinement model to show that our analysis and solution are robust to the choice of specific multi-stage model. We conduct experiments on two public benchmarks, the M2CAI16 Workflow Challenge and the Cholec80 dataset. Results show that a multi-stage architecture trained with our strategy substantially boosts the performance of the current state-of-the-art single-stage model. Code is available at https://github.com/ChinaYi/casual_tcn.
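
The separation between the predictor stage and a separately trained refinement stage can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say what the two types of disturbed sequences are or which refinement models are used, so `disturb` (random label flips), `causal_refine` (a trailing-window majority vote), and `fit_refiner` (choosing a window size) are hypothetical stand-ins. The only properties carried over from the abstract are that the refinement stage works on initial per-frame predictions, that it is fitted on disturbed sequences without touching the predictor, and that the setting is online, so only past frames are used.

```python
import random
from collections import Counter


def disturb(labels, flip_prob, num_phases, seed=0):
    """Hypothetical disturbance (an assumption, not from the paper):
    replace each label with a random phase with probability flip_prob."""
    rng = random.Random(seed)
    return [
        rng.randrange(num_phases) if rng.random() < flip_prob else y
        for y in labels
    ]


def causal_refine(initial, window=5):
    """Placeholder refinement stage: majority vote over the current frame
    and the previous window - 1 frames, so no future frame is used."""
    refined = []
    for t in range(len(initial)):
        past = initial[max(0, t - window + 1): t + 1]
        refined.append(Counter(past).most_common(1)[0][0])
    return refined


def fit_refiner(ground_truth, num_phases, candidates=(1, 3, 5, 9), flip_prob=0.2):
    """'Train' the refinement stage on its own: pick the window size that
    best recovers the ground truth from a disturbed copy of it.
    The predictor stage is never involved in this step."""
    noisy = disturb(ground_truth, flip_prob, num_phases)

    def accuracy(window):
        pred = causal_refine(noisy, window)
        return sum(p == y for p, y in zip(pred, ground_truth)) / len(ground_truth)

    return max(candidates, key=accuracy)


# Initial predictions from a (hypothetical) predictor stage, with one
# spurious frame of phase 1 at index 3 before the real transition.
initial_predictions = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
refined_predictions = causal_refine(initial_predictions, window=3)
```

In this toy case the isolated glitch is smoothed away, at the cost of detecting the real transition one frame later, which is the kind of trade-off an online refinement stage has to manage.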

updated: Sat Jul 10 2021 11:00:38 GMT+0000 (UTC)

published: Sat Jul 10 2021 11:00:38 GMT+0000 (UTC)

arXiv
