Future-State Predicting LSTM for Early Surgery Type Recognition

Siddharth Kannan; Gaurav Yengera; Didier Mutter; Jacques Marescaux; Nicolas Padoy


This work presents a novel approach for the early recognition of the type of a laparoscopic surgery from its video. Early recognition algorithms can benefit the development of 'smart' OR systems that provide automatic context-aware assistance, and can also enable quick database indexing. The task is, however, riddled with challenges specific to laparoscopic videos, such as high visual similarity across surgeries and large variations in video duration. To capture the spatio-temporal dependencies in these videos, we choose as our model a combination of a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. We then propose two complementary approaches for improving early recognition performance. The first is a CNN fine-tuning method that encourages surgeries to be distinguished based on the initial frames of laparoscopic videos. The second, referred to as 'Future-State Predicting LSTM', trains an LSTM to predict information related to future frames, which helps in distinguishing between the different types of surgeries. We evaluate our approaches on a large dataset of 425 laparoscopic videos containing 9 types of surgeries (Laparo425), and achieve an average accuracy of 75% after observing only the first 10 minutes of a surgery. These results are promising from a practical standpoint and also encouraging for other types of image-guided surgeries.
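The abstract does not give architectural details, so the following is only a minimal NumPy sketch of the general idea under our own assumptions, not the authors' implementation. An LSTM runs over per-frame CNN features (random stand-ins here). A classification head predicts the surgery type at every time step, and an auxiliary head regresses the feature vector `horizon` frames ahead, with both terms combined into one loss. Treating "information related to future frames" as future CNN features, as well as the names `FutureStateLSTM`, `combined_loss`, `early_prediction`, `horizon` and `lam`, are hypothetical choices for illustration. Only the forward pass and the loss are shown; training is omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def log_softmax(z):
    # Numerically stable log-softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


class FutureStateLSTM:
    """Hypothetical sketch: LSTM over per-frame features with two linear heads."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.H = hidden_dim
        # Stacked gate weights for input, forget, candidate and output gates
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, feat_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_cls = rng.normal(0.0, 0.1, (num_classes, hidden_dim))  # surgery-type head
        self.W_fut = rng.normal(0.0, 0.1, (feat_dim, hidden_dim))     # future-feature head

    def forward(self, feats):
        """feats: (T, D). Returns class logits (T, C) and future predictions (T, D)."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        logits, futures = [], []
        for x in feats:  # causal: step t only sees frames 0..t
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
            g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
            logits.append(self.W_cls @ h)
            futures.append(self.W_fut @ h)
        return np.array(logits), np.array(futures)


def combined_loss(logits, futures, feats, label, horizon, lam=1.0):
    """Cross-entropy at every step plus MSE between the prediction at t and features at t + horizon."""
    ce = -log_softmax(logits)[:, label].mean()
    mse = ((futures[:-horizon] - feats[horizon:]) ** 2).mean()
    return ce + lam * mse, ce, mse


def early_prediction(model, feats, k):
    """Predicted class after observing only the first k frames."""
    logits, _ = model.forward(feats[:k])
    return int(np.argmax(logits[-1]))


# Usage: 20 frames of 8-dimensional stand-in CNN features, 9 surgery types
rng = np.random.default_rng(1)
feats = rng.normal(size=(20, 8))
model = FutureStateLSTM(feat_dim=8, hidden_dim=16, num_classes=9)
logits, futures = model.forward(feats)
total, ce, mse = combined_loss(logits, futures, feats, label=3, horizon=5)
```

Because the LSTM is causal, the prediction made after the first `k` frames depends only on those frames, which mirrors the evaluation setting of classifying a surgery after observing only its beginning. In this sketch the auxiliary future-prediction term shapes the hidden state only during training; at inference time only the classification head is used.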

updated: Thu Sep 05 2019 03:00:53 GMT+0000 (UTC)

published: Wed Nov 28 2018 18:26:24 GMT+0000 (UTC)

arXiv
