Surgical Instruction Generation with Transformers

Jinglu Zhang; Yinyu Nie; Jian Chang; Jian Jun Zhang



Automatic surgical instruction generation is a prerequisite for intra-operative context-aware surgical assistance. However, generating instructions from surgical scenes is challenging, as it requires jointly understanding the surgical activity in the current view and modelling the relationships between visual information and textual descriptions. Inspired by neural machine translation and image captioning tasks in the open domain, we introduce a transformer-backboned encoder-decoder network with self-critical reinforcement learning to generate instructions from surgical images. We evaluate the effectiveness of our method on the DAISI dataset, which includes 290 procedures from various medical disciplines. Our approach outperforms the existing baseline on all caption evaluation metrics. The results demonstrate the benefits of a transformer-backboned encoder-decoder structure in handling multimodal context.
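Self-critical reinforcement learning for captioning is commonly implemented as self-critical sequence training (SCST): the model samples a caption, scores it with a caption metric, and uses the score of its own greedy-decoded caption as the baseline in a REINFORCE-style policy gradient. The sketch below shows only that loss for a single sampled sequence. It is a minimal illustration, not the authors' implementation: the abstract does not say which reward they use (a metric such as CIDEr is assumed here), and the function names `scst_loss` and `batch_scst_loss` are hypothetical.

```python
from typing import Sequence, Tuple


def scst_loss(sample_logprobs: Sequence[float],
              sample_reward: float,
              greedy_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled caption.

    sample_logprobs: log-probability of each token in the sampled caption.
    sample_reward:   caption-metric score of the sample (assumed, e.g. CIDEr).
    greedy_reward:   score of the greedy-decoded caption, used as the baseline.
    """
    # Advantage: how much better the sample scored than the model's own greedy output.
    advantage = sample_reward - greedy_reward
    # REINFORCE with a baseline: minimising this raises log p(sample)
    # when advantage > 0 and lowers it when advantage < 0.
    return -advantage * sum(sample_logprobs)


def batch_scst_loss(batch: Sequence[Tuple[Sequence[float], float, float]]) -> float:
    """Mean SCST loss over (logprobs, sample_reward, greedy_reward) triples."""
    return sum(scst_loss(lp, rs, rg) for lp, rs, rg in batch) / len(batch)


if __name__ == "__main__":
    # Sample beats the greedy baseline (0.9 vs 0.6), so the loss is positive
    # and gradient descent would increase the sample's log-probability.
    print(scst_loss([-0.5, -1.0, -0.25], sample_reward=0.9, greedy_reward=0.6))
```

Using the greedy output as the baseline means no separate value network is needed, and captions only receive positive reinforcement when they beat what the model would produce at test time.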

updated: Fri Jul 16 2021 19:56:59 GMT+0000 (UTC)

published: Wed Jul 14 2021 19:54:50 GMT+0000 (UTC)

arXiv


