Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"

Jia-Hong Huang; Ting-Wei Wu; Chao-Han Huck Yang; Marcel Worring


Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"


Automatically generating medical reports for retinal images is a promising way to reduce ophthalmologists' workload and improve their efficiency. In this work, we propose a new context-driven encoding network that generates such reports automatically. The model consists mainly of a multi-modal input encoder and a fused-feature decoder. Our experimental results show that the method effectively exploits the interaction between the input image and its context, which in our case is a set of keywords. It produces more accurate and meaningful reports for retinal images than baseline models and achieves state-of-the-art performance on several metrics commonly used for medical report generation: BLEU-avg (+16%), CIDEr (+10.2%), and ROUGE (+8.6%).
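The idea of encoding an image together with its keywords before decoding can be illustrated with a toy sketch. This is not the authors' network: the abstract does not say how the two modalities are combined, so the mean-pooled keyword embedding, the concatenation used as the fusion step, and every name below (`encode_keywords`, `fuse`, `keyword_table`) are assumptions made purely for illustration.

```python
from typing import Dict, List


def encode_keywords(
    keywords: List[str], table: Dict[str, List[float]], dim: int
) -> List[float]:
    """Mean-pool the embeddings of known keywords (assumed pooling choice).

    Keywords missing from the table are skipped; if none are known,
    a zero vector of length `dim` is returned.
    """
    vectors = [table[k] for k in keywords if k in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fuse(image_features: List[float], keyword_features: List[float]) -> List[float]:
    """Concatenate the two modality vectors (assumed fusion operator)."""
    return list(image_features) + list(keyword_features)


# Hypothetical toy inputs: a 3-d image feature and 2-d keyword embeddings.
image_features = [0.2, 0.7, 0.1]
keyword_table = {"drusen": [1.0, 0.0], "macular": [0.0, 1.0]}

keyword_features = encode_keywords(
    ["drusen", "macular", "unknown"], keyword_table, dim=2
)
fused = fuse(image_features, keyword_features)
print(fused)  # [0.2, 0.7, 0.1, 0.5, 0.5]
```

In a real system the fused vector would be passed to a learned decoder that emits the report text; here it only shows how image and keyword information can end up in a single representation.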

updated: Sun May 30 2021 13:37:03 GMT+0000 (UTC)

published: Sun May 30 2021 13:37:03 GMT+0000 (UTC)

arXiv
