Automated Detection of Patients in Hospital Video Recordings

Siddharth Sharma; Florian Dubost; Christopher Lee-Messer; Daniel Rubin

In clinical settings, epilepsy patients are monitored with video electroencephalogram (EEG) tests: a camera records the patient on video while an EEG device records their brain activity. There are currently no automated methods for tracking a patient's location during a seizure, and video recordings of hospital patients differ substantially from publicly available video benchmark datasets. For example, the camera angle can be unusual, and patients can be partially covered by bed sheets and electrode sets. Tracking a patient in real time with video EEG would be a promising step towards improving the quality of healthcare. Specifically, an automated patient detection system could supplement clinical oversight and reduce the resource-intensive effort of nurses and doctors who must continuously monitor patients. We evaluate an ImageNet pre-trained Mask R-CNN, a standard deep learning model for object detection, on the task of patient detection using a dataset of 45 videos of hospital patients that we aggregated and curated for this work. We show that without fine-tuning, ImageNet pre-trained Mask R-CNN models perform poorly on such data. By fine-tuning the models on a subset of our dataset, we observe a substantial improvement in patient detection performance, with a mean average precision of 0.64. We also show that the results vary substantially from one video clip to another.
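As a rough illustration of the metric reported above, the following sketch computes a non-interpolated average precision for a single frame of bounding-box detections and then averages several such values. It is not the authors' evaluation code: the IoU threshold of 0.5, the choice to average over clips, and all function names (`iou`, `average_precision`, `mean_average_precision`) are assumptions made for this example only.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    detections: List[Tuple[float, Box]],
    ground_truth: List[Box],
    iou_threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> float:
    """Non-interpolated AP for one frame: detections are (score, box) pairs."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground-truth box for this detection.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True
            true_positives += 1
            # Accumulate precision at each rank where a true positive occurs.
            precision_sum += true_positives / rank
    return precision_sum / len(ground_truth)


def mean_average_precision(ap_values: List[float]) -> float:
    """Plain mean of AP values (here assumed to be one value per clip)."""
    return sum(ap_values) / len(ap_values) if ap_values else 0.0
```

Under these assumptions, a confident detection that misses the patient ranks above a correct one and halves the frame's AP, which is one way results could differ strongly between clips with unusual camera angles or occlusion.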

updated: Sun Nov 28 2021 23:15:06 GMT+0000 (UTC)

published: Sun Nov 28 2021 23:15:06 GMT+0000 (UTC)

arXiv
