Automated Temporal Segmentation of Orofacial Assessment Videos

Saeid Alavi Naeini; Leif Simmatis; Deniz Jafari; Diego L. Guarin; Yana Yunusova; Babak Taati



Computer vision techniques can help automate or partially automate the clinical examination of orofacial impairments to provide accurate and objective assessments. Towards the development of such automated systems, we evaluated two approaches to detect and temporally segment (parse) repetitions in orofacial assessment videos. Recorded videos of participants with amyotrophic lateral sclerosis (ALS) and healthy control (HC) individuals were obtained from the Toronto NeuroFace Dataset. Two approaches for repetition detection and parsing were examined: one based on engineered features from tracked facial landmarks and peak detection in the distance between the vermilion-cutaneous junctions of the upper and lower lips (baseline analysis), and another using a pre-trained transformer-based deep learning model called RepNet (Dwibedi et al., 2020), which automatically detects periodicity and parses periodic and semi-periodic repetitions in video data. In an experimental evaluation of two orofacial assessment tasks, repeating maximum mouth opening (OPEN) and repeating the sentence "Buy Bobby a Puppy" (BBP), RepNet provided better parsing than the landmark-based approach, quantified by higher mean intersection-over-union (IoU) with respect to ground truth manual parsing. Automated parsing using RepNet also clearly separated HC and ALS participants based on the duration of BBP repetitions, whereas the landmark-based method could not.
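To make the baseline and the evaluation metric concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a per-frame lip-distance signal is already available (here a toy list of numbers), finds peaks above a threshold, cuts the signal at the lowest point between neighboring peaks, and scores the resulting segments against manual segments with temporal IoU. The function names (`find_peaks`, `segments_from_peaks`, `temporal_iou`, `mean_iou`), the threshold, and the choice to pair predicted and ground-truth segments by order are all illustrative assumptions; the abstract does not specify these details.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), in frames or seconds


def find_peaks(signal: List[float], min_height: float) -> List[int]:
    """Indices of local maxima whose value is at least min_height."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] >= min_height and signal[i - 1] < signal[i] >= signal[i + 1]
    ]


def segments_from_peaks(signal: List[float], peaks: List[int]) -> List[Segment]:
    """One segment per peak, cut at the minimum between neighboring peaks."""
    if not peaks:
        return []
    cuts = [0]
    for a, b in zip(peaks, peaks[1:]):
        # Boundary: index of the smallest value between two consecutive peaks.
        cuts.append(min(range(a, b + 1), key=lambda i: signal[i]))
    cuts.append(len(signal) - 1)
    return [(float(s), float(e)) for s, e in zip(cuts, cuts[1:])]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Intersection-over-union of two 1-D intervals."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: List[Segment], truths: List[Segment]) -> float:
    """Mean IoU, pairing segments by order (an assumption of this sketch)."""
    pairs = list(zip(preds, truths))
    if not pairs:
        return 0.0
    return sum(temporal_iou(p, t) for p, t in pairs) / len(pairs)


# Toy lip-distance signal with two mouth openings (hypothetical values).
lip_distance = [0, 1, 3, 1, 0, 1, 4, 1, 0]
peaks = find_peaks(lip_distance, min_height=2)          # [2, 6]
predicted = segments_from_peaks(lip_distance, peaks)    # [(0.0, 4.0), (4.0, 8.0)]
manual = [(1.0, 4.0), (4.0, 8.0)]                       # hypothetical ground truth
score = mean_iou(predicted, manual)                     # 0.875
```

A higher mean IoU indicates that the automatically parsed repetitions overlap more closely with the manual ones. The duration of each repetition, which the abstract uses to compare HC and ALS participants, is simply `end - start` for each segment.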

updated: Mon Aug 22 2022 20:53:25 GMT+0000 (UTC)

published: Mon Aug 22 2022 20:53:25 GMT+0000 (UTC)

arXiv
