Skin feature point tracking using deep feature encodings

Jose Ramon Chang; Torbjörn E. M. Nordling

Facial feature tracking is a key component of imaging ballistocardiography (BCG), where the displacement of facial keypoints must be quantified accurately for reliable heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include the Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and the Lucas-Kanade method (LK). These have long represented the state of the art in efficiency and accuracy, but they fail when common deformations, such as local affine transformations or illumination changes, are present. Over the past five years, deep convolutional neural networks have outperformed traditional methods for most computer vision tasks. We propose a pipeline for feature tracking that applies a convolutional stacked autoencoder to identify the crop in an image that is most similar to a reference crop containing the feature of interest. The autoencoder learns to represent image crops as deep feature encodings specific to the object category it is trained on. We train the autoencoder on facial images and validate its ability to track skin features in general using manually labeled face and hand videos. For distinctive skin features (moles), the tracking errors are so small that, based on a χ² test, we cannot exclude that they stem from the manual labeling. With a mean error of 0.6 to 4.2 pixels, our method outperformed the other methods in all but one scenario. More importantly, our method was the only one that did not diverge. We conclude that our method creates better feature descriptors for feature tracking, feature matching, and image registration than the traditional algorithms.
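The crop-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `encode` and `track_feature`, the Euclidean distance, the search window, and the `search_radius` parameter are all assumptions made for illustration, and `encode` is a simple stand-in (normalized pixel values) where the trained convolutional stacked autoencoder's encoder would be used.

```python
import numpy as np


def encode(crop):
    """Stand-in encoder (assumption): zero-mean, unit-norm flattened pixels.

    In the pipeline described above, the encoder half of the trained
    autoencoder would produce the deep feature encoding instead.
    """
    v = crop.astype(np.float64).ravel()
    v = v - v.mean()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def track_feature(frame, ref_crop, prev_xy, search_radius=8):
    """Find the crop in `frame` whose encoding is closest to that of `ref_crop`.

    Searches top-left corners within `search_radius` pixels of `prev_xy`
    (hypothetical search strategy) and returns ((x, y), distance).
    """
    h, w = ref_crop.shape
    ref_code = encode(ref_crop)
    px, py = prev_xy
    best_xy, best_dist = prev_xy, np.inf
    y_lo, y_hi = max(0, py - search_radius), min(frame.shape[0] - h, py + search_radius)
    x_lo, x_hi = max(0, px - search_radius), min(frame.shape[1] - w, px + search_radius)
    for y in range(y_lo, y_hi + 1):
        for x in range(x_lo, x_hi + 1):
            code = encode(frame[y:y + h, x:x + w])
            dist = np.linalg.norm(code - ref_code)  # Euclidean distance (assumption)
            if dist < best_dist:
                best_xy, best_dist = (x, y), dist
    return best_xy, best_dist


# Synthetic check: a random texture shifted 3 px right and 2 px down.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
ref = frame0[15:24, 20:29]                            # 9x9 reference crop at x=20, y=15
frame1 = np.roll(frame0, shift=(2, 3), axis=(0, 1))   # shift rows by 2, columns by 3
xy, dist = track_feature(frame1, ref, prev_xy=(20, 15))
```

In this synthetic case the best match is the shifted location of the reference crop. A learned encoding would be expected to tolerate deformations and illumination changes that defeat a raw-pixel comparison like this stand-in.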

updated: Sun Dec 04 2022 12:03:47 GMT+0000 (UTC)

published: Tue Dec 28 2021 14:29:08 GMT+0000 (UTC)

arXiv
