An Efficient Deep Convolutional Neural Network Model For Yoga Pose Recognition Using Single Images

Santosh Kumar Yadav; Apurv Shukla; Kamlesh Tiwari; Hari Mohan Pandey; Shaik Ali Akbar

Pose recognition involves designing algorithms that locate human body joints in 2D or 3D space and run inference on the estimated joint locations to predict poses. Yoga includes some very complex postures, which pose several challenges for computer vision algorithms, such as occlusion, inter-class similarity, intra-class variability, and viewpoint complexity. This paper presents YPose, an efficient deep convolutional neural network (CNN) model for recognizing yoga asanas from RGB images. The proposed model consists of four steps: (a) first, a segmentation-based approach extracts the region of interest (ROI) from the original images; (b) second, the refined images are passed to a CNN architecture built on an EfficientNet backbone for feature extraction; (c) third, dense refinement blocks, adapted from the architecture of densely connected networks, are added to learn more diversified features; and (d) fourth, global average pooling and fully connected layers are applied to classify the multi-level hierarchy of yoga poses. The proposed model was tested on Yoga-82, a publicly available benchmark dataset for yoga pose recognition. Experimental results show that it achieves state-of-the-art performance on this dataset, with an accuracy of 93.28%, an improvement of approximately 13.9 percentage points over the earlier state of the art (79.35%). The code will be made publicly available.
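Steps (c) and (d) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the abstract does not specify the layer count, growth rate, or feature-map sizes, so `num_layers=3`, `growth=4`, and the 8-channel 5 x 5 input are hypothetical; a 1x1 convolution with random, untrained weights stands in for the real convolutions; and the head sizes (6, 20, 82) assume the three-level class hierarchy of Yoga-82, which the abstract does not state. The sketch only shows how dense connectivity grows the channel count, how global average pooling reduces each channel to one value, and how one fully connected head per hierarchy level produces logits.

```python
import random


def conv1x1(channels, weights):
    """Mix input channels per pixel (1x1 convolution), then apply ReLU.

    channels: list of H x W feature maps (nested lists of floats).
    weights:  one row of per-channel weights for each output channel.
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = []
    for row_w in weights:
        out.append([[max(0.0, sum(wc * ch[i][j] for wc, ch in zip(row_w, channels)))
                     for j in range(w)] for i in range(h)])
    return out


def dense_block(channels, num_layers, growth, rng):
    """DenseNet-style connectivity: every layer sees all earlier feature maps."""
    feats = list(channels)
    for _ in range(num_layers):
        weights = [[rng.uniform(-1, 1) for _ in feats] for _ in range(growth)]
        feats = feats + conv1x1(feats, weights)  # concatenate along channels
    return feats


def global_average_pool(channels):
    """Collapse each H x W channel to its mean: one value per channel."""
    return [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]


def linear(vec, out_dim, rng):
    """Fully connected layer with random (untrained) weights; returns logits."""
    return [sum(rng.uniform(-1, 1) * v for v in vec) for _ in range(out_dim)]


def classify(backbone_feats, level_sizes, rng):
    """Dense refinement -> global average pooling -> one head per level."""
    refined = dense_block(backbone_feats, num_layers=3, growth=4, rng=rng)
    pooled = global_average_pool(refined)
    return pooled, [linear(pooled, n, rng) for n in level_sizes]


rng = random.Random(0)
# Stand-in for the backbone output: 8 channels of 5 x 5 (hypothetical sizes).
backbone_feats = [[[rng.random() for _ in range(5)] for _ in range(5)]
                  for _ in range(8)]
# Assumed hierarchy sizes for the three classification levels.
pooled, logits = classify(backbone_feats, level_sizes=(6, 20, 82), rng=rng)
print(len(pooled), [len(level) for level in logits])
```

With these hypothetical settings the pooled vector has 8 + 3 * 4 = 20 entries, because each of the three dense layers appends 4 new channels to everything that came before, and the three heads emit 6, 20, and 82 logits respectively.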

updated: Tue Jun 27 2023 19:34:46 GMT+0000 (UTC)

published: Tue Jun 27 2023 19:34:46 GMT+0000 (UTC)

arXiv
