
A NOVEL KEY-POINT BASED METHOD FOR HIERARCHICAL YOGA POSE

CLASSIFICATION USING MACHINE LEARNING MODELS ON CUSTOM FEATURES

Aniket Verma, Hardik Garg

Indraprastha Institute of Information Technology, Delhi

ABSTRACT

Yoga is a traditional Indian practice for the holistic well-being of an individual. Recently, many works have aimed to classify yoga poses from images; however, most of them use Deep Learning methods which are computationally expensive and time consuming. In this paper, we present a novel method to classify yoga poses in a hierarchical manner by designing custom features based on body angles, distances and ratios - the same concepts used by yoga instructors to guide practitioners. Our method outperforms the known state-of-the-art works which use deep learning algorithms. We present numerical results for each hierarchical level and compare our work to existing works.

Index Terms— computer vision, machine learning, pose classification, keypoint detection, hierarchical classification

1. INTRODUCTION

Yoga is an ancient Indian practice that has gained prominence among medical researchers due to its holistic focus on physical, mental, and spiritual development. It has been shown to produce positive results in facilitating rehabilitation and sometimes even wholly curing diseases. With the advent of the COVID-19 pandemic [1] [2] [3], many people resorted to yogic practices for better health and immunity, much of which was done online. In such a setup, it becomes essential to leverage AI systems to aid participants and instructors in ensuring that poses are performed correctly, because an incorrect pose can potentially have detrimental effects on a practitioner's health.

A plethora of approaches have been proposed on the topic. [4] [5] proposed human pose estimation and classification of activities on the basis of pose key points extracted using the OpenPose framework [6]; 18 different key points are used to describe the 2D structure of the body and the problem is formulated as a multi-class classification problem. [7] made use of the MediaPipe framework for yoga pose monitoring and pose prediction using sequence models. [8] uses a dataset of 1000 images comprising five yoga poses, passes them through the BlazePose keypoint detection framework [9], and subsequently performs classification on the basis of human body joint detection algorithms. The authors of [10] make use of keypoint detection to classify various yogic mudras, or hand poses, using Machine Learning algorithms. [11] makes use of tf-pose estimation [12] and feeds raw keypoint data for 10 yoga poses into Machine Learning models. [13] used the Yoga-82 dataset, which consists of 28.4k images of 82 yoga poses with three levels of hierarchy: the first level has 6 classes (type of pose, such as sitting or standing), which diverge into 20 sub-classes (legs in front or back, etc.) at the next level, which further diverge into 82 classes (exact name of the pose); the authors performed hierarchical classification with 6, 20 and 82 classes from the top to the bottom level. Further, [14] [15] [16] used the MediaPipe framework to get keypoints from video frames; joint angles were formed between the keypoints and passed into ML models.

However, there are certain limitations. First, most existing works involve very small datasets (1000-4000 images with 6-10 poses), which puts the wider applicability of the proposed methods in question. Second, although a few works made use of keypoint data, none of them proposed designing custom features targeted at yoga pose classification. Third, no work except [13] performs hierarchical classification of yoga poses, which is important because yoga practitioners often practice poses of a particular super class as per their requirement. Fourth, most works employ either Deep Learning techniques or Machine Learning techniques without feature engineering, both of which are time and resource intensive.

Paper Contribution. In this work, our aim is to perform three-level hierarchical pose classification on the Yoga-82 dataset by leveraging Machine Learning techniques. To the best of our knowledge, the only existing work in the domain of hierarchical yoga pose classification [13] makes use of modified variants of the DenseNet architecture. Further, no work in the domain of yoga pose classification designs custom features from keypoints. Our main contribution is designing novel custom features from human pose keypoints obtained with MediaPipe [16] which simulate real-life instructions given by yoga instructors [17] [18] based on angles, shapes and body ratios. This reduces the number of features from 132 to 25. We use these features to present a comparison between three standard hierarchical classification algorithms using the HiClass library [19] and beat existing results at all three levels of classification.
Fig. 1. Pipeline for yoga pose classification
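As context for the pipeline above: MediaPipe returns 33 keypoints per image, each with x, y, z and visibility, which yields the 132-dimensional raw feature vector mentioned in the introduction (before reduction to the 25 custom features). A minimal sketch of that flattening step, assuming keypoints arrive as plain (x, y, z, visibility) tuples (the MediaPipe call itself is omitted here):

```python
# Sketch of the raw-feature step: 33 keypoints, each with
# (x, y, z, visibility), flattened to 33*4 = 132 raw features per image.
# Keypoints are assumed to be plain tuples; obtaining them from MediaPipe
# is out of scope for this sketch.
from typing import List, Tuple

Keypoint = Tuple[float, float, float, float]  # (x, y, z, visibility)

def flatten_keypoints(keypoints: List[Keypoint]) -> List[float]:
    """Flatten 33 keypoints into a 132-dimensional raw feature vector."""
    if len(keypoints) != 33:
        raise ValueError(f"expected 33 keypoints, got {len(keypoints)}")
    return [value for kp in keypoints for value in kp]

# Example: 33 dummy keypoints produce a 132-feature vector.
dummy = [(0.1 * i, 0.2 * i, 0.0, 1.0) for i in range(33)]
print(len(flatten_keypoints(dummy)))  # 132
```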

Table 1. Proposed custom features

  Class               Feature                   Notation
  Natural Angles      Max hand angle            Hmax
                      Min hand angle            Hmin
                      Max leg angle             Lmax
                      Min leg angle             Lmin
  Synthetic Angles    Max Elbow-Knee angle      EKmax
                      Min Elbow-Knee angle      EKmin
                      Nose-Heel angle           NH
                      Elbow to Elbow angle      EE
                      Knee to Knee angle        KK
  Spatial Features    Right arm centroid        HCr
                      Left arm centroid         HCl
                      Right leg centroid        LCr
                      Left leg centroid         LCl
                      Neck centroid             NC
                      Abdomen centroid          AC

Table 2. Body part notations

  Notation    Body Part
  Pn          Nose
  Psl         Left Shoulder
  Psr         Right Shoulder
  Pel         Left Elbow
  Per         Right Elbow
  Pwl         Left Wrist
  Pwr         Right Wrist
  Pal         Left Ankle
  Par         Right Ankle
  Pkl         Left Knee
  Pkr         Right Knee
  Phl         Left Hip
  Phr         Right Hip
2. PROPOSED METHOD

2.1. Preprocessing

Each image is passed through the MediaPipe framework, which yields a set of 33 keypoints corresponding to human body parts as shown in figure-2. Each keypoint has four features - x, y, z coordinates in 3D space and a visibility value between 0 and 1 indicating how visible the body part is. Therefore, our raw dataset consists of 33*4 = 132 features representing each image, together with an output label indicating the pose name. A 1:3 stratified split is performed to generate the test and train sets.

2.2. Hierarchical Classification

We perform hierarchical classification using the three algorithms described below.
Local Classifier Per Parent Node. Each parent node has a classifier which performs multiclass classification over its children. In total, 27 classifiers (1+6+20) are used.
Local Classifier Per Node. Each node has a binary classifier to predict whether a sample belongs to that node. In total, 108 classifiers (6+20+82) are used.
Local Classifier Per Level. Each level has a multiclass classifier associated with it; the levels are mutually independent. In total, 3 classifiers are used.

2.3. Notations

We define meaningful notations for each keypoint and custom feature created. Table-2 lists the notations for the keypoints, where each symbol represents the (x, y, z) tuple of the corresponding body part, like Psl for the left shoulder. We also present some basic formulas to make subsequent expressions compact. Let ∠(·) be a function that computes the three-dimensional angle defined by three keypoints Pa, Pb, Pc, each a three-dimensional vector holding the coordinates of a keypoint in the 3D space determined by MediaPipe.
∠(Pa, Pb, Pc) = cos⁻¹ [ (φ(Pa, Pb)² + φ(Pb, Pc)² − φ(Pa, Pc)²) / (2 · φ(Pa, Pb) · φ(Pb, Pc)) ]   (1)

where φ(·, ·) denotes the Euclidean distance between two keypoints, so that ∠(Pa, Pb, Pc) is the angle at the vertex Pb given by the law of cosines.

Fig. 2. MediaPipe keypoints obtained for a human pose

For simplicity, the custom designed features in table-1 are grouped into three categories, which are described in the subsequent sections.

2.4. Natural Angles

This class of features represents body angles which can be formed naturally - by tracing blue lines between keypoints as shown in fig-2. It consists of four features representing the minimum and maximum angles between shoulder-elbow-wrist and waist-knee-foot. Features related to hands represent the alignment of hands and arms. A number of yoga poses centred around the upper body involve a diverse set of arm and hand movements (like the one-legged pigeon pose) and relative positionings - in some poses the elbows are bent whereas in others the wrists might be bent. Therefore a wide variety of combinations can arise from each of the keypoints, which this category aims to encompass. We present two features for hands - the minimum hand angle and the maximum hand angle. The motivation for taking minimum and maximum angles is to handle asymmetry: many yoga poses are asymmetric, so a simple left and right segregation does not work. The minimum and maximum angles, on the contrary, remain the same irrespective of the change in alignment from left to right and vice versa.

Hmax = max(∠(Psl, Pel, Pwl), ∠(Psr, Per, Pwr))   (2)

Hmin = min(∠(Psl, Pel, Pwl), ∠(Psr, Per, Pwr))   (3)

Features related to legs are used because a number of yoga poses, such as the warrior pose and a sub-category of yoga called Power Yoga, are focused around the thighs, knees and legs. This category enables us to quantify differences in leg alignment. To handle asymmetry, maximum and minimum angles are again taken.

Lmax = max(∠(Phl, Pkl, Pal), ∠(Phr, Pkr, Par))   (4)

Lmin = min(∠(Phl, Pkl, Pal), ∠(Phr, Pkr, Par))   (5)

2.5. Synthetic Angles

This class of features comprises synthetic angles - angles which are not directly formed by tracing blue lines in the figure, yet are used in giving practical instructions to practitioners. First we present the maximum and minimum elbow-waist-knee angles, which are aimed at capturing the relative position of the upper body and the lower body. For many yoga poses these have to be in proper alignment, whereas for some other poses they have to be perpendicular or at miscellaneous angles. They also indicate how much the knee or the elbow is stretching relative to the other body part. For every pose, there is a maximum threshold of stretching beyond which it becomes unsafe and may lead to injuries.

EKmax = max(∠(Psl, Phl, Pkl), ∠(Psr, Phr, Pkr))   (6)

EKmin = min(∠(Psl, Phl, Pkl), ∠(Psr, Phr, Pkr))   (7)

The angle subtended by the heels at the nose indicates the strain put on the knees and calf muscles - power yoga poses generally have a high value of this angle. The elbow-neck-elbow and knee-neck-knee angles measure body flexibility for symmetric poses, as these should involve equal strain on the left and right sides of the body, which cannot be captured by asymmetric features. These features measure whether one side is facing excessive strain or little to no strain.

NH = ∠(Pal, (Psl + Psr)/2, Par)   (8)

EE = ∠(Per, (Psl + Psr)/2, Pel)   (9)

KK = ∠(Pkr, (Phr + Phl)/2, Pkl)   (10)

2.6. Spatial Features

This class of features aims to capture spatial information in order to anchor the angles to the appropriate regions of the human body. The equations for each of the spatial features are given below.

HCr = (Pwr + Per + Psr) / 3   (11)

HCl = (Pwl + Pel + Psl) / 3   (12)

LCr = (Phr + Pkr + Par) / 3   (13)

LCl = (Phl + Pkl + Pal) / 3   (14)

NC = (Psr + Psl) / 2   (15)

AC = (Psr + Psl + Phr + Phl) / 4   (16)
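The custom features above reduce to repeated applications of the angle function of Eq. (1) and simple centroids. A minimal sketch in Python, with made-up keypoint coordinates (φ is the Euclidean distance, and the angle is computed at the middle vertex via the law of cosines):

```python
# Sketch of Eq. (1) and two representative features (Hmax/Hmin and the
# neck centroid NC). Keypoints are (x, y, z) tuples; values are made up.
import math

def phi(a, b):
    """Euclidean distance between two 3D keypoints."""
    return math.dist(a, b)

def angle(pa, pb, pc):
    """3D angle at vertex pb, in degrees, via the law of cosines (Eq. 1)."""
    num = phi(pa, pb) ** 2 + phi(pb, pc) ** 2 - phi(pa, pc) ** 2
    den = 2 * phi(pa, pb) * phi(pb, pc)
    return math.degrees(math.acos(max(-1.0, min(1.0, num / den))))

def centroid(*points):
    """Coordinate-wise mean of keypoints, as used for NC, AC, HCr, ..."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

# Hypothetical left/right shoulder-elbow-wrist keypoints:
Psl, Pel, Pwl = (0, 1, 0), (0, 0, 0), (1, 0, 0)   # left arm bent 90 degrees
Psr, Per, Pwr = (0, 1, 0), (0, 0, 0), (0, -1, 0)  # right arm fully extended

Hmax = max(angle(Psl, Pel, Pwl), angle(Psr, Per, Pwr))  # Eq. (2) -> 180
Hmin = min(angle(Psl, Pel, Pwl), angle(Psr, Per, Pwr))  # Eq. (3) -> 90
NC = centroid(Psl, Psr)                                 # Eq. (15)
print(round(Hmax), round(Hmin), NC)
```

The max/min over the left and right sides is exactly how the paper handles asymmetric poses: swapping the two arms leaves Hmax and Hmin unchanged.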
Fig. 4. Graphical results for the MediaPipe framework
3. EXPERIMENT

Experiments were conducted using a wide variety of Machine Learning models and keypoint detection frameworks. Four types of input data were compared across experiments: raw keypoint data, data containing only angular features, data containing only spatial features, and data containing both angular and spatial features. Each input data type was generated using three keypoint detection frameworks - MediaPipe, OpenPose and PoseNet. For each input data type, hierarchical classification was performed using a number of Machine Learning models. For each run, grid search was applied on the training dataset to find the best parameters, and cross-validation accuracy is reported along with test accuracy.
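The hierarchical setups of Section 2.2 can be illustrated with a toy Local Classifier Per Parent Node: one classifier at the root picks the superclass, then that superclass's own classifier picks the leaf pose. The sketch below substitutes a simple nearest-centroid rule for the paper's SVM/HiClass/grid-search setup, with invented two-level labels and feature vectors:

```python
# Toy Local-Classifier-Per-Parent-Node (cf. Section 2.2): the root
# classifier predicts the superclass, then that superclass's classifier
# predicts the leaf. A nearest-centroid rule stands in for the paper's
# SVM; labels and features here are invented for illustration.
from collections import defaultdict

def fit_centroids(X, y):
    """One centroid per label; this is our stand-in 'local classifier'."""
    sums, counts = defaultdict(lambda: None), defaultdict(int)
    for x, label in zip(X, y):
        counts[label] += 1
        sums[label] = x if sums[label] is None else [a + b for a, b in zip(sums[label], x)]
    return {lbl: [v / counts[lbl] for v in s] for lbl, s in sums.items()}

def predict_centroids(model, x):
    """Label whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(model[lbl], x)))

# Two-level label paths (superclass, pose); features are made up.
X = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
y = [("Standing", "Tree"), ("Standing", "Warrior"),
     ("Sitting", "Lotus"), ("Sitting", "Hero")]

root = fit_centroids(X, [path[0] for path in y])  # 1 classifier at the root
per_parent = {  # 1 classifier per superclass, trained only on its samples
    parent: fit_centroids(
        [x for x, path in zip(X, y) if path[0] == parent],
        [path[1] for x, path in zip(X, y) if path[0] == parent],
    )
    for parent in {path[0] for path in y}
}

def predict(x):
    parent = predict_centroids(root, x)
    return parent, predict_centroids(per_parent[parent], x)

print(predict([0.95, 0.9]))
```

With three levels, the same pattern yields the 1+6+20 = 27 classifiers counted in Section 2.2, since every non-leaf node contributes one local classifier.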

4. RESULTS & CONCLUSION

Through all the experiments, it is observed that the MediaPipe framework yields the best results overall. Across all three levels, the SVM model gave the best results, and the results for the custom features (Angles + Spatial coordinates) are better than those for raw keypoint data, only angles, and only spatial coordinates. All the numeric results are presented in tables 3-5. Along with this, graphical plots for the best performing framework (MediaPipe) are also presented in figures 3-5. Our study highlights the potential of machine learning combined with early fusion of deep learning frameworks to achieve state-of-the-art results in yoga pose classification. For future work, the current framework can be extended to real-time pose classification and deployed in a practical setting, such as an online yoga class.

Fig. 3. Graphical results for the MediaPipe framework

Fig. 5. Graphical results for the MediaPipe framework


Table 3. Experimental results from Yoga-82 Paper (accuracy)
Architecture # Params Accuracy
DenseNet Variant-1 18.27M 79.35
DenseNet Variant-2 18.27M 79.08
DenseNet Variant-3 22.59M 78.77
Custom Features + SVM (ours) - 86.59

Table 4. Results using the MediaPipe Framework (accuracy)

Level    Input Data Type                   Logistic Regression   SGD      Naive Bayes   Random Forest   KNN      SVM
Level-3  Raw Keypoints Data (mediapipe)    0.8067                0.4535   0.2743        0.6256          0.7569   0.8535
         Raw Keypoints Data (openpose)     0.8631                0.5210   0.7755        0.8967          0.6746   0.8919
         Raw Keypoints Data (posenet)      0.5024                0.1434   0.7942        0.4413          0.2906   0.4798
         Only Angles Data                  0.5601                0.3846   0.4838        0.6367          0.6989   0.7126
         Only Spatial Coordinates Data     0.5449                0.2524   0.3566        0.5082          0.6677   0.7871
         Angles + Spatial Coordinates      0.8050                0.6509   0.6524        0.7177          0.8034   0.8659

Level-2  Raw Keypoints Data (mediapipe)    0.7979                0.6991   0.3585        0.7380          0.8644   0.9111
         Raw Keypoints Data (openpose)     0.8727                0.6110   0.5450        0.9207          0.7238   0.8895
         Raw Keypoints Data (posenet)      0.2931                0.1762   0.1933        0.4786          0.3947   0.4249
         Only Angles Data                  0.4744                0.4403   0.3846        0.6979          0.7688   0.7596
         Only Spatial Coordinates Data     0.5386                0.4430   0.4677        0.6776          0.8147   0.8738
         Angles + Spatial Coordinates      0.7347                0.6635   0.5755        0.7857          0.8849   0.9184

Level-1  Raw Keypoints Data (mediapipe)    0.7979                0.6991   0.3585        0.7380          0.8644   0.9111
         Raw Keypoints Data (openpose)     0.6842                0.6014   0.6338        0.9243          0.7551   0.7959
         Raw Keypoints Data (posenet)      0.4569                0.4290   0.3085        0.5848          0.5413   0.5517
         Only Angles Data                  0.5392                0.4851   0.4885        0.7403          0.7937   0.7898
         Only Spatial Coordinates Data     0.6560                0.5874   0.5690        0.7552          0.8602   0.9101
         Angles + Spatial Coordinates      0.7685                0.7314   0.6450        0.8287          0.9117   0.9383
5. REFERENCES

[1] Anil Patange and Punam Sawarkar, "Role of yoga for the prevention & management of COVID-19 - a review," International Journal of Research in Pharmaceutical Sciences, vol. 11, no. SPL1, pp. 1720–1724, Dec 2020.

[2] Chiranjivi Adhikari, Komal Shah, Somen Saha, and Deepak Saxena, "Yoga, immunity and COVID-19: A scoping review," Journal of Family Medicine and Primary Care, vol. 11, no. 5, pp. 1683, 2022.

[3] HR Nagendra, "Yoga for COVID-19," International Journal of Yoga, vol. 13, no. 2, pp. 87, 2020.

[4] Abhay Gupta, Kuldeep Gupta, Kshama Gupta, and Kapil Gupta, "Human activity recognition using pose estimation and machine learning," Feb 2021.

[5] Ajay Chaudhari, Omkar Dalvi, Onkar Ramade, and Dayanand Ambawade, "Yog-Guru: Real-time yoga pose correction system using deep learning methods," in 2021 International Conference on Communication Information and Computing Technology (ICCICT). June 2021, IEEE.

[6] Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, and Yaser Sheikh, "OpenPose: Realtime multi-person 2D pose estimation using part affinity fields," 2018.

[7] Debabrata Swain, Santosh Satapathy, Biswaranjan Acharya, Madhu Shukla, Vassilis C. Gerogiannis, Andreas Kanavos, and Dimitris Giakovis, "Deep learning models for yoga pose monitoring," Algorithms, vol. 15, no. 11, 2022.

[8] Miral Desai and Hiren Mewada, "A novel approach for yoga pose estimation based on in-depth analysis of human body joint detection accuracy," PeerJ Computer Science, vol. 9, pp. e1152, Jan. 2023.

[9] Valentin Bazarevsky, Ivan Grishchenko, Karthik Raveendran, Tyler Zhu, Fan Zhang, and Matthias Grundmann, "BlazePose: On-device real-time body pose tracking," 2020.

[10] Abhishek Sharma, Yash Shah, Yash Agrawal, and Prateek Jain, "Real-time recognition of yoga poses using computer vision for smart health care," 2022.

[11] Yash Agrawal, Yash Shah, and Abhishek Sharma, "Implementation of machine learning technique for identification of yoga poses," in 2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT), 2020, pp. 40–43.

[12] Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, and Zhibin Wang, "TFPose: Direct human pose estimation with transformers," 2021.

[13] Manisha Verma, Sudhakar Kumawat, Yuta Nakashima, and Shanmuganathan Raman, "Yoga-82: A new dataset for fine-grained classification of human poses," Apr 2020.

[14] Utkarsh Bahukhandi and Shikha Gupta, "Yoga pose detection and classification using machine learning techniques," Dec 2021.

[15] Daksh Goyal, Koteswar Rao Jerripothula, and Ankush Mittal, "Detection of gait abnormalities caused by neurological disorders," in 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP), 2020.

[16] Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, et al., "MediaPipe: A framework for building perception pipelines," Jun 2019.

[17] "Yoga with Adriene," YouTube channel.

[18] "Yoga Journal - online yoga archives," Mar 2022.

[19] Fábio M. Miranda, Niklas Köhnecke, and Bernhard Y. Renard, "HiClass: a Python library for local hierarchical classification compatible with scikit-learn," 2021.
