
A Thesis/Project/Dissertation Report

on
DETECTION OF BODY POSTURE FOR MUSCLE
REHABILITATION USING NEURAL NETWORK
TECHNIQUE
Submitted in partial fulfillment of the
requirement for the award of the degree of

Bachelor of Technology in Computer Science and Engineering

Under The Supervision of


Dr. E. Rajesh
Professor

Submitted By
Devansh Pundir
19021011258/19SCSE1010049
Mohd. Sahil
19021011401/19SCSE1010212

SCHOOL OF COMPUTING SCIENCE AND ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GALGOTIAS UNIVERSITY, GREATER NOIDA
INDIA
JANUARY, 2023

CANDIDATE’S DECLARATION

We hereby certify that the work which is being presented in the project, entitled
“Detection Of Body Posture For Muscle Rehabilitation Using Neural Network
Technique”, in partial fulfillment of the requirements for the award of the degree of
B.Tech and submitted in the School of Computing Science and Engineering of
Galgotias University, Greater Noida, is an original work carried out during the
period of January, 2023 to August, 2023, under the supervision of Dr. E. Rajesh,
Professor, Department of Computer Science and Engineering, School of Computing
Science and Engineering, Galgotias University, Greater Noida.

The matter presented in the project has not been submitted by us for the award of
any other degree of this or any other university.

Devansh Pundir, 19SCSE1010049


Mohd. Sahil, 19SCSE1010212

This is to certify that the above statement made by the candidates is correct to the
best of my knowledge.

Dr. E. Rajesh
Professor
CERTIFICATE

The Final Project Viva-Voce examination of Devansh Pundir (19SCSE1010049) and
Mohd. Sahil (19SCSE1010212) has been held on _________________ and their
work is recommended for the award of B.Tech.

Signature of Examiner(s) Signature of Supervisor(s)

Signature of Project Coordinator Signature of Dean

Date:
Place: Greater Noida
Abstract

The complications of poor posture include back pain, spinal dysfunction, joint
degeneration, rounded shoulders and a potbelly. Suggestions to improve your posture
include regular exercise and stretching, ergonomic furniture and paying attention to
the way your body feels. Pose estimation refers to computer vision techniques that
detect human figures in images and video, so that one could determine, for example,
where someone’s elbow shows up in an image. To be clear, this technology is not
recognizing who is in an image — there is no personal identifiable information
associated with pose detection. The algorithm is simply estimating where key body
joints are. Human posture detection can be used for people who need muscle
rehabilitation training. It not only solves the problem of inconvenient operation for
patients with motor dysfunction, but also has important practical significance for
improving the rehabilitation training effect of patients. For building this project we
will be using Posenet - a machine learning model which allows for real-time human
pose estimation in the browser. Applications of this technology in the real world
include Snapchat-style filters that overlay effects such as tongues and dummy faces,
fitness apps that detect and correct your exercise poses, and virtual games that
analyze the shots of players.
Table of Contents

Abstract
Contents
Chapter 1. Introduction
1.1 Introduction
1.2 Formulation of Problem
1.3 Tools and Technology Used

Chapter 2. Literature Survey/ Project Design


Chapter 3. Functionality/ Working of the Project
Chapter 4. Results and Discussion
Chapter 5. Conclusion and Future Scope
5.1 Conclusion
5.2 Future Scope

References
CHAPTER 1
Introduction

Human pose recognition is an extremely challenging task within the
discipline of computer vision. It deals with the localization of human joints in an
image or video to build a skeletal representation. Automatically detecting a user's
activity in an image is a difficult problem because it depends on a number of
factors such as the scale and resolution of the image, illumination variation,
background clutter, viewpoint variations, and the interaction of humans with the
environment. The difficulty with posture is that it is of utmost importance to
perform it properly, as any incorrect posture is often unproductive and potentially
damaging. This creates the need for an instructor to supervise the session and
correct the posture.

Since not all users have access or resources to an instructor, an artificial
intelligence-based application can be used to provide customized feedback to help
people improve their poses. In recent years, human pose estimation has benefited
greatly from machine learning, and large gains in performance have been achieved.
Machine learning approaches give a more straightforward way of mapping the
structure rather than having to deal with the dependencies between structures
manually. This project focuses on exploring the various approaches to pose
classification and seeks to gain insight into the following questions: What is pose
estimation? What is machine learning? How can machine learning be applied to
pose detection in real time?

Posenet is a deep learning TensorFlow model that allows you to estimate human pose
by detecting body parts such as elbows, hips, wrists, knees, and ankles, and forming
a skeleton structure of your pose by joining these points.
Posenet is built on the MobileNet architecture. MobileNet is a convolutional neural
network developed by Google which is trained on the ImageNet dataset, used mainly
for image classification and target estimation. It is a lightweight model which uses
depthwise separable convolutions to deepen the network while reducing the number
of parameters and the computation cost and increasing accuracy. There are many
articles about MobileNet available online.

The pre-trained models run in our browsers, which is what differentiates Posenet
from other API-dependent libraries. Hence, anyone with a modestly configured
laptop/desktop can easily make use of such models and build good projects.

This project uses references from conference proceedings, published papers,
technical reports, and journals. The second section discusses different pose
extraction methods, which are then covered in conjunction with the machine learning
based tools used here - Posenet, p5.js and ml5.js for posture detection.
CHAPTER 2
Literature Review

Pose estimation can be performed using various deep learning based pose
estimation algorithms; they are mainly divided into two categories, top-down and
bottom-up approaches.

The top-down approach to human pose estimation is a very naive and traditional
method. Given an image or a video of people, it first detects where a person is
present in that image and then draws a bounding box around them using object
detection. The bounding box is then fed to a pose estimator, which extracts the body
keypoints from it. This approach is very simple but has some drawbacks: the runtime
is directly proportional to the number of people, and the computational cost is high.

The bottom-up approach is the exact opposite of the top-down approach, yet very
powerful. It first detects the keypoints in the image and then tries to map them to
the different people in that image using part affinity maps. This method is not only
faster but also more accurate than the top-down approach. All modern-day pose
estimation algorithms are inspired by it. We have studied some major bottom-up
algorithms: DeepPose, Convolutional Pose Machines, Openpose, Posenet and
Blazepose.

We have compared the different bottom-up techniques for pose estimation on the
basis of a few parameters: type of architecture, baseline CNN model, average
accuracy, and FPS achieved. The algorithms/techniques compared in this study are
DeepPose, Convolutional Pose Machines, Openpose and Posenet. All of these
algorithms have a multi-stage architecture and can use different baseline models.

Another factor of comparison is the FPS achieved while testing the techniques,
which also depends on the hardware of the system. Since Posenet uses the MobileNet
CNN, it achieves the highest FPS, followed by DeepPose thanks to its simpler
architecture. The FPS achieved by CPM and Openpose was relatively low compared
to the other two techniques, since Openpose requires heavy computational
resources.

After exploring all the techniques and methods, we compared the pose estimation
techniques on a few parameters. Our aim is to develop an application which uses
pose estimation and comparison to help people who need muscle rehabilitation
training.
CHAPTER 3
Functionality/ Working of the Project

Functionality
Posenet can be used to estimate either a single pose or multiple poses: there is a
version of the algorithm that can detect only one person in an image/video and
another version that can detect multiple persons in an image/video. Why are there
two versions? The single-person pose detector is faster and simpler but requires
only one subject to be present in the image (more on that later). We cover the
single-pose version first because it is easier to follow.
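In ml5.js the choice between the two versions is made when loading the model. The
sketch below assumes the `detectionType` option from the ml5 0.x poseNet
documentation; treat the name as an assumption and verify it against your installed
version.

```javascript
// Hypothetical options object for ml5's poseNet; the option name
// `detectionType` follows the ml5 0.x docs and is an assumption here.
const options = {
  detectionType: 'single', // use 'multiple' for the multi-pose decoder
};
// posenet = ml5.poseNet(capture, options, modelLoaded);
```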

At a high level pose estimation happens in two phases:
1. An input RGB image is fed through a convolutional neural network.
2. Either a single-pose or multi-pose decoding algorithm is used to decode poses,
pose confidence scores, keypoint positions, and keypoint confidence scores
from the model outputs.

Let us learn the most important terminology.

Pose - at the highest level, Posenet returns a pose object that contains a list of
keypoints and an instance-level confidence score for each detected person.

Pose confidence score - this determines the overall confidence in the estimation of a
pose. It ranges between 0.0 and 1.0. It can be used to hide poses that are not deemed
strong enough.
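As a sketch of how that score could be used (the helper name is our own, not part
of the Posenet API; the array shape follows the { pose, skeleton } objects that
Posenet reports per detected person), weak detections can be filtered out like this:

```javascript
// Drop detections whose overall pose confidence score falls below a
// threshold; each element has the { pose, skeleton } shape that
// Posenet/ml5 reports for one detected person.
function filterConfidentPoses(poses, minScore = 0.5) {
  return poses.filter(p => p.pose.score >= minScore);
}
```

With a threshold of 0.5, a detection scored 0.2 is hidden while one scored 0.9 is
kept.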

Keypoint - a part of a person’s pose that is estimated, such as the nose, right ear,
left knee, right foot, etc. It contains both a position and a keypoint confidence
score. Posenet currently detects 17 keypoints, illustrated in the following diagram:

Keypoint Confidence Score - this determines the confidence that an estimated
keypoint position is accurate. It ranges between 0.0 and 1.0. It can be used to hide
keypoints that are not deemed strong enough.

Keypoint Position - 2D x and y coordinates in the original input image where a
keypoint has been detected.
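Putting the last three terms together, a small helper (the function name is our own,
not part of the Posenet API) can collect only the confidently detected keypoint
positions from a pose object:

```javascript
// Return the part name and (x, y) position of every keypoint whose
// individual confidence score passes the threshold.
function confidentKeypointPositions(pose, minScore = 0.5) {
  return pose.keypoints
    .filter(k => k.score >= minScore)
    .map(k => ({ part: k.part, x: k.position.x, y: k.position.y }));
}
```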
Data Flow Diagram (DFD)
Working of the Project
You can use any IDE to implement the project, such as Visual Studio Code, Sublime
Text, etc.

1) Boiler Template
Create a new folder and create one HTML file which will work as our website to
users. here only we will import our javascript file, Machine learning, and deep
learning libraries that we will use.
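A minimal boilerplate along these lines could look as follows; the CDN URLs and
version numbers are illustrative assumptions, so check the official p5.js and
ml5.js pages for current ones:

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- p5.js and ml5.js from a CDN (versions are placeholders) -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.4.0/p5.js"></script>
    <script src="https://unpkg.com/ml5@0.12.2/dist/ml5.min.js"></script>
    <!-- our own sketch file -->
    <script src="sketch.js"></script>
  </head>
  <body></body>
</html>
```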

2) p5.js
It is a javascript library used for creative coding. There is one software known as
Processing on the top of which P5.js is based. The Processing was made in java,
which helps creative coding in desktop apps but after that when there was a need for
the same thing in websites then P5.js was implemented. Creative coding basically
means that It helps you to draw various shapes and figures like lines, rectangles,
squares, circles, points, etc on the browser in a creative manner(colored or animated)
by just calling an inbuilt function, and provide height and width of shape you want.

Create one JavaScript file; here we will learn p5.js and why we are using this
library. Before writing anything in the JavaScript file, first import p5.js and add
a link to the created JavaScript file in the HTML file.

There are 2 basic things in p5.js that you implement. Write the code below in
the JavaScript file.
a) setup – In this function, you write code related to the basic configuration you
need in your interface. One thing you create here is the canvas, whose size you
specify. Everything you implement will appear on this canvas. Its job is to set
everything up.

function setup() { // this function runs only once while running
  createCanvas(800, 500);
}

b) draw – The second function is draw, where you draw all the things you want:
shapes, images, video. All the implementation code is placed in this function. Think
of it as the main function in compiled languages. Its job is to display things on
the screen.

The best thing is that for each figure there is an inbuilt function; you only need
to call it and pass some coordinates to draw a shape. To give the canvas a
background color, call the background function and pass a color code.

Functions to add geometry to the skeleton

Adding Point - to draw a simple point, use the point function and pass x and y
coordinates.
point(200, 200)

Adding Line - a line connects two points, so you call the line function and pass
the coordinates of the 2 points, i.e. 4 values.
line(200, 200, 300, 300)
Adding Triangle
triangle(100, 200, 300, 400, 150, 450)

Adding Rectangle - call the rect function and pass the position, width, and height.
If the width and height are the same, it will be a square.

rect(300, 200, 100, 100)

Adding Circle
ellipse(600, 300, 10, 10)

Adding Stroke(R, G, B, Opacity) - defines the color of the outer boundary line of a
shape.
stroke(255, 0, 0)

Adding Stroke Weight - defines how wide the outline should be.
strokeWeight(5)

Adding Fill(R, G, B, Opacity) - It defines the color you want to fill in the shape.
fill(132, 100, 34)

Draws Ellipse On Mouse Movement


ellipse(mouseX, mouseY, 50, 50)

Loads Image/ Under SETUP Function

let img
img = loadImage('images/glasses.png')
Add Properties To The Image/ Under DRAW Function
image(img, 100, 100, 100, 100)

Loads Camera Video/ Under SETUP Function


capture = createCapture(VIDEO)
capture.hide()

Add Properties To The Camera Video/ Under DRAW Function


image(capture, 0, 0, 800, 600)

3) ML5.js
The best way to share code applications with others is the web. Only share URL and
you can use other applications on your system. keeping this google implemented
tensorflow.js, but working with tensorflow.js requires a deep understanding So,
ML5.js build a wrapper around tensorflow.js and made the task simple by using some
function so indirectly you will deal with TensorFlow.js through ml5.js. The same you
can read on official documentation of Ml5.js

Hence, it is the main library, consisting of various deep learning models on which
you can build projects. In this project we are using the Posenet model, which is
also present in this library.

Let's import the library and use it. In the HTML file, paste a script tag to load
the library.
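A script tag along these lines loads the library; the CDN URL and version number
are assumptions, so check the official ml5.js documentation for the current one:

```html
<script src="https://unpkg.com/ml5@0.12.2/dist/ml5.min.js"></script>
```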

Now let’s set up the image capture and load the Posenet model. The capture variable
is a global variable, and all the variables we will be creating have global scope.

let capture;

function setup() { // this function runs only once while running
  createCanvas(800, 500);
  capture = createCapture(VIDEO);
  capture.hide();
  // load the Posenet model
  posenet = ml5.poseNet(capture, modelLoaded);
  // detect poses
  posenet.on('pose', receivedPoses);
}

function receivedPoses(poses) {
  console.log(poses);
  if (poses.length > 0) {
    singlePose = poses[0].pose;
    skeleton = poses[0].skeleton;
  }
}

As we load and run the code, Posenet will detect 17 body points (5 facial points,
12 body points) along with information about the pixel at which each point is
detected in the image. If you print these poses, you will see an array of objects,
each with 2 keys, pose and skeleton, which we have accessed.

1. pose – It is again an object that consists of various keys and lists of values,
such as the keypoints, left eye, left ear, nose, etc.
2. skeleton – In skeleton, each entry consists of two sub-objects, zero and one,
each of which has a confidence score, part name, and position coordinates, so
we can use this to draw a line and construct the skeleton structure.

We have a list of keypoints, each of which has the x and y coordinates of one
point, so we can traverse the keypoints, access the position object of each, and
use its x and y coordinates.

Now, to draw the lines, we can use the second key, skeleton, which consists of all
the coordinate information needed to connect two body parts.

function draw() {
  // draw images and video (webcam)
  image(capture, 0, 0);
  fill(255, 0, 0);
  if (singlePose) { // only if someone is captured
    // take all estimated points and draw a circle of radius 20 at each
    for (let i = 0; i < singlePose.keypoints.length; i++) {
      ellipse(singlePose.keypoints[i].position.x,
              singlePose.keypoints[i].position.y, 20);
    }
    stroke(255, 255, 255);
    strokeWeight(5);
    // construct skeleton structure by joining 2 parts with a line
    for (let j = 0; j < skeleton.length; j++) {
      line(skeleton[j][0].position.x, skeleton[j][0].position.y,
           skeleton[j][1].position.x, skeleton[j][1].position.y);
    }
  }
}
Deploy the Project
As the project runs in a browser, you can simply deploy it on GitHub and make it
available for others to use. Just upload all the files and images to a new
repository on GitHub, keeping the same layout as on your local system. After
uploading, open the settings of the repository and go to GitHub Pages; change the
branch from none to main and click save. It will give you the URL of the project,
which will go live after some time, and you can share it with others.

Source Code
let capture;
let posenet = null;
let singlePose;
let skeleton;

function setup() {
  createCanvas(800, 600);
  capture = createCapture(VIDEO);
  capture.hide();

  posenet = ml5.poseNet(capture, modelLoaded);
  posenet.on("pose", receivedPoses);
}

function receivedPoses(poses) {
  console.log(poses);
  if (poses.length > 0) {
    singlePose = poses[0].pose;
    skeleton = poses[0].skeleton;
  }
}

function modelLoaded() {
  console.log("Model Has Loaded");
}

function draw() {
  image(capture, 0, 0, 800, 600);

  // Adding fill to keypoints
  fill(255, 0, 0);

  // Adding geometry to keypoints
  if (singlePose) {
    for (let i = 0; i < singlePose.keypoints.length; i++) {
      ellipse(singlePose.keypoints[i].position.x,
              singlePose.keypoints[i].position.y, 50, 50);
    }
    // Adding stroke to skeleton
    stroke(255, 255, 255);
    // Adding stroke weight to skeleton
    strokeWeight(5);
    for (let j = 0; j < skeleton.length; j++) {
      line(skeleton[j][0].position.x, skeleton[j][0].position.y,
           skeleton[j][1].position.x, skeleton[j][1].position.y);
    }
  }
}

Let’s review the inputs for the single-pose estimation algorithm:

1. Input image element - An HTML element that contains an image to predict
poses for, such as a video or image tag. Importantly, the image or video element
fed in should be square.

2. Image scale factor — A number between 0.2 and 1. Defaults to 0.50. What to
scale the image by before feeding it through the network. Set this number lower
to scale down the image and increase the speed when feeding through the
network at the cost of accuracy.

3. Flip horizontal — Defaults to false. Whether the poses should be
flipped/mirrored horizontally. This should be set to true for videos where the
video is flipped horizontally by default (i.e. a webcam), and you want the poses
to be returned in the proper orientation.

4. Output stride — Must be 32, 16, or 8. Defaults to 16. Internally, this
parameter affects the height and width of the layers in the neural network. At a
high level, it affects the accuracy and speed of the pose estimation. The lower
the value of the output stride, the higher the accuracy but the slower the speed;
the higher the value, the faster the speed but the lower the accuracy. The best
way to see the effect of the output stride on output quality is to play with the
single-pose estimation demo.
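In ml5.js these inputs are typically passed as an options object when loading the
model. The option names below follow the ml5 0.x poseNet documentation and should
be treated as assumptions to verify against your installed version:

```javascript
// Hypothetical configuration mirroring the inputs reviewed above.
const poseNetOptions = {
  imageScaleFactor: 0.5, // 0.2-1.0; lower is faster but less accurate
  flipHorizontal: true,  // mirror results for webcam input
  outputStride: 16,      // 8, 16, or 32; lower is more accurate, slower
};
// posenet = ml5.poseNet(capture, poseNetOptions, modelLoaded);
```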
Now let’s review the outputs for the single-pose estimation algorithm:
1. A pose, containing both a pose confidence score and an array of 17 key points.
2. Each keypoint contains a keypoint position and a keypoint confidence score.
Again, all the keypoint positions have x and y coordinates in the input image
space, and can be mapped directly onto the image.
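For illustration, a trimmed single-pose output with these two pieces might look
like the following; the field names follow the description above, while the score
and coordinate values are made up:

```javascript
// Illustrative single-pose output: an overall score plus 17 keypoints,
// each with its own score and (x, y) position in input-image space.
const examplePose = {
  score: 0.93,
  keypoints: [
    { part: 'nose',    score: 0.99, position: { x: 420.5, y: 185.2 } },
    { part: 'leftEye', score: 0.98, position: { x: 435.1, y: 170.8 } },
    // ...15 more entries, 17 keypoints in total
  ],
};
```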
CHAPTER 4
Results

An example output pose and skeleton looks like the following:

Output data on the console

Here is the poses array of the 17 body keypoints.
Here is the skeleton array connecting the 17 keypoints.
Web App
CHAPTER 5
Conclusion And Future Scope

Human posture detection can be used for people who need muscle rehabilitation
training. It not only solves the problem of inconvenient operation for patients with
motor dysfunction, but also has important practical significance for improving the
rehabilitation training effect of patients.

Applications of pose detection in the real world can be used by organizations
in the following ways -
1) Snapchat filters, where you see effects such as tongues and dummy faces.
2) Fitness apps, which use human posture detection to correct your exercise poses.
3) Instagram Reels, which is very popular, uses posture detection to provide
different effects to apply to your face and surroundings.
4) Virtual games, to analyze the shots of players.
References

1. Derrick Mwiti, "A 2019 Guide to Human Pose Estimation," August 5, 2019.
https://heartbeat.comet.ml/a-2019-guide-to-humanpose-estimation-c10b79b64b73

2. A. Toshev and C. Szegedy, "DeepPose: Human Pose Estimation via Deep Neural
Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014,
pp. 1653-1660, doi: 10.1109/CVPR.2014.214.
