Anubhav Singh¹, Shruti Agarwal², Preeti Nagrath³, Anmol Saxena⁴, Narina Thakur⁵
¹,³,⁴,⁵ Computer Science Department, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
² Electronics and Communication Department, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
¹anubhav.singh2359@gmail.com, ²agarwalshruti2801@gmail.com, ³preeti.nagrath@bharatividyapeeth.edu, ⁴anmsaxena123@gmail.com, ⁵narinat@gmail.com
Abstract: Human pose estimation has long been a challenging problem that attracts considerable attention. It has a wide variety of uses, from image classification to activity recognition. The main challenge is detecting and localizing key points across widely varying body poses, and substantial research has been done to address it. This paper discusses the issues in human pose estimation and gives an overview of notable research in the area, including deep learning approaches and traditional image-based techniques. After analyzing several results and identifying their limitations, the authors reconstruct a simple model using a convolutional neural network (CNN) that estimates poses and demonstrates the potential of CNNs. The paper concludes with a few promising directions to be explored in future research.

I. INTRODUCTION

Human pose estimation from single 2D images holds great potential for a wide variety of uses, from the classification of images and videos to activity recognition and motion analysis, and it has attracted great attention in computer vision and human-computer interaction. However, human pose estimation has always been a challenging problem. It involves identifying and localizing key points of the body, mainly the joints, and predicting body movement. It also shares the difficulties of detection in general, such as clutter, lighting, viewpoint, and scale, on top of the significant difficulties specific to human poses [1].

The detection of body key points is difficult because of small joints, occlusions, and the need to capture context. Convolutional neural networks (CNNs) offer a remarkable approach to image classification and object detection problems. Fundamentally, they are similar to conventional neural networks, which are made up of neurons with weights and biases. However, conventional neural networks do not scale well to larger images: the number of parameters grows rapidly and the network ends up overfitting the training set, because every neuron in a layer is fully connected to every neuron in the previous layer. CNNs take advantage of the fact that the input consists mainly of images, so they constrain the architecture in a more sensible way that vastly reduces the number of parameters. CNNs are appealing for human pose estimation for two reasons. First, there is no need to design feature representations and detectors for parts, because the model and its features are learned from the data itself. Second, the learned model is holistic: the final joint prediction is based on a complex non-linear transformation of the entire image, as opposed to local detectors, whose reasoning is restricted to a single part and which can model only a small subset of the interactions between body parts.

Predicting human pose coordinates involves several difficulties: foreshortening of limbs, occlusion of limbs, rotation and orientation of the figure, and overlap of different subjects. The same factors make many poses cumbersome to annotate. This variability in the input suggests that the holistic reasoning provided by a CNN may be a good fit. In this line of work, researchers investigate different CNN models for human pose estimation and activity classification [2].

Pose estimation has numerous applications, from body-responsive interaction to augmented reality, animation, fitness, and more. The authors believe that the availability of such a model will encourage more developers and makers to experiment with and apply pose detection in their own projects. While many alternative pose detection systems have been publicly released, all require specialized hardware or cameras, as well as considerable system setup. Pose estimation can be used to estimate either a single pose or multiple poses: one version of the algorithm detects just one person in an image, and another detects multiple people in an image.

To study this challenging problem, the MPII Human Pose dataset has been used for analyzing human poses. It contains around 25k images of 40k people with annotated body joints. Every image was collected from YouTube videos and is accompanied by preceding un-annotated frames; the annotations also cover 3D torso and head orientation.

The proposed benchmark, "MPII Human Pose", therefore makes a significant advance in terms of appearance variability and complexity, and includes more than 40,000 images of individuals for full-body human pose estimation.
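The parameter savings attributed to CNNs in the Introduction can be made concrete with a quick count. The sketch below is only an illustration: the 96×96 input size is taken from the paper, while the 3 color channels, the 512 hidden units, and the 32 filters of size 3×3 are hypothetical values chosen for the comparison.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of one convolutional layer (weights shared across positions)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# 96x96 comes from the paper; the channel count is an assumption.
height, width, channels = 96, 96, 3
hidden_units = 512   # hypothetical size of a fully connected first layer
n_filters = 32       # hypothetical number of 3x3 filters in a first conv layer

fc_count = dense_params(height * width * channels, hidden_units)
conv_count = conv_params(3, 3, channels, n_filters)

print("fully connected:", fc_count)
print("convolutional:  ", conv_count)
```

Under these assumed sizes the fully connected layer needs millions of parameters while the convolutional layer needs fewer than a thousand, because the filter weights do not depend on the image size.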
optimization is done with an L2 loss without considering the effect of outliers, which would adversely affect the training process.

The authors used a regression network with ConvNets that attains robustness against these outliers by minimizing Tukey’s biweight function, an M-estimator robust to outliers. They demonstrate faster convergence and better generalization with the robust loss function when estimating poses from images of faces.

A deep convolutional neural network is used in a heterogeneous multi-task learning framework for human pose estimation from single images. This framework learns a pose-joint regressor and a sliding-window body-part detector in a deep network architecture, showing that including the body-part detection task regularizes the network and produces better solutions. The researchers report competitive, state-of-the-art results on several datasets [13].

The authors used a heterogeneous multi-task method [14] that combines a regression task with a classification task, both encouraged to have a similar sparsity pattern. They found that joint training tends to locate the most useful features in the input for both tasks when the tasks share a feature layer, which results in a learned shared representation that is useful for both tasks.

The authors also showed that a deep neural network [15] predicts and identifies the location of body parts such as the head and hands from nearby neighbors, using learned features obtained from a pose-sensitive embedding with non-linear NCA (neighborhood components analysis) regression. In contrast, a regression network can directly output the joint locations, with the auxiliary tasks learning shared "pose features".

A multi-stage framework with deep convolutional networks [16] is built for predicting facial point locations. It uses a cascade of neural networks that focus on different regions of the input image.

The researchers evaluate different CNN structures [17] with regard to hand shape, joint visibility, and articulation distributions for isolated 3D hand pose estimation, targeting low mean errors. 3D volumetric representations outperform 2D CNNs by capturing the spatial structure of the depth data.

The authors describe Human Mesh Recovery (HMR) [18], an end-to-end framework for reconstructing a full 3D mesh of a human body from an RGB image. It computes a much richer representation, parameterized by shape and 3D joint angles, and runs in real time given a bounding box containing the person.

Correspondingly, well-trained convolutional networks have been prepared for human pose estimation. As an alternative to increasing the number of stages for refinement, the authors investigate how to improve the performance of a monocular regression network by adding an auxiliary task [19].

IV. FORMULATED WORK

The objective of the proposed work is pose estimation, with activity labeling as a second task. For the former task, the authors aim to identify the x-y pixel coordinates of the 14 body joints depicted in Figure 1. For the latter, they aim to label the images by activity category and to estimate the poses using a convolutional neural network (CNN). This paper proposes a regression approach to the human pose estimation problem that can be modeled with a generic convolutional neural network. The CNN takes a full raw image (96×96 pixels) as input and outputs the pixel coordinates of each body key joint.

Since the 1980s, this field of research has been considered important by the computer science community because of its strength in supporting other applications. Its applications range from medicine and human-computer interaction to sociology.

V. RESEARCH METHODOLOGY

In pose estimation, the position and orientation of an object are identified, which means detecting the keypoint locations that describe the object. This paper discusses human pose estimation, where the major parts and joints of the body are detected and localized, and analyzes the state of the art in articulated human pose estimation using a new large-scale benchmark dataset.

The dataset includes 40,000 images collected using an established taxonomy of human activities. The collected images cover 410 human activities in total. Images are analyzed on the basis of annotations including body pose, body part occlusion, torso and head viewpoint, and the person's activity. The architecture takes an input image of size w×h and produces as output the 2D locations of anatomical key points for every person in the image.

The network predicts a set S of confidence maps for body part locations and a set L of 2D vector fields that encode the degree of association between parts. Set S contains J confidence maps and set L contains C vector fields.

The procedure feeds the image into a feedforward network, which predicts the confidence maps together with the 2D vector fields. In the architecture figure, the branch shown in beige predicts the confidence maps, and the lower branch, shown in blue, predicts the affinity fields.
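To make the two outputs described in the Research Methodology more tangible, the following sketch builds one confidence map and one limb direction vector by hand. It assumes a Gaussian-shaped peak around the annotated joint, which is a common choice for heatmap-based methods but is not stated in this paper; the function names, the grid size, and `sigma` are hypothetical.

```python
import math


def confidence_map(width, height, keypoint, sigma=2.0):
    """Grid (list of rows) with a Gaussian peak of value 1.0 at the keypoint.

    The Gaussian shape and sigma are assumptions made for illustration.
    """
    kx, ky = keypoint
    return [
        [math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / sigma ** 2) for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(grid):
    """Return the (x, y) position of the largest value in the grid."""
    _, x, y = max((value, x, y) for y, row in enumerate(grid) for x, value in enumerate(row))
    return x, y


def limb_direction(joint_a, joint_b):
    """Unit vector pointing from joint_a to joint_b, the kind of value a 2D
    vector field can store at pixels lying on the limb between two parts."""
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    length = math.hypot(dx, dy)
    return dx / length, dy / length


heatmap = confidence_map(width=12, height=8, keypoint=(7, 3))
peak = argmax_2d(heatmap)              # recovers the keypoint location
direction = limb_direction((0, 0), (3, 4))
print(peak, direction)
```

Reading a joint location back from a confidence map then amounts to finding its peak, while the vector field indicates which pairs of detected parts belong to the same limb.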
MPII dataset outputs 15 points, which are listed below together with the background label:

Key   Joint
0     Head
1     Neck
2     Rt. Shoulder
3     Rt. Elbow
4     Rt. Wrist
5     Lt. Shoulder
6     Lt. Elbow
7     Lt. Wrist
8     Rt. Hip
9     Rt. Knee
10    Rt. Ankle
11    Lt. Hip
12    Lt. Knee
13    Lt. Ankle
14    Chest
15    Background
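In code, the key-to-joint assignment listed above is conveniently kept as a lookup table. The snippet below is a small sketch of one way to do this; the variable and function names are hypothetical, and only the key order is taken from the paper.

```python
# Index in the list corresponds to the key in the table above.
MPII_KEYPOINTS = [
    "Head", "Neck",
    "Rt. Shoulder", "Rt. Elbow", "Rt. Wrist",
    "Lt. Shoulder", "Lt. Elbow", "Lt. Wrist",
    "Rt. Hip", "Rt. Knee", "Rt. Ankle",
    "Lt. Hip", "Lt. Knee", "Lt. Ankle",
    "Chest", "Background",
]

# Reverse lookup from joint name to key.
KEY_BY_NAME = {name: key for key, name in enumerate(MPII_KEYPOINTS)}


def body_keypoints():
    """The 15 body points, i.e. every entry except the background label."""
    return MPII_KEYPOINTS[:-1]


print(KEY_BY_NAME["Lt. Wrist"], len(body_keypoints()))
```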
Fig. 2. Architecture of the two-branch multi-stage CNN. After every stage, the predictions of both branches and the image features are concatenated [21].
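The concatenation step named in the caption of Fig. 2 can be sketched without any deep learning library by treating each channel as a list entry. Everything below is a schematic assumption rather than the architecture's actual code: the branch functions are stand-ins that only return placeholder channels, the feature width of 128 and the 14 vector fields are hypothetical, and each 2D vector field is assumed to occupy two channels (x and y components).

```python
def run_stages(features, conf_branch, field_branch, num_stages):
    """Run a two-branch multi-stage loop and record each stage's input width.

    After every stage, the outputs of both branches are concatenated with the
    original image features to form the input of the next stage.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    stage_input = list(features)
    input_widths = []
    for _ in range(num_stages):
        input_widths.append(len(stage_input))
        conf_maps = conf_branch(stage_input)
        vector_fields = field_branch(stage_input)
        stage_input = list(features) + conf_maps + vector_fields
    return conf_maps, vector_fields, input_widths


J = 16  # confidence maps: 15 points plus background, as in the MPII listing
C = 14  # number of vector fields (hypothetical value)

# Stand-in branches that ignore their input and emit placeholder channels.
def dummy_conf_branch(x):
    return [f"S{j}" for j in range(J)]

def dummy_field_branch(x):
    return [f"L{c}{axis}" for c in range(C) for axis in "xy"]

image_features = [f"F{i}" for i in range(128)]  # hypothetical feature channels
S, L, widths = run_stages(image_features, dummy_conf_branch, dummy_field_branch, 3)
print(widths)
```

The first stage sees only the image features, and every later stage sees the features plus the J confidence maps and the 2C vector-field channels from the stage before it.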
In Stage 0, feature maps for the input image are created using the first 10 layers of VGGNet, as shown in Figure 2.

Stage 1 then consists of two branches. The first branch predicts a set of 2D confidence maps (S) of body part locations, whereas the second branch predicts the affinity fields, which encode the degree of association between parts. In every subsequent stage, the predictions from both branches in the previous stage are concatenated with the original image features. Finally, the confidence maps and affinity maps are parsed by greedy inference to produce the 2D key points for all people in the image.

as the aim is to identify the x-y pixel coordinates of the 14 body joints depicted in the figures above. For the latter, the aim is to label the images by activity category and to estimate the poses using a convolutional neural network (CNN). Human pose estimation is formulated as a regression problem that can be modeled with a generic convolutional neural network. The CNN takes a full raw image (96×96 pixels) as input and outputs the pixel coordinates of each body key joint.

Here N is the number of training examples, D is the training set, k is the number of body key points, and w_i denotes the weights learned for the i-th body key point.
Table 2 below shows earlier state-of-the-art results on the MPII multi-person benchmark. Some qualitative results are also available.

TABLE 2: (a) Evaluation results on the testing subset of 288 images. (b) Evaluation results on the whole testing set.

(a) Subset of 288 images

Method              Head   Sho    Elb    Wri    Hip    Knee   Ankle  s/image
Deepcut [22]        73.4   71.8   57.9   39.9   56.7   44.0   32.0   57995
Iqbal et al. [23]   70.0   65.2   56.4   46.1   52.7   47.9   44.5   10
DeeperCut [24]      87.9   84.0   71.9   63.9   68.8   63.8   58.1   230

(b) Full testing set

Method              Head   Sho    Elb    Wri    Hip    Knee   Ankle  s/image
Deepcut [24]        73.4   72.5   60.2   51.0   57.2   52.0   45.4   48.5
Iqbal et al. [23]   70.0   53.9   44.5   35.0   42.2   36.7   31.1   10

Fig. 5. The 14 body points to be identified.

VIII. CONCLUSION AND FUTURE WORK

For computer vision tasks, convolutional neural networks are the architecture most in demand, owing to their simplicity and intuitive nature and to their reduced number of parameters compared with a fully connected model.

A CNN has the advantage of being a model that takes the whole image as an input signal for each body point, rather than relying on local detectors that are constrained to a single part.

The authors demonstrated that human pose estimation can be cast as a regression problem and modeled with a generic CNN. Their application of a CNN to human pose estimation achieves competitive results on a challenging academic dataset with a simple model. They expect that even better results could be achieved with more processing power and memory (the depth of their regression CNN was limited by the RAM of the GPU it was trained on).

To reduce the gap between training and validation performance, the hyperparameters of the model can be tuned further. Options include adjusting the base learning rate and learning rate policy, trying different kinds of momentum updates, and tuning the regularization strength. In addition, model ensembles can be used to increase performance. The authors would also like to experiment with a combination of joint estimation and activity classification tasks, to see whether knowing the locations of joints in an image improves the activity classification performance of a CNN. They would first run the input image through the joint estimation regression model to obtain the human pose data and use this as a secondary input (in addition to the original input image) to the classification model. This additional information may help the model determine which activity is being performed in the image. However, it is possible that 2D pixel coordinates will not provide sufficiently useful information about the true 3D pose.

REFERENCES
[1] Ντίνου, Ι. (2018). Human pose estimation using convolutional neural networks (Doctoral dissertation).
[2] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2016). Real-time multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050.
[3] Jain, A., Tompson, J., Andriluka, M., Taylor, G. W., & Bregler, C. (2013). Learning human pose estimation features with convolutional networks. arXiv preprint arXiv:1312.7302.
[4] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2016). Real-time multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050.
[5] Fan, X., Zheng, K., Lin, Y., & Wang, S. (2015). Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1347-1355).
[6] Bulat, A., & Tzimiropoulos, G. (2016, October). Human pose estimation via convolutional part heatmap regression. In European Conference on Computer Vision (pp. 717-732). Springer, Cham.
[7] Chen, X., & Yuille, A. L. (2014). Articulated pose estimation by a graphical model with image dependent pairwise relations. In NIPS.
[8] Tompson, J. J., Jain, A., LeCun, Y., & Bregler, C. (2014). Joint training of a convolutional network and a graphical model for human pose estimation. In NIPS.
[9] Toshev, A., & Szegedy, C. (2014). Deeppose: Human pose estimation via deep neural networks. In CVPR.
[10] Pfister, T., Simonyan, K., Charles, J., & Zisserman, A. (2014, November). Deep convolutional neural networks for efficient pose estimation in gesture videos. In Asian Conference on Computer Vision (pp. 538-552). Springer, Cham.
[11] Pfister, T., Charles, J., & Zisserman, A. (2015). Flowing convnets for human pose estimation in videos. In ICCV.
[12] Belagiannis, V., Rupprecht, C., Carneiro, G., & Navab, N. (2015). Robust optimization for deep regression. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2830-2838).
[13] Li, S., Liu, Z. Q., & Chan, A. B. (2014). Heterogeneous multi-task learning for human pose estimation with the deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 482-489).
[14] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE TPAMI.
[15] Taylor, G. W., Fergus, R., Williams, G., Spiro, I., & Bregler, C. (2010). Pose-sensitive embedding by nonlinear nca regression. In NIPS (pp. 2280–2288).
[16] Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In CVPR.
[17] Yuan, S., Garcia-Hernando, G., Stenger, B., Moon, G., Yong Chang, J., Mu Lee, K., ... & Yuan, J. (2018). Depth-based 3d hand pose estimation: From current achievements to future goals. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Kanazawa, A., Black, M. J., Jacobs, D. W., & Malik, J. (2018, June). End-to-end recovery of human shape and pose. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Toshev, A., & Szegedy, C. (2014). Deeppose: Human pose estimation via deep neural networks. In CVPR.
[20] http://human-pose.mpi-inf.mpg.de/
[21] Cao, Z., Simon, T., Wei, S. E., & Sheikh, Y. (2016). Realtime multi-person 2d pose estimation using part affinity fields. arXiv preprint arXiv:1611.08050.
[22] Pishchulin, L., Insafutdinov, E., Tang, S., Andres, B., Andriluka, M., Gehler, P., & Schiele, B. (2016). Deepcut: Joint subset partition and labeling for multi person pose estimation. In CVPR.
[23] Iqbal, U., & Gall, J. (2016). Multi-person pose estimation with local joint-to-person associations. In ECCV Workshops, Crowd Understanding.
[24] Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., & Schiele, B. (2016). Deepercut: A deeper, stronger, and faster multiperson pose estimation model. In ECCV.