
Journal of Ambient Intelligence and Humanized Computing

https://doi.org/10.1007/s12652-020-02335-x

ORIGINAL RESEARCH

Construction and evaluation of the human behavior recognition model in kinematics under deep learning

Xiao Liu1 · De-yu Qi1 · Hai-bin Xiao1

Received: 22 March 2020 / Accepted: 10 July 2020


© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
To explore the construction and evaluation of the human behavior recognition model in kinematics by deep learning, the convolutional neural network (CNN) from the field of deep learning was applied to build a CNN human behavior recognition algorithm model. The image data were collected from the KTH and Weizmann datasets and trained; then, the proposed algorithm was simulated on the TensorFlow platform. The results suggest that, in the analysis of the recognition effect of kinematic description on the two datasets, the accuracy of the histogram of optical flow orientation (HOF) method was the worst on both the KTH and Weizmann datasets, while the accuracy of the constructed Visual Geometry Group-16 (VGG-16) algorithm model was the highest. In the accuracy analysis of the KTH dataset, the boxing action had the highest recognition accuracy and the running action the lowest; the average recognition value over all kinds of actions was 91.93%. In the accuracy analysis of the Weizmann dataset, the bending and hand-waving actions had the highest recognition accuracy, while the running action had the lowest.

Keywords Deep learning · Human behavior recognition · Convolutional neural network · TensorFlow platform

* De-yu Qi
qideyu@gmail.com

1 College of Computer Science and Technology, South China University of Technology, Guangzhou 510006, China

1 Introduction

With the rapid development of science and technology, the living standards of human beings are constantly improving; thus, health has become a focus of attention. Physical exercise is very common in daily life and is an important way to keep fit. However, it often results in physical injury due to improper movements or a lack of understanding during exercise. Thus, it is very important to monitor and guide physical health and sports (Pan et al. 2017; Kim et al. 2016). In recent years, as technology has advanced, the computer vision industry has become a research focus and challenge; in particular, the recognition and analysis of human behavior in videos has become a focal point in various fields. The recognition of human behavior not only guides the correct way of performing actions but also plays a very positive role in the fields of medical diagnosis, medical care for the elderly, and sports analysis (Ciardo et al. 2018). Therefore, it is of great significance to recognize human behavior in the process of action from the perspective of computer vision.

Generally, the analysis of human behavior involves not only the appearance of body movements but also people's psychology, emotion, and intention. This is because human body movements are controlled by human consciousness, and the presented body movements are signals obtained through visual perception (Dingenen et al. 2018). At present, in computer vision, deep learning has been widely used in image recognition, classification, evaluation, prediction, and analysis, but its advantages in continuous video sequence processing are not obvious. The convolutional neural network (CNN) is an important topic in the field of deep learning, especially in image feature extraction and recognition, and has attracted the attention of many scholars (Chiovetto et al. 2018). The advantages of CNN technology are embodied not only in the high accuracy of image recognition but also in the self-learning of image features from the original image during feature extraction (Edey et al. 2017). Recognizing human actions is currently very difficult, as it involves not only model recognition, video image processing, and data mining but also knowledge from artificial intelligence and cognitive science.
As more video resources appear on the Internet, a large amount of video is retrieved (Shamur et al. 2016). Therefore, the recognition of human behavior in videos has become a strategic focus of the major websites, which further promotes research on the recognition of human behavior in video sequences (Shen et al. 2019a).

To sum up, the deep learning algorithm is widely used in all walks of life, especially in research on image feature extraction and recognition; however, its applications to the recognition of human actions in videos are rare. Therefore, to recognize human actions in motion by deep learning algorithms, CNN technology is adopted to build the human behavior recognition model (Shen et al. 2019b; Chen 2018, 2019). Moreover, the image data are collected from the KTH and Weizmann datasets, and training and simulation are conducted, which provides a new idea for human action recognition. Using deep learning technology, human actions are identified, and CNN algorithm models are constructed. It is found that the VGG-16 neural network has high accuracy, superior to other dynamic image recognition methods. The results provide an experimental basis for human action recognition.

2 Literature review

With the rapid development of science and technology, the era of big data has come, and progress on deep learning has been made in various fields. Wang et al. (2016) applied deep learning to the field of cognitive informatics and applied the engineering in the cognitive system to achieve deep thinking and deep reasoning. Ohsugi et al. (2017) applied deep learning to the field of medicine and found that, for ophthalmic outpatient departments in remote areas, using ultra-wide-field fundus imaging to detect rhegmatogenous retinal detachment (RRD) significantly improved the level of diagnosis and treatment. Sremac et al. (2018) established a high-precision simulation system for online shopping in the field of logistics, which could be applied to the management of various types of goods in the supply chain. Wu et al. (2019) applied deep learning algorithms to the field of medical imaging, which helped clinicians learn the physical conditions of patients and treat them efficiently.

Human behavior recognition in kinematics means that human behavior is recognized from the video sequence. An effective feature will not only describe a large number of behavior categories but also meet the requirements of easy calculation and quick response to the similarity between two similar movements. Many scholars have studied human behavior recognition in kinematics. Lv et al. (2016) constructed a low-dimensional data-driven prior model of contact information and intra-articular torque. They also verified the accuracy of various human motions by estimating and inputting preprocessed human dynamic data captured by optical motion, which improved the accuracy of human motion. Patwardhan (2017) proposed a multimodal emotion recognition method based on the combination of three-dimensional geometric features, kinematic features (velocity and displacement of joints), and features extracted from daily behavior patterns (such as head-nod frequency), and developed three-dimensional geometric and kinematic features by using the original feature data in the visual channel, which significantly improved the accuracy of human emotion recognition. Chiovetto et al. (2018) determined the effective dimensions of dynamic facial expressions by learning the collected facial expressions and used a Bayesian model to simulate different numbers of primitive models. They finally found that facial expressions might be controlled by a few independent control units, allowing a very low-dimensional parameterization of facial expressions. Yang et al. (2019) proposed a multi-sensor fusion system and a two-level activity recognition classifier, which were trained to recognize the rehabilitation movements of patients. It was found that the accuracy of this system was greatly improved, and the time, direction, and abnormal gait type associated with falls could be determined in advance.

In summary, there are many pieces of research on the application of deep learning and on human behavior recognition; however, studies that apply deep learning to human behavior recognition are seldom seen. Therefore, to recognize human behavior in kinematics, the deep learning algorithm is used to build the human behavior recognition model and extract human behavior features from the video sequence, which provides a new method for human behavior recognition in kinematics.

3 Methods

3.1 Human action recognition

The significance of life is to exercise. Exercise refers not only to the exercise of the human body but also to the maintenance of vitality. In sports, people make all kinds of actions. They should pay great attention to the standardization of actions; otherwise, it is easy to cause bodily damage. Thus, it is very important to recognize human actions and guide people's movements in a standardized way (Corrigan et al. 2017; Kheradpisheh et al. 2016). In the process of human action recognition, the basic flow is shown in Fig. 1. The basic steps are image preprocessing, moving object detection, feature extraction, feature analysis, and recognition (Rajalingham et al. 2018).

In image preprocessing, when a single image or a sequence of image data is input, the image quality needs to be adjusted, for example by framing and image size modification, to complete the preprocessing.

Fig. 1  The basic process of traditional human action recognition: image preprocessing → moving object detection → motion feature extraction → feature analysis → recognition result

The moving object detection is to segment the human body area from the static image, or the corresponding action sequence from the image sequence, which can be obtained by an image segmentation algorithm (Zheng and Liu 2020; Zheng and Ke 2020). After the detection of the human body is completed, human action features are extracted. Static methods usually include the scale-invariant feature transform (SIFT) and Haar-like feature methods. The SIFT method has good robustness to scale, translation, and rotation changes and to noise, and has been successfully applied in target retrieval, image classification, recognition, and other fields. Common dynamic methods include the spatiotemporal interest point algorithm (STP), the histogram of optical flow orientation (HOF), and the motion boundary feature (MBH). In HOF, the action feature of the video is captured by optical flow, and the feature of the human body changes with time. Therefore, it is necessary to further encode the optical flow to obtain useful dynamic information that is robust to scale change. In the calculation of the HOF feature, given a pixel (x, y), the optical flow is defined by u(x, y) and v(x, y), representing the components of the optical flow in the x-axis (horizontal) and y-axis (vertical) directions; then, the direction angle θ(x, y) and the corresponding amplitude m(x, y) of the optical flow are as follows:

θ(x, y) = arctan(v(x, y) / u(x, y))   (1)

m(x, y) = √(u(x, y)² + v(x, y)²)   (2)

As a result, the image is divided into several cells, and the maximum value m_max of the optical flow amplitude over all points is found in each cell, thereby realizing the normalization of each pixel and ensuring scale invariance as follows:

m′(x, y) = m(x, y) / m_max   (3)

Furthermore, the direction and amplitude of each pixel's optical flow are projected: the number of optical flow vectors in different directions in each cell is counted, and the magnitude of the optical flow amplitude is used as the projection weight. Through normalization and concatenation of the different regional features, the complete image HOF feature is finally obtained (Cha et al. 2017). Finally, the extracted features are analyzed and recognized to obtain the complete human action classification results.

3.2 Deep learning algorithm

At present, the most commonly used method in the field of deep learning is the CNN. The CNN is not only an artificial neural network but also the first algorithm to succeed at effective training and learning of a multi-layer network structure. The algorithm can minimize preprocessing and directly extract the most expressive features from the original input data without manually specifying the features (Ke et al. 2017; Su et al. 2019). The CNN is very similar to a biological neural network: it can effectively reduce the network complexity and the number of parameters through weight sharing; in addition, it can learn to classify images and directly take unprocessed images as input. The basic flow structure is shown in Fig. 2. The CNN usually divides the training of network parameters into a forward phase and a backward phase (Lytras et al. 2017). In the forward phase, a data sample is taken from the training dataset as the network input, and the corresponding output is calculated layer by layer; in this process, none of the network parameters are changed. In the backward phase, the network parameters are adjusted layer by layer according to the comparison between the output results and the expected results. Besides, the training process requires a large amount of input data, and the network training becomes more refined through continuous iteration.

For the recognition of human behavior in the video sequence, the image is input from the perspectives of time and space, whereas in the time-flow CNN, optical flow superposition is usually used for the input (Hassan et al. 2018).
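The HOF quantities of Eqs. (1)–(3) can be sketched in numpy as follows. The cell size, the bin count, and the use of arctan2 (a numerically safer variant of the arctan in Eq. (1)) are illustrative choices, not values fixed by the paper:

```python
import numpy as np

def hof_feature(u, v, cell=8, bins=9):
    """Sketch of the HOF descriptor from Eqs. (1)-(3).

    u, v: optical-flow components, each of shape (H, W).
    """
    theta = np.arctan2(v, u)                     # Eq. (1), via arctan2 for stability
    m = np.sqrt(u ** 2 + v ** 2)                 # Eq. (2)
    h, w = m.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            mc = m[y:y + cell, x:x + cell]
            tc = theta[y:y + cell, x:x + cell]
            mmax = mc.max()
            if mmax > 0:
                mc = mc / mmax                   # Eq. (3): per-cell normalization
            # direction histogram weighted by normalized amplitude
            hist, _ = np.histogram(tc, bins=bins,
                                   range=(-np.pi, np.pi), weights=mc)
            feats.append(hist)
    f = np.concatenate(feats)                    # concatenate regional features
    n = np.linalg.norm(f)
    return f / n if n > 0 else f                 # global normalization

u = np.ones((16, 16)); v = np.zeros((16, 16))    # uniform rightward flow
f = hof_feature(u, v)
print(f.shape)   # (36,) = 4 cells x 9 bins
```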

Fig. 2  Basic flow chart of CNN: stacked convolution layers perform feature extraction, followed by the output

The core of the optical flow field is to represent the displacement field d_t between consecutive video frames t and t + 1. The displacement field includes the horizontal component d_t^x and the vertical component d_t^y. In addition, d_t(u, v) represents the flow of the t-th frame at (u, v) and the corresponding point of frame t + 1. Given a video sequence with width and height w and h, respectively, the optical flow fields of L consecutive frames are superposed. Then, for any frame τ, the input volume I_τ with a size of w × h × 2L can be obtained as follows:

I_τ(u, v, 2k − 1) = d^x_{τ+k−1}(u, v)   (4)

I_τ(u, v, 2k) = d^y_{τ+k−1}(u, v),  u = [1; w], v = [1; h], k = [1; L]   (5)

Trajectory stacking samples along the motion trajectory. In this case, the input volume I_τ of any frame τ is as follows:

I_τ(u, v, 2k − 1) = d^x_{τ+k−1}(p_k)   (6)

I_τ(u, v, 2k) = d^y_{τ+k−1}(p_k),  u = [1; w], v = [1; h], k = [1; L]   (7)

Here, p_k refers to the k-th point in the track; starting from (u, v) in frame τ, it can be obtained by the following recursive relationship:

p_1 = (u, v);  p_k = p_{k−1} + d_{τ+k−2}(p_{k−1}),  k > 1   (8)

3.3 Human behavior recognition algorithm model in kinematics based on CNN

The CNN in the deep learning algorithm is used to recognize human actions. As science and technology advance, the CNN has developed into multiple models, including GoogleNet, VGG, and ResNet. However, GoogleNet and ResNet have 22 and 34 layers, respectively. Although the level and accuracy of feature extraction improve noticeably with the increase in layers, the workload is huge, the recognition speed is relatively slow, and the hardware requirements are very strict. Finally, the VGG-16 network model, with a total of 16 layers, is chosen in comprehensive consideration of recognition accuracy, hardware requirements, and speed. This model consists of five convolution layer groups and three fully-connected layers; each convolution layer group contains multiple convolution layers and one pooling layer. The algorithm model is shown in Fig. 3.

Here, the VGG network model extracts the features of the image through automatic learning without manual extraction. The process of feature extraction is to input the video sequence into convolution layer group A to extract the lower-level features of the image. The features extracted by each convolution layer group are different. For example, convolution layer group A extracts 64 action features from the input image data and transmits these features to convolution layer group B. Next, in convolution layer group B, the 64 lower-level features are abstracted into 128 higher-level features and transmitted to convolution layer group C. Similarly, the features are passed on up to convolution layer group E. After the convolution layer groups extract the features in turn, the feature information is gradually abstracted to obtain the advanced features; then, the image features are processed and analyzed in the fully-connected layers. Subsequently, the second fully-connected layer is set as the output layer of the action features.
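A quick way to check the geometry of the five convolution layer groups described above is to track the feature-map shapes as they pass through the network. The sketch below assumes the standard 224 × 224 VGG input size (a detail not stated in the paper) and same-padding convolutions, so only the pooling layers change the spatial size:

```python
# Channel widths of the five VGG-16 convolution layer groups (A-E);
# each group ends with a 2 x 2 pooling layer that halves the spatial size.
GROUPS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]  # (convs, channels)

def vgg16_shapes(h=224, w=224):
    """Track the feature-map shape after each convolution layer group."""
    shapes = []
    for _, channels in GROUPS:
        h, w = h // 2, w // 2        # same-padding convs keep the size; pooling halves it
        shapes.append((h, w, channels))
    return shapes

total_layers = sum(n for n, _ in GROUPS) + 3   # 13 convolution layers + 3 fully-connected = 16

for s in vgg16_shapes():
    print(s)
# the final (7, 7, 512) feature map is flattened into the fully-connected layers
```

The 64 → 128 progression of the first two groups matches the feature counts quoted above for groups A and B.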


This output layer is the highest abstract expression of the action features. Finally, the classifier is used to classify the features, and the human action classification results are acquired. The CNN requires 32 convolution kernels; the size of each kernel is 5 × 5 × 3, and each layer of each convolution kernel is 5 × 5.

Fig. 3  Human action recognition algorithm model based on CNN: the input video sequence passes through convolution layer groups ConvA–ConvE, then through the fully connected layers and a Softmax classifier, which outputs the classification result

For data processing, the image data in the KTH dataset are used for analysis. The KTH dataset mainly contains six kinds of human action behaviors: walking, running, jogging, waving, clapping, and punching, as shown in Fig. 4. The Weizmann dataset contains ten human action behaviors: bending (bend), jumping jacks (jack), jumping (jump), jumping in place (pjump), running (run), siding (side), skipping (skip), walking (walk), one-hand waving (wave1), and two-hands waving (wave2). After obtaining a large amount of data from the above two databases, the constructed human action recognition model performs feature extraction and classification of human actions.

Fig. 4  Six behaviors of the KTH dataset (a walking; b jogging; c running; d punching; e waving; f clapping)

Table 1  The specific configuration of the TensorFlow platform experimental environment in the simulation experiment

Software   Operating system     Linux 64-bit
           Python version       Python 3.6.1
           TensorFlow version   1.0.1
Hardware   CPU                  Intel Core i7-7700 @ 4.0 GHz, 8 cores
           Memory               Kingston DDR4 2400 MHz, 16 GB
           GPU                  Nvidia GeForce 1060, 8 GB

3.4 Simulation

The algorithm model of human behavior recognition in kinematics based on the constructed CNN model is simulated on the TensorFlow platform (Prati et al. 2019).
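As a data-preparation step for such a simulation, the stacked optical-flow input volume defined in Eqs. (4) and (5) can be sketched as follows; the frame size and the stack length L used in the demo are arbitrary choices:

```python
import numpy as np

def stack_flow(flows_x, flows_y):
    """Build the 2L-channel input volume of Eqs. (4) and (5).

    flows_x, flows_y: lists of L arrays of shape (h, w), the horizontal
    and vertical displacement fields d^x and d^y for L consecutive frames.
    Returns an (h, w, 2L) array with channels interleaved as
    (d^x_1, d^y_1, d^x_2, d^y_2, ...).
    """
    L = len(flows_x)
    h, w = flows_x[0].shape
    volume = np.empty((h, w, 2 * L), dtype=np.float32)
    for k in range(L):
        volume[:, :, 2 * k] = flows_x[k]      # channel 2k - 1 in the 1-based Eq. (4)
        volume[:, :, 2 * k + 1] = flows_y[k]  # channel 2k in the 1-based Eq. (5)
    return volume

L = 10
fx = [np.zeros((32, 32)) for _ in range(L)]
fy = [np.ones((32, 32)) for _ in range(L)]
vol = stack_flow(fx, fy)
print(vol.shape)   # (32, 32, 20)
```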


The TensorFlow platform absorbs the advantages of other platforms and has developed into a mature deep learning framework with installation versions for Windows, Linux, and Mac OS X. The specific experimental environment configuration is shown in Table 1. After the TensorFlow platform is installed, a Python terminal is used for testing. Then, the image data of the KTH and Weizmann datasets are collected and trained. Finally, the performance of the model is analyzed.

4 Results and discussion

4.1 Analysis of the recognition effect of kinematic description in two datasets

The constructed human action recognition algorithm model is compared with the commonly used dynamic methods STP, HOF, and MBH, and the results are shown in Fig. 5. On both the KTH and Weizmann datasets, it is found that the accuracy of the HOF method is the worst and the STP and MBH methods occupy the middle positions, while the VGG-16 algorithm model has the highest accuracy. The accuracy of kinematic human behavior recognition of each method on the KTH dataset is lower than that on the Weizmann dataset. Therefore, from the above results, it can be inferred that the recognition accuracy of the constructed algorithm model is higher than that of the classical HOF and MBH methods. The reason may be that the constructed VGG-16 algorithm model captures the kinematic characteristics of the optical flow field well, eliminates the boundary noise, and removes the camera motion and the complex background. Additionally, it abstracts the feature information many times to obtain the advanced features, which further enhances its ability to describe action information and detailed spatial information.

Fig. 5  Analysis of the recognition effect of kinematic description in two datasets (a in the KTH dataset; b in the Weizmann dataset)

4.2 Accuracy analysis results in the KTH dataset

Using the constructed algorithm model to analyze the data collected in the KTH dataset, the KTH behavior recognition confusion matrix shown in Fig. 6 is obtained. It is found that the proposed method has good accuracy: boxing has the highest recognition accuracy, reaching 97%, and running has the lowest, at 84%. In addition, it can be seen that running and jogging are easily confused, and the degree of confusion is higher than that between running and walking. This may be because jogging is a kind of human action behavior between running and walking; its action is more similar to running, so running and jogging are easily confused. In addition, part of the data shows the problem of irregular actions of the subjects, which leads to confusion in classification.

Fig. 6  Confusion matrix of various behavior recognition in the KTH dataset (diagonal recognition rates: walking 0.91, running 0.84, jogging 0.90, hand waving 0.92, hand clapping 0.94, boxing 0.97)
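Per-class recognition accuracies like those in Fig. 6 are the diagonal entries of a row-normalized confusion matrix, and the average accuracy is their mean. The matrix below uses illustrative values loosely modeled on the figure, not the paper's exact numbers:

```python
import numpy as np

# Illustrative row-normalized confusion matrix (rows: true class,
# columns: predicted class); the off-diagonal values are assumptions.
classes = ["walking", "running", "jogging", "hand waving", "hand clapping", "boxing"]
cm = np.array([
    [0.91, 0.03, 0.04, 0.02, 0.00, 0.00],
    [0.03, 0.84, 0.13, 0.00, 0.00, 0.00],
    [0.03, 0.07, 0.90, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.92, 0.05, 0.03],
    [0.00, 0.00, 0.00, 0.03, 0.94, 0.03],
    [0.00, 0.00, 0.00, 0.01, 0.02, 0.97],
])

per_class = np.diag(cm)            # recognition accuracy of each action
average = per_class.mean()         # average over all actions
for name, acc in zip(classes, per_class):
    print(f"{name}: {acc:.2f}")
print(f"average: {average:.4f}")   # average: 0.9133
```

With these illustrative values the running/jogging confusion discussed above shows up as the largest off-diagonal entry (0.13).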


There is also a slight confusion between waving and clapping. The reason is that, although the standard movements of the two actions are quite different, the two actions are very similar at the moment when the two hands are at the same level, which causes confusion. Further experiments are carried out, and the results are shown in Fig. 7. It is found that the average accuracy rate on the KTH dataset is 91.93%. Only the accuracy rate of running is below 90%, which may be because running is very similar to jogging and walking, and the collected action data for running are fewer.

Fig. 7  Analysis of the accuracy rate of various action recognition in the KTH dataset

4.3 Accuracy analysis results in the Weizmann dataset

The constructed algorithm model is used to analyze the data collected in the Weizmann dataset. The confusion matrix of the Weizmann behavior recognition is shown in Fig. 8. It is found that the proposed method has good accuracy: bending and one-hand waving have the highest accuracy, approaching 100%, while running has the lowest accuracy, approaching 91%. In addition, running and jogging are easily confused, and the degree of confusion is higher than that between running and walking. This may be because jogging is a kind of human action behavior between running and walking, and its action is more similar to running; meanwhile, its static image data are used. Therefore, the degree of confusion between running and jogging is high. There are also some problems in the data where the actions of the demonstrators are not standardized, which leads to confusion in classification. In addition, there is a slight confusion between jumping jacks and wave2: although the two actions perform hand movements simultaneously, there is great similarity at a certain angle. There are also some similarities between jumping and skipping, which are caused by the very similar leg movements during the take-off. Further experiments are carried out on the data collected in the Weizmann dataset many times, and the results are shown in Fig. 9. It is found that the average accuracy on the Weizmann dataset is as high as 95.75%, which is significantly higher than the average on the KTH dataset. This may be because the Weizmann dataset was recorded indoors with a fixed lens against a single cut-out background, with less interference.

Fig. 8  Confusion matrix of behavior recognition in the Weizmann dataset (diagonal recognition rates: bend 1.00, jack 0.94, jump 0.94, pjump 0.95, run 0.91, side 0.94, skip 0.97, walk 0.93, wave1 1.00, wave2 0.97)

Fig. 9  Analysis of recognition accuracy of various actions in the Weizmann dataset

5 Conclusion

With the rapid development of science and technology, and to reduce harm to people in the process of performing actions, human action recognition has gradually become one of the hot research directions in the field of computer vision. Human motion recognition has a wide range of applications, involving human-computer interaction, video monitoring, and other areas, and has high research value (Zheng et al. 2015).


Here, the CNN in the field of deep learning is applied to human behavior recognition to build the CNN algorithm model. The image data are collected from the KTH and Weizmann datasets for training; then, the algorithm is simulated using the TensorFlow platform. In the analysis of the recognition effect of kinematic description on the two datasets, it is found that the accuracy of the HOF method is the worst on both the KTH and Weizmann datasets, while the STP and MBH methods are in the middle positions and the VGG-16 algorithm model has the highest accuracy. In the accuracy analysis of the KTH dataset, it is found that the boxing action has the highest recognition accuracy and the running action the lowest; the average recognition value over all kinds of actions is 91.93%. In the accuracy analysis of the Weizmann dataset, it is found that the bending and one-hand waving actions have the highest recognition accuracy, while the running action has the lowest; the average recognition value over all kinds of actions is 95.75%, which is significantly higher than that of the KTH dataset.

In conclusion, human actions are recognized by deep learning algorithms. A CNN algorithm model is built, and the VGG-16 neural network has high accuracy and is superior to other dynamic image recognition methods, which provides an experimental basis for human action recognition. However, there are some shortcomings; for example, the processed data are image data without spatial characteristics and time continuity. Therefore, in a follow-up study, time-series characteristics will be added between groups to further improve the recognition accuracy.

Acknowledgements This work was supported by the National Natural Science Foundation of China (No. 61070015) and the Guangdong Provincial Frontier and Key Technology Innovation Special Funds Major Science and Technology Project (No. 2014B010110004).

Compliance with ethical standards

Conflict of interest All authors declare that they have no conflict of interest.

Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent Informed consent was obtained from all individual participants included in the study.

References

Cha YJ, Choi W, Büyüköztürk O (2017) Deep learning-based crack damage detection using convolutional neural networks. Comput-Aided Civil Infrastruct Eng 32(5):361–378
Chen M (2018) The research of human individual's conformity behavior in emergency situations. Libr Hi Tech. https://doi.org/10.1108/LHT-08-2018-0113
Chen M (2019) The impact of expatriates' cross-cultural adjustment on work stress and job involvement in the high-tech industry. Front Psychol 10:2228. https://doi.org/10.3389/fpsyg.2019.02228
Chiovetto E, Curio C, Endres D et al (2018) Perceptual integration of kinematic components in the recognition of emotional facial expressions. J Vis 18(4):13
Ciardo F, Campanini I, Merlo A et al (2018) The role of perspective in discriminating between social and non-social intentions from reach-to-grasp kinematics. Psychol Res 82(5):915–928
Corrigan BW, Gulli RA, Doucet G et al (2017) Characterizing eye movement behaviors and kinematics of non-human primates during virtual navigation tasks. J Vis 17(12):15
Dingenen B, Staes FF, Santermans L et al (2018) Are two-dimensional measured frontal plane angles related to three-dimensional measured kinematic profiles during running? Phys Ther Sport 29:84–92
Edey R, Yon D, Cook J et al (2017) Our own action kinematics predict the perceived affective states of others. J Exp Psychol Hum Percept Perform 43(7):1263
Hassan MM, Uddin MZ, Mohamed A et al (2018) A robust human activity recognition system using smartphone sensors and deep learning. Fut Gen Comput Syst 81:307–313
Ke Q, An S, Bennamoun M et al (2017) SkeletonNet: mining deep part features for 3-D action recognition. IEEE Signal Process Lett 24(6):731–735
Kheradpisheh SR, Ghodrati M, Ganjtabesh M et al (2016) Deep networks can resemble human feed-forward vision in invariant object recognition. Sci Rep 6:32672
Kim H, Lee S, Kim Y et al (2016) Weighted joint-based human behavior recognition algorithm using only depth information for low-cost intelligent video-surveillance system. Expert Syst Appl 45:131–141
Lv X, Chai J, Xia S (2016) Data-driven inverse dynamics for human motion. ACM Trans Graph 35(6):163
Lytras MD, Raghavan V, Damiani E (2017) Big data and data analytics research: from metaphors to value space for collective wisdom in human decision making and smart machines. Int J Semant Web Inf Syst 13(1):1–10
Ohsugi H, Tabuchi H, Enno H et al (2017) Accuracy of deep learning, a machine-learning technology, using ultra-wide-field fundus ophthalmoscopy for detecting rhegmatogenous retinal detachment. Sci Rep 7(1):9425
Pan XJ, Skjervøy MV, Chan WP et al (2017) Automated detection of handovers using kinematic features. Int J Robot Res 36(5–7):721–738
Patwardhan A (2017) Three-dimensional, kinematic, human behavioral pattern-based features for multimodal emotion recognition. Multimodal Technol Interact 1(3):19
Prati A, Shan C, Wang KIK (2019) Sensors, vision and networks: from video surveillance to activity recognition and health monitoring. J Ambient Intell Smart Environ 11(1):5–22
Rajalingham R, Issa EB, Bashivan P et al (2018) Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J Neurosci 38(33):7255–7269
Shamur E, Zilka M, Hassner T et al (2016) Automated detection of feeding strikes by larval fish using continuous high-speed digital video: a novel method to extract quantitative data from fast, sparse kinematic events. J Exp Biol 219(11):1608–1617
Shen C-w, Ho J-t, Ly PTM, Kuo T-c (2019a) Behavioural intentions of using virtual reality in learning: perspectives of acceptance of information technology and learning style. Virtual Real 23(3):313–324. https://doi.org/10.1007/s10055-018-0348-1

Shen C-w, Min C, Wang C-c (2019b) Analyzing the trend of O2O commerce by bilingual text mining on social media. Comput Hum Behav 101:474–483. https://doi.org/10.1016/j.chb.2018.09.031
Sremac S, Tanackov I, Kopić M et al (2018) ANFIS model for determining the economic order quantity. Decis Mak Appl Manag Eng 1(2):81–92
Su Y, Han L, Wang J, Wang H (2019) Quantum-behaved RS-PSO-LSSVM method for quality prediction in parts production processes. Concurr Comput-Pract Exp 9:e5522. https://doi.org/10.1002/cpe.5522
Wang Y, Widrow B, Zadeh LA et al (2016) Cognitive intelligence: deep learning, thinking, and reasoning by brain-inspired systems. Int J Cogn Inform Nat Intell 10(4):1–20
Wu D, Pigou L, Kindermans PJ et al (2016) Deep dynamic neural networks for multimodal gesture segmentation and recognition. IEEE Trans Pattern Anal Mach Intell 38(8):1583–1597
Wu Y, Luo Y, Chaudhari G et al (2019) Bright-field holography: cross-modality deep learning enables snapshot 3D imaging with bright-field contrast using a single hologram. Light Sci Appl 8(1):25
Yang T, Gao X, Gao R et al (2019) A novel activity recognition system for alternative control strategies of a lower limb rehabilitation robot. Appl Sci 9(19):3986
Zheng Y, Ke H (2020) The adoption of scale space hierarchical cluster analysis algorithm in the classification of rock-climbing teaching evaluation system. J Ambient Intell Hum Comput. https://doi.org/10.1007/s12652-020-01778-6
Zheng Y, Liu S (2020) Bibliometric analysis for talent identification by the subject–author–citation three-dimensional evaluation model in the discipline of physical education. Libr Hi Tech. https://doi.org/10.1108/LHT-12-2019-0248
Zheng Y, Zhou Y, Lai Q (2015) Effects of twenty-four move shadow boxing combined with psychosomatic relaxation on depression and anxiety in patients with type-2 diabetes. Psychiatr Danub 27(2):174

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
