
Human Activity Recognition Based on Wearable

Devices and Feedforward Neural Networks


James Parluhutan Hutabarat∗1, Nur Ahmadi∗2, Trio Adiono∗3
∗ School of Electrical Engineering and Informatics, Bandung Institute of Technology, Bandung, 40132, Indonesia
† Center for Artificial Intelligence (U-CoE AI-VLB), Bandung Institute of Technology, Bandung, 40132, Indonesia
Email: 1 13219070@std.stei.itb.ac.id, 2 nahmadi@itb.ac.id, 3 tadiono@stei.itb.ac.id

Abstract—Human activity recognition (HAR) using wearable sensors has garnered significant attention in recent years. This technology can continuously track human activities, which is essential for remote health monitoring, elderly care, and rehabilitation. In this study, we propose a simple feedforward neural network (FFNN) model equipped with four fully connected layers to enhance the accuracy of HAR. Experimental results on the UCI-HAR (UCI Human Activity Recognition) dataset show that the proposed FFNN model achieves an accuracy and an F1 score of 0.96. It outperforms existing algorithms, namely K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), highlighting the effectiveness of a simplified neural network architecture in enhancing HAR performance.

I. INTRODUCTION

Human activity recognition (HAR) has evolved rapidly with the development of wearable technology. These devices have seamlessly integrated into everyday life, catering to a multitude of applications [1], [2]. They continuously track an individual's movements and position, enabling location-based services and enhancing various aspects of daily life [3]. HAR technologies have gone through extensive development, with a focus on improving accuracy, particularly when deployed on wearable devices such as smartwatches and smartphones. These devices are strategically placed on different parts of the human body, including the wrist, ankle, chest, knee, and neck [4].

Efforts have been made to create compact models suitable for smaller devices, such as microcontrollers [5]. Microcontrollers, often with less than 1 MB of memory, require compact models. Simple and accurate models are a practical solution, alongside techniques like model quantization and pruning [6]. Key sensors, including accelerometers, gyroscopes, and magnetometers, play a pivotal role in understanding human activities [7]. This study specifically utilizes accelerometer and gyroscope data to track human movements and recognize six fundamental activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying down. Accurate recognition of these activities holds significant potential for daily monitoring and assistance [8].

While various techniques, including K-Nearest Neighbors (KNN) [9], Support Vector Machines (SVM) [10], Recurrent Neural Networks (RNN) [11], Long Short-Term Memory-based RNN (RNN-LSTM) [12], and Convolutional Neural Networks (CNN) [13], have demonstrated high accuracy in HAR, they often entail complex algorithms compared to conventional neural networks. Our primary objective is to achieve high classification accuracy using a publicly available dataset to infer human activity information. In this paper, we leverage the new version of the UCI-HAR (University of California Irvine Human Activity Recognition) dataset [14], which comprises accelerometer and gyroscope data collected via smartphones, each sample labeled with a corresponding activity. Our approach focuses on modeling human activity patterns effectively.

The rest of this paper is organized as follows. Section II introduces the methodology, including the dataset specification, data preprocessing, and the proposed Feedforward Neural Network (FFNN) model with its underlying theory. Section III presents the experimental results, including accuracy, precision, recall, F1 score, and a detailed examination of the confusion matrix. Finally, Section IV concludes our findings.

II. METHODS

A. Dataset Specification

This paper utilizes the UCI-HAR dataset to recognize human activities. The dataset consists of data collected from users wearing a smartphone on their waist, equipped with two embedded motion sensors. The data collection was conducted with a Samsung Galaxy S II device at a sampling frequency of 50 Hz [14]. Table I provides an overview of the dataset's specifications.

TABLE I
DATASET SPECIFICATION

Specification          UCI-HAR
Device                 Samsung Galaxy S II
Sensor                 Accelerometer & Gyroscope
Sampling frequency     50 Hz
Number of samples      10,929
Number of subjects     30
Number of activities   6



Fig. 1. Diagram of HAR model development.

B. Data Preprocessing

Data preprocessing involves the transformation of raw data into a format suitable for machine learning. The key steps are normalization, label encoding, and the division of the dataset into training, validation, and testing subsets.

1) Normalization: Accelerometer and gyroscope data from the sensors have varying value ranges. Therefore, the first step in preprocessing is to normalize the data so that each feature has a mean of zero and a standard deviation of one. This is done using the StandardScaler from the Scikit-learn library.

2) Label Encoding: The activity labels in the initial dataset are represented in categorical (text) form, such as "Walking", "Sitting", and "Standing". To convert this categorical data into the numerical form required for training machine learning models, label encoding is performed using Scikit-learn's LabelEncoder.

3) Splitting: The dataset is split into three subsets: training, validation, and testing, using Scikit-learn's train_test_split function. The splitting proportions are 64% for training, 16% for validation, and 20% for testing.
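The following is a minimal sketch of these three steps, assuming the feature matrix X and the activity labels y have already been loaded; the variable names, the stratification, and fitting the scaler on the training split only are illustrative assumptions, not details from the original code.

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split

# X: feature matrix, y: array of activity names; both assumed already loaded.

# Label encoding: map activity names ("WALKING", "SITTING", ...) to integers 0-5.
encoder = LabelEncoder()
y_enc = encoder.fit_transform(y)

# Split 80/20 first, then carve 20% of the remaining 80% out as validation,
# giving the 64% train / 16% validation / 20% test proportions described above.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y_enc, test_size=0.20, stratify=y_enc, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.20, stratify=y_trainval, random_state=42)

# Normalization: zero mean, unit standard deviation per feature. Fitting the
# scaler on the training split only is an assumption; the paper does not say.
scaler = StandardScaler().fit(X_train)
X_train, X_val, X_test = (scaler.transform(s) for s in (X_train, X_val, X_test))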
dimension that matches the dataset’s feature size and an output
C. Feedforward Neural Networks (FFNN)

Feedforward neural networks consist of several layers of interconnected nodes. There are three types of layers: the input layer, hidden layers, and the output layer. The input layer is the first layer, which receives input data from the sensors. Each fully connected layer multiplies its input by a weight matrix and adds a bias:

out = x · w + b    (1)

where x is the input data, w is the weight matrix, b is the bias term, and out is the output data. This first layer is activated using the ReLU non-linear activation function [15]:

out = max(0, x)    (2)

where x is the input data from the preceding layer. We also use a dropout function to prevent overfitting by setting a fraction of the ReLU activations to zero:

out = x · mask    (3)

where out is the output data, x is the input data from the preceding layer, and mask is a binary mask that sets a fraction of the activations to zero.

We employ four fully connected layers for the human activity recognition task. After the last fully connected layer, a softmax layer is used to obtain a probability distribution over the classes. This function normalizes the output of the final layer by exponentiating each element and dividing it by the sum of all exponentiated elements [16]:

softmax(x_i) = exp(x_i) / Σ_j exp(x_j)    (4)

where i indexes a neuron in the output layer, j runs over all output neurons, and x_i is the output value of the i-th neuron.

The FFNN model is shown in Figure 1. The model uses four fully connected layers and three dropout layers. The first layer is a linear layer with an input dimension that matches the dataset's feature size and an output dimension of 64, followed by a 50% dropout that randomly deactivates activations from the first layer's output. The next hidden layer has 32 output dimensions and also incorporates a 50% dropout rate. The third hidden layer is an additional linear layer with 16 output dimensions, again followed by a 50% dropout rate. Finally, the model's output layer provides predictions for the target classes. We utilize the Adam optimizer during training.
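A PyTorch sketch of this architecture is given below. The class name, the 561-feature input size, the default Adam learning rate, and the use of ReLU on every hidden layer (the text explicitly states it only for the first) are assumptions for illustration.

import torch
import torch.nn as nn

class HARFFNN(nn.Module):
    # Four fully connected layers (64 -> 32 -> 16 -> classes) with 50% dropout
    # after each of the first three, mirroring the description above.
    def __init__(self, n_features=561, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(32, 16), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(16, n_classes),  # raw logits; softmax applied below
        )

    def forward(self, x):
        return self.net(x)

    def predict_proba(self, x):
        # Softmax of Eq. (4): a probability distribution over the six classes.
        return torch.softmax(self.forward(x), dim=1)

model = HARFFNN()
optimizer = torch.optim.Adam(model.parameters())  # Adam, as stated; default lr assumed

Keeping the network's output as raw logits and applying the softmax only at prediction time is a common PyTorch convention, since the usual cross-entropy loss operates on logits directly.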
D. Performance Metrics

To evaluate model performance, we use several metrics: the confusion matrix, precision, recall, F1 score, and accuracy. The confusion matrix is a fundamental tool that tabulates true positive, true negative, false positive, and false negative values; it serves as the foundation for calculating the other performance metrics. Precision measures the proportion of true positive predictions out of all positive predictions made by the model. Recall, also known as sensitivity or true positive rate, quantifies the proportion of true positive predictions out of all actual positive samples. The F1 score is the harmonic mean of precision and recall, providing a balance between the two. Accuracy measures the overall correctness of the model's predictions. Precision, recall, F1 score, and accuracy are formulated as follows:

Precision = TP / (TP + FP)    (5)

Recall = TP / (TP + FN)    (6)

F1 score = (2 · Precision · Recall) / (Precision + Recall)    (7)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (8)

where TP represents true positives (correctly predicted positive samples), FP represents false positives (incorrectly predicted positive samples), FN represents false negatives (missed positive samples), and TN represents true negatives (correctly predicted negative samples).
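As a sketch, all of these quantities can be computed with Scikit-learn, which the paper already uses; y_true and y_pred are assumed to hold the integer-encoded true and predicted activities.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# y_true, y_pred: integer-encoded true and predicted activities (assumed).
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

# Per-activity precision, recall, and F1, i.e. Eqs. (5)-(7) for each class.
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred)

# Averages across the six activities plus overall accuracy, Eq. (8),
# as reported in Tables II and III.
avg_prec, avg_rec, avg_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)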
Fig. 2. Plot of training and validation (a) loss and (b) accuracy.

Fig. 3. Confusion matrix of (a) KNN, (b) SVM, and (c) FFNN.

III. RESULTS AND DISCUSSION

The FFNN model was implemented using Python (v3.10.12), the Scikit-learn (v1.2.2) machine learning library [17], and the PyTorch (v2.0.1) deep learning framework [18], running on the Google Colab platform. A series of experiments was conducted to evaluate the performance of the proposed FFNN model in comparison to two other algorithms: k-nearest neighbors (KNN) and support vector machine (SVM). Figure 2(a) shows the training (blue line) and validation (orange line) loss curves of the proposed FFNN model, whereas Figure 2(b) shows the corresponding training and validation accuracy curves. The best loss and accuracy were achieved when the epoch reached 35. Figure 3(a)-(c) shows the confusion matrices of the predicted versus true activities for KNN, SVM, and FFNN.
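A condensed sketch of the training loop under these settings follows; the epoch count of 35 comes from the description above, while the cross-entropy loss, learning rate, and batch size are assumptions not stated in the paper.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Assumes: `model` is the FFNN above and X_train, y_train, X_val, y_val are
# tensors (features as float32, labels as int64).
train_loader = DataLoader(TensorDataset(X_train, y_train),
                          batch_size=64, shuffle=True)  # batch size assumed
criterion = nn.CrossEntropyLoss()  # assumed loss; operates on raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr assumed

for epoch in range(35):  # best loss/accuracy reported around epoch 35
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():  # validation pass, as plotted in Fig. 2
        logits = model(X_val)
        val_loss = criterion(logits, y_val).item()
        val_acc = (logits.argmax(dim=1) == y_val).float().mean().item()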
The performance comparison among FFNN, KNN, and SVM for the six activities is shown in Table II. For the FFNN, precision, recall, and F1 score reach their lowest values for standing (0.91), sitting (0.90), and sitting (0.91), respectively. Except for the sitting and standing cases, the FFNN model shows very good performance, with F1 scores exceeding 0.95. Table III shows the average precision, recall, F1 score, and accuracy, which are 0.96, 0.95, 0.96, and 0.96, respectively. These results show that the FFNN model outperforms both KNN and SVM.

TABLE II
PERFORMANCE COMPARISON BETWEEN KNN, SVM, AND FFNN

                        KNN                      SVM                      FFNN
Activity                Prec.  Rec.   F1         Prec.  Rec.   F1         Prec.  Rec.   F1
WALKING                 0.85   0.98   0.91       0.94   0.99   0.96       0.95   0.99   0.97
WALKING UPSTAIRS        0.89   0.89   0.89       0.96   0.94   0.95       0.99   0.92   0.95
WALKING DOWNSTAIRS      0.95   0.79   0.86       0.99   0.95   0.97       0.95   0.97   0.96
SITTING                 0.91   0.79   0.85       0.98   0.89   0.93       0.93   0.90   0.91
STANDING                0.83   0.93   0.88       0.90   0.98   0.93       0.91   0.94   0.93
LAYING                  1.00   0.99   1.00       1.00   0.99   1.00       1.00   1.00   1.00
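For reference, the two baselines in Table II can be fit with Scikit-learn on the same preprocessed splits; the paper does not report their hyperparameters, so the sketch below relies on library defaults, which is an assumption.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Fit both baselines on the same preprocessed splits (assumed available).
knn = KNeighborsClassifier().fit(X_train, y_train)  # default k = 5 assumed
svm = SVC().fit(X_train, y_train)                   # default RBF kernel assumed

knn_pred = knn.predict(X_test)
svm_pred = svm.predict(X_test)  # feed into the metrics sketch above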

As Table III shows, the FFNN model, which uses only a simple neural network rather than a complex network architecture, performs better than both KNN and SVM. The FFNN model shows slightly improved performance over SVM; the two have the same average precision and recall scores. Compared to KNN, the FFNN shows significant performance improvement across all metrics. The proposed FFNN model, with only four fully connected layers, is thus able to improve the performance of activity classification. The confusion matrices of KNN and SVM show that misclassification of standing and laying occurs frequently; in this study, the FFNN model enhances HAR performance especially for these two activities.

TABLE III
COMPARISON OF AVERAGE PERFORMANCE ACROSS MODELS

Average performance   KNN    SVM    FFNN
Precision             0.90   0.96   0.96
Recall                0.90   0.95   0.95
F1 Score              0.89   0.95   0.96
Accuracy              0.91   0.95   0.96

IV. CONCLUSION

In this study, we propose a Feedforward Neural Network (FFNN) model comprising four fully connected layers. We trained the model on accelerometer and gyroscope sensor data collected from smartphones and benchmarked its performance against two established models for activity classification. Our experimental results demonstrate that the proposed FFNN model outperforms the other models in key metrics, including precision, recall, F1 score, and accuracy. It can significantly enhance the accuracy of classifying activities that are historically prone to misclassification, such as walking upstairs, standing, and lying down. These findings underscore the effectiveness of a simple neural network architecture in enhancing the accuracy of activity classification and reducing misclassification rates. Furthermore, the model's simplicity and computational efficiency suggest its potential for real-time activity monitoring on resource-constrained edge devices.

REFERENCES

[1] A. Wang, G. Chen, J. Yang, S. Zhao, and C.-Y. Chang, "A comparative study on human activity recognition using inertial sensors in a smartphone," IEEE Sensors Journal, vol. 16, pp. 1–1, 2016.
[2] X. Zhou, W. Liang, K. I.-K. Wang, H. Wang, L. T. Yang, and Q. Jin, "Deep-learning-enhanced human activity recognition for internet of healthcare things," IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6429–6438, 2020.
[3] F. Demrozi, G. Pravadelli, A. Bihorac, and P. Rashidi, "Human activity recognition using inertial, physiological and environmental sensors: A comprehensive survey," IEEE Access, vol. 8, pp. 210816–210836, 2020.
[4] E. Ramanujam, T. Perumal, and S. Padmavathi, "Human activity recognition with smartphone and wearable sensors using deep learning techniques: A review," IEEE Sensors Journal, vol. 21, no. 12, pp. 13029–13040, 2021.
[5] J. Lin, W.-M. Chen, Y. Lin, C. Gan, S. Han et al., "MCUNet: Tiny deep learning on IoT devices," Adv. Neural Inf. Process., vol. 33, pp. 11711–11722, 2020.
[6] S. S. Saha, S. S. Sandha, and M. Srivastava, "Machine learning for microcontroller-class hardware: A review," IEEE Sensors Journal, 2022.
[7] A. Ayman, O. Attalah, and H. Shaban, "An efficient human activity recognition framework based on wearable IMU wrist sensors," in 2019 IEEE Int. Conf. Imaging Syst. Tech. (IST). IEEE, 2019, pp. 1–5.
[8] O. Chin Ann and B. Lau, "Human activity recognition: A review," in Proc. 4th IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014), pp. 389–393, 2015.
[9] S. G. K. Patro, B. K. Mishra, S. K. Panda, R. Kumar, H. V. Long, D. Taniar, and I. Priyadarshini, "A hybrid action-related k-nearest neighbour (HAR-KNN) approach for recommendation systems," IEEE Access, vol. 8, pp. 90978–90991, 2020.
[10] M. Ehatisham-Ul-Haq, A. Javed, M. A. Azam, H. M. Malik, A. Irtaza, I. H. Lee, and M. T. Mahmood, "Robust human activity recognition using multimodal feature-level fusion," IEEE Access, vol. 7, pp. 60736–60751, 2019.
[11] F. A. Dharejo, M. Zawish, Y. Zhou, S. Davy, K. Dev, S. A. Khowaja, Y. Fu, and N. M. F. Qureshi, "FuzzyAct: A fuzzy-based framework for temporal activity recognition in IoT applications using RNN and 3D-DWT," IEEE Trans. Fuzzy Syst., vol. 30, no. 11, pp. 4578–4592, 2022.
[12] N. K. Singh and K. S. Suprabhath, "HAR using bi-directional LSTM with RNN," in 2021 International Conference on Emerging Techniques in Computational Intelligence (ICETCI). IEEE, 2021, pp. 153–158.
[13] D. Ravì, C. Wong, B. Lo, and G.-Z. Yang, "Deep learning for human activity recognition: A resource efficient implementation on low-power devices," pp. 71–76, 2016.
[14] D. Anguita, A. Ghio, L. Oneto, X. Parra, J. L. Reyes-Ortiz et al., "A public domain dataset for human activity recognition using smartphones," in Proc. ESANN, vol. 3, 2013, p. 3.
[15] Z. Li, T. Jiang, J. Yu, X. Ding, Y. Zhong, and Y. Liu, "A lightweight mobile temporal convolution network for multi-location human activity recognition based on Wi-Fi," in 2021 IEEE/CIC Int. Conf. Commun. China (ICCC Workshops). IEEE, 2021, pp. 143–148.
[16] S. Majumder and N. Kehtarnavaz, "Vision and inertial sensing fusion for human action recognition: A review," IEEE Sensors Journal, vol. 21, no. 3, pp. 2454–2467, 2020.
[17] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, 2011.
[18] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "PyTorch: An imperative style, high-performance deep learning library," Adv. Neural Inf. Process., vol. 32, 2019.
