Báo KH

1
Intrapatient Forecasting of Parkinson’s Wearing-off by Analyzing Data from
Wrist-worn Fitness Tracker and Smartphone
Nhat Tan Le*1, Khuong Cong Duy Nguyen*2, Nhat Duy Vo3, Tan Thi Pham4
1,2,3,4 Ho Chi Minh City University of Technology (HCMUT), 268 Ly Thuong
Kiet, Ward 14, District 10, Ho Chi Minh City, Vietnam
1,2,3,4 Vietnam National University Ho Chi Minh City (VNUHCM), Quarter 6,
Linh Trung Ward, Thu Duc City, Ho Chi Minh City, Vietnam
* These authors contributed equally to this work
Abstract
Wearing-off forecasting plays an important role in Parkinson’s symptom
monitoring and could reduce the risks that arise when a drug’s effect expires.
With wearable sensors and machine learning techniques, this detrimental
phenomenon could be forecasted accurately and remotely. In the ABC Challenge
2023, a dataset of 10 Parkinson’s patients, collected by a wearable fitness
device, is used for wearing-off forecasting. In this work, the KYSAI team
proposes a processing pipeline for the heart rate, stress, and step data.
A total of 31 features are extracted over several time windows before the
forecasting period. An XGBoost model is used to perform intra-person
forecasting, with the focal loss to handle the class imbalance. The overall
average weighted F1-score, precision, and recall are 0.94, 0.94, and 0.96,
respectively. However, further analysis is needed to address the highly
imbalanced data problem.
1 lenhattan@hcmut.edu.vn
2 duy.nguyenkhuongcong@gmail.com
3 duy.vonhatduy@hcmut.edu.vn
4 ptthi@hcmut.edu.vn
1 Introduction
Parkinson’s disease (PD) is a neurodegenerative disorder characterized by the
loss of dopamine-producing brain cells [7], which significantly impacts the
quality of a patient’s life. Specifically, PD affects the patient’s motor abilities,
leading to symptoms such as tremors, muscle stiffness, and difficulties with
movement and balance [18]. Additionally, PD can cause non-motor symptoms
such as sleep disturbances, constipation, and urinary problems [18]. Therefore, early
diagnosis and appropriate treatment plans for Parkinson’s disease are cru-
cial for effective management and improved patient outcomes [10]. However,
conventional methods, such as clinical assessments and neuro-imaging, often
rely on subjective observations and are not accessible to all individuals. This
poses a challenge in timely diagnosis and treatment initiation for PD patients,
delaying the potential benefits of early intervention [7].
Managing the wearing-off phenomenon plays an important role in controlling PD
symptoms and improving the patient’s overall well-being. In this phenomenon,
the effectiveness of the medication gradually diminishes over time, and the
patient’s symptoms reappear before the next dose [5], which significantly
impacts daily activities. The wearing-off phenomenon is associated with
fluctuations in motor and non-motor symptoms, including tremors, rigidity,
bradykinesia, and changes in mood and cognition [5].
Understanding the characteristics of PD medication, including the wearing-off
phenomenon, is essential for optimizing treatment strategies and improving
patient outcomes.
By leveraging advanced technologies such as Artificial Intelligence (AI)
and the Internet of Things, wearing-off periods could be monitored effectively
and forecasted accurately. By analyzing wearable sensor data [20, 1, 4, 15, 12]
(such as heart activity, mobility, physical exercise, sleep patterns, and
gait), questionnaires [16], or clinical measurements [14] (such as medical
imaging and drug intake), AI-powered models have been developed to anticipate
the occurrence of wearing-off in Parkinson’s disease patients, with
considerable results [20, 1, 3]. This approach offers
the potential to enhance the understanding of Parkinson’s disease treatment
and support clinicians in optimizing patient care strategies such as tailoring
medication regimens and providing timely interventions.
In the ABC Challenge 2023, the dataset [19] from a wrist-worn device is
used to forecast the wearing-off status in the next 15 minutes. In this paper,
team KYSAI explains the proposed pipeline, including the pre-processing
method (Section 3.2), the feature extraction and analysis process based on
tracker data (Sections 3.3 and 3.6), and model training (Section 3.4). The
model shows positive forecasting performance (Section 4.1). Moreover,
feature insights are presented in Section 3.6.
2 Background and Related Works

Methods for recognizing and forecasting wearing-off have evolved from manual
techniques (questionnaires [16]) to remote and automatic techniques (wearable
sensors [20, 3, 8]). In the work of Stacy et al. [17], the 9-item Wearing-off
Questionnaire (WoQ-9) [2] achieved a performance of 95.8% in identifying
wearing-off. However, an automatic and flexible technique is needed to improve
detection and, moreover, to forecast this period ahead of time. Currently,
wearable sensors are widely applied to PD diagnosis, motion monitoring, and
off-stage status prediction [8]. Several works used accelerometer and
gyroscope sensors to recognize freezing of gait [4, 3], tremor [12], and
bradykinesia [15], with considerable performance. Heart rate and blood
pressure also carry useful information about the wearing-off phenomenon [13]:
higher values of these factors are correlated with the appearance of
wearing-off [13].
Based on data from wearable sensors, machine learning currently achieves
considerable performance in recognizing PD symptoms and wearing-off status.
In the work of Aich et al. [1], a Random Forest model detected the “On”/“Off”
state with an accuracy of 96.72%. Camps et al. [3] proposed a deep learning
model on waist-worn sensor data to detect freezing of gait with an F1-score
of 90%. Victorino et al. [20] used data from a fitness tracker to predict
wearing-off status and achieved a balanced accuracy of 70 to 77%. Therefore,
machine learning applied to wearable sensor data could provide strong
performance in forecasting the wearing-off phenomenon.
3 Methodology
The processing pipeline includes 5 main stages (Figure 1). First, raw data are
cleaned during the first pre-processing stage (Section 3.2). After cleaning,
the data are re-sampled to a specific time window, and a total of 31 features
are extracted through this process (Section 3.3). After matching data from
several sources, a second pre-processing stage handles the additional
missing values (Section 3.2). Afterward, an ensemble learning model is
used with an additional technique to address the class imbalance
(Section 3.4). Finally, several evaluation metrics are applied to check the
validation performance (Section 3.5). Moreover, a feature analysis has been
conducted and is presented in Section 3.6.
3.1 Dataset
The dataset used in this challenge was collected with the Garmin vívosmart 4
fitness tracker [20] and includes heart rate, steps, stress score, and sleep
pattern. In this work, 3 sources of data (heart rate, stress, steps) were
analyzed to perform the wearing-off prediction.
The wearing-off period was recorded using Fonlog [9], a smartphone
application for data collection. The wearing-off period is annotated with the
9-item Wearing-Off Questionnaire (WoQ-9) in its Japanese translation [6]: the
wearing-off phenomenon is considered to occur when at least 1 symptom appears
(pain, tremors, anxiety, rigidity, slowdown, slow thoughts, impaired hands,
mood change, muscle spasm).
3.2 Pre-processing
The main task in the 2 pre-processing stages is handling missing data.
In the raw dataset, missing data occur as short blank sequences. For the
heart rate and stress data, the nearest values are used to fill in the missing
series. For the step data, the gaps are filled with zeros.
In the second pre-processing stage, after re-sampling and merging data
from several sources, missing data occur as longer blank sequences caused by
device dropouts. The median value is filled into the heart rate and stress
data, while zeros are filled into the step data.
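The filling rules described above can be sketched in plain Python as follows. This is a minimal illustration and not the team's actual code: the helper names (`fill_nearest`, `fill_constant`, `fill_median`) are hypothetical, missing samples are represented as `None`, and breaking ties toward the earlier sample is an assumption.

```python
from statistics import median


def fill_nearest(values):
    """Replace each None with the closest observed value (ties go to the earlier one)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)  # nothing to copy from
    filled = []
    for i, v in enumerate(values):
        if v is None:
            nearest = min(known, key=lambda k: (abs(k - i), k))
            v = values[nearest]
        filled.append(v)
    return filled


def fill_constant(values, constant):
    """Replace each None with a fixed value (zero for step data)."""
    return [constant if v is None else v for v in values]


def fill_median(values):
    """Replace each None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    return fill_constant(values, median(observed))


# Stage 1: short gaps in the raw signals (toy values).
heart_rate = [72, None, None, 80, None]
steps = [0, None, 12, None, 3]
hr_stage1 = fill_nearest(heart_rate)
steps_stage1 = fill_constant(steps, 0)

# Stage 2: longer gaps after re-sampling and merging (toy values).
stress_windows = [30, None, 50, 40, None]
stress_stage2 = fill_median(stress_windows)
```

Note that `fill_median` assumes at least one observed value per signal; a participant with an entirely empty signal would need separate handling.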
3.3 Data Re-sampling and Feature Extraction

Because the data sources have different sampling rates, a re-sampling step is
needed to combine them. A 15-minute window is employed in this work. During
re-sampling, features are extracted by calculating several statistics
(mean, standard deviation, maximum, and minimum). A total of 31
features were extracted through the re-sampling process (Table 1.1). These
statistical features could reveal useful information about the fluctuating
state of a Parkinson’s patient, especially for stress score and heart rate,
which could reflect abnormal processes related to the wearing-off
phenomenon [11].
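As a rough illustration of the re-sampling step, the sketch below computes the four statistics over three consecutive 15-minute windows before the forecasting time. The function names, the `(minute, value)` sample layout, and the feature naming scheme are hypothetical, and the use of the population standard deviation (so that single-sample windows are defined) is an assumption rather than something stated in this work.

```python
from statistics import mean, pstdev

WINDOW_MINUTES = 15


def window_stats(samples, start, end):
    """Mean, std, max, and min of the values whose timestamp t satisfies start <= t < end."""
    values = [v for t, v in samples if start <= t < end]
    if not values:
        return None  # left empty for the second pre-processing stage
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "max": max(values),
        "min": min(values),
    }


def lagged_features(samples, now, name):
    """Statistics for windows t1 (-45 to -30), t2 (-30 to -15), and t3 (-15 to now)."""
    features = {}
    for label, offset in (("t1", 3), ("t2", 2), ("t3", 1)):
        start = now - offset * WINDOW_MINUTES
        stats = window_stats(samples, start, start + WINDOW_MINUTES)
        for stat in ("mean", "std", "max", "min"):
            features[f"{name}_{stat}_{label}"] = None if stats is None else stats[stat]
    return features


# Toy heart rate samples as (minute, beats per minute) pairs.
heart_rate = [(0, 70), (10, 74), (20, 80), (25, 90), (40, 60)]
features = lagged_features(heart_rate, now=45, name="hr")
```

For one signal this yields 12 values (4 statistics over 3 windows), which matches the heart rate and stress rows of Table 1.1.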
3.4 Model Training

FIGURE 1: Data processing pipeline.
To prepare for training, each participant’s processed dataset is randomly
split into Train and Validation folds with a ratio of 8:2. Afterward,
an XGBoost model with the focal loss [21] is trained on each
participant’s fold. XGBoost is a distributed gradient-boosted decision
tree (GBDT) library that provides parallel tree boosting and is widely used
for regression, classification, and ranking problems. The implementations of
Weighted Loss and Focal Loss [21] are used to address the problem of
label-imbalanced data.
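To make the role of the focal loss concrete, the sketch below evaluates the binary focal loss in its common form, L = -[y (1 - p)^gamma log p + (1 - y) p^gamma log(1 - p)] with p = sigmoid(z), together with its derivative with respect to the raw score z, which is the quantity a gradient-boosting objective needs. The function names are hypothetical, the class-weighting term and the second derivative are omitted, and this is not the implementation used by the library cited in [21].

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def focal_loss(z, y, gamma):
    """Binary focal loss for one sample with raw score z and label y in {0, 1}."""
    p = sigmoid(z)
    if y == 1:
        return -((1.0 - p) ** gamma) * math.log(p)
    return -(p ** gamma) * math.log(1.0 - p)


def focal_grad(z, y, gamma):
    """Analytic derivative of focal_loss with respect to the raw score z."""
    p = sigmoid(z)
    if y == 1:
        return gamma * p * (1.0 - p) ** gamma * math.log(p) - (1.0 - p) ** (gamma + 1.0)
    return p ** (gamma + 1.0) - gamma * (1.0 - p) * p ** gamma * math.log(1.0 - p)


def numeric_grad(z, y, gamma, eps=1e-6):
    """Central finite difference, used only to check focal_grad."""
    return (focal_loss(z + eps, y, gamma) - focal_loss(z - eps, y, gamma)) / (2.0 * eps)


# A well-classified positive sample (z = 3) contributes less as gamma grows.
easy_plain = focal_loss(3.0, 1, gamma=0.0)  # gamma = 0 is ordinary cross-entropy
easy_focal = focal_loss(3.0, 1, gamma=2.0)
```

With gamma = 0 the gradient reduces to p - y, the usual logistic-loss gradient; larger gamma values shrink the contribution of easy samples so that the rare wearing-off samples weigh more during training.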
Additionally, to obtain the best result, a hyper-parameter tuning technique,
Grid-search Cross Validation, has been applied in this work. Specifically,
the tuned hyper-parameters are focal gamma and imbalance alpha.
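The tuning procedure can be sketched as an exhaustive search over the two hyper-parameters, scoring each combination by its mean validation score over k folds. Everything below is illustrative: the helper names, the candidate values, and the number of folds are assumptions, and `toy_fit_and_score` is a placeholder standing in for training and scoring an actual model.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_search_cv(param_grid, n_samples, k, fit_and_score):
    """Return the parameter combination with the highest mean cross-validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        scores = [
            fit_and_score(params, train, valid)
            for train, valid in k_fold_indices(n_samples, k)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_fit_and_score(params, train_idx, valid_idx):
    """Placeholder: a real version would train on train_idx and score on valid_idx."""
    return -((params["focal_gamma"] - 2.0) ** 2) - (params["imbalance_alpha"] - 1.5) ** 2


# Hypothetical candidate values, chosen only for illustration.
param_grid = {"focal_gamma": [1.0, 2.0, 3.0], "imbalance_alpha": [0.5, 1.5, 2.5]}
best_params, best_score = grid_search_cv(
    param_grid, n_samples=10, k=5, fit_and_score=toy_fit_and_score
)
```

For time-ordered sensor data, contiguous folds as used here avoid mixing neighboring windows between training and validation, although this work does not state which fold scheme was applied.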
3.5 Evaluation
After the model training stage, the evaluation is conducted on the
validation split of each of the 10 participants. Because the classes differ
greatly in the number of samples, 3 weighted evaluation metrics (Precision,
Recall, and F1-score) were computed in this work.
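For reference, support-weighted metrics average the per-class scores with weights proportional to the number of true samples in each class. The sketch below is a plain-Python illustration under that usual definition; the function name is hypothetical, and treating an undefined precision (a class that is never predicted) as 0 is an assumption.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over the classes in y_true."""
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        support = sum(1 for t in y_true if t == label)
        class_precision = tp / predicted if predicted else 0.0
        class_recall = tp / support
        denom = class_precision + class_recall
        class_f1 = 2 * class_precision * class_recall / denom if denom else 0.0
        weight = support / total
        precision += weight * class_precision
        recall += weight * class_recall
        f1 += weight * class_f1
    return precision, recall, f1


# Toy case: 9 normal windows, 1 wearing-off window, and a model that
# always predicts "normal" (label 0).
y_true = [0] * 9 + [1]
y_pred = [0] * 10
precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

In this toy case the weighted precision, recall, and F1 are 0.81, 0.90, and about 0.85 even though the wearing-off class is never detected, which is why weighted scores on a highly imbalanced validation split should be read with care.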
3.6 Feature Analysis

The Analysis of Variance (ANOVA) F-test and Gini importance are used
to evaluate the extracted features. In the ANOVA F-test, a technique that
relies on a statistical test, a higher importance value (represented by
a higher F-statistic or F-score) indicates a larger difference in that
feature’s values between the groups or categories being compared. On the
other hand, the Gini importance measures how much the splits on a feature
reduce the Gini impurity across the trees of the Random Forest model.
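The F-statistic used for this ranking is the ratio of the between-group variance to the within-group variance of a feature. The sketch below computes it for a single feature under the standard one-way ANOVA definition; the function name and the toy values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of feature values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# A feature whose values differ clearly between two classes gets a larger F
# than one whose values overlap.
f_separated = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
f_overlapping = anova_f([[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]])
```

The sketch assumes every group has some spread; if all values within each group are identical, the within-group variance is zero and the ratio is undefined.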
4 Results and Analysis

The classification performance and feature analysis results are analyzed in
Sections 4.1 and 4.2, respectively.
TABLE 1.1: Extracted features. The time windows t1, t2, and t3 span from
-45 minutes to -30 minutes, from -30 minutes to -15 minutes, and from
-15 minutes to the present, respectively.

Data           Number of features   Features
Heart rate     12                   Mean (t1, t2, and t3)
                                    Standard deviation (t1, t2, and t3)
                                    Maximum value (t1, t2, and t3)
                                    Minimum value (t1, t2, and t3)
Stress score   12                   Mean (t1, t2, and t3)
                                    Standard deviation (t1, t2, and t3)
                                    Maximum value (t1, t2, and t3)
                                    Minimum value (t1, t2, and t3)
Steps          3                    Mean (t1, t2, and t3)
Date & time    4                    Timestamp hour (hour, hour sine, hour cosine)
                                    Timestamp day of week
Total          31
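The hour sine and hour cosine features in Table 1.1 correspond to a cyclical encoding of the hour of day. The exact formula is not given in this work; the sketch below assumes the common choice of mapping the hour onto a circle with a 24-hour period, and the function names are hypothetical.

```python
import math


def encode_hour(hour):
    """Map an hour of day (0 to 24) to a (sine, cosine) point on the unit circle."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def cyclic_distance(hour_a, hour_b):
    """Euclidean distance between two encoded hours."""
    sin_a, cos_a = encode_hour(hour_a)
    sin_b, cos_b = encode_hour(hour_b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 23:00 and 00:00 are one hour apart, like 00:00 and 01:00, although the raw
# hour values differ by 23.
late_gap = cyclic_distance(23, 0)
early_gap = cyclic_distance(0, 1)
```

With this encoding, hours on either side of midnight stay close together, which the raw hour value alone cannot express.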
4.1 Classification Performance

Considerable performance in intra-person wearing-off forecasting has been
achieved in this work (Table 1.2). Across the 10 participants, the overall
weighted F1-score, precision, and recall are 0.94, 0.94, and 0.96,
respectively. The highest evaluation metrics are obtained for Participants 2
and 7, with 1.00 for all 3 metrics. The lowest performance is obtained for
Participant 4, with a weighted F1-score, precision, and recall of 0.87, 0.84,
and 0.90, respectively.
However, these results only partially reflect the actual performance of
the model because of the lack of samples for each participant.
TABLE 1.2: Classification performance of XGBoost with Focal Loss for each
participant

Participant     Precision   Recall   F1-Score
Participant 1   0.86        0.911    0.88
Avg             0.94        0.96     0.94
FIGURE 2: Feature Importance score by ANOVA F-test technique
4.2 Feature Analysis

According to the feature analysis results from the ANOVA F-test (Figure 2) and
the Random Forest model (Figure 3), features from stress, heart rate, and
timestamp data present high importance values. In particular, the cyclically
encoded hour features have the highest importance value in both analyses. The
stress-based and heart-rate-based features show high and nearly equal
importance values. In contrast, step features perform poorly in the
wearing-off forecasting task.
Among the heart rate features, the maximum heart rate in the 3 time
windows presents the highest importance value in the ANOVA F-test,
while the heart rate standard deviation is the highest in the Gini
importance. Similarly, the maximum stress score features show the highest
importance in the ANOVA F-test, while in the Gini importance all
stress-score-based features have nearly equal importance.
5 Discussion
FIGURE 3: Feature Importance score by Gini importance
The evaluation metrics show optimistic results, which suggests that the
processing pipeline is effective. However, because the wearing-off labels are
highly imbalanced, very few sequences labeled “wearing-off” appear in the
validation split. Therefore, although the model achieves high performance,
with an average weighted F1-score of 0.94, several factors need to be noted:
1. Insufficient data for intra-person forecasting: The current training
dataset is too small for most models to learn from, since training usually
takes thousands of samples or more. Specifically, in our case after
resampling, there are only 865 segments of training data for
participants 8 and 9, and only 288 segments of training data for
participant 7. In both cases, this is not enough data to train a model
that generalizes well.
2. Imbalanced classes: After assigning wearing-off status labels to the
processed data, it is clear that the training set is heavily skewed
toward the control status, because wearing-off happens only on a few
occasions. This causes the model to predict “normal” in most cases.
Hence, even when no wearing-off samples exist, the model can still reach
a high level of accuracy. In the extreme case of Participants 2 and 7,
which have no “wearing off” labels, the accuracy is 100%.
An inter-person forecasting model could be considered in further work.
However, there are several challenges due to the differences in bio-signals
and behavior among patients. More specifically, some parameters, such as
heart rate and mental health status, depend heavily on individual factors
such as physique, demographics, work, and living conditions. Nevertheless,
an inter-person forecasting model could achieve considerable results if a
suitable and sufficiently large dataset is available.
The feature analysis results provide insight into the wearing-off
characteristics of each patient. First, the timestamp-based features present
a high importance value, which suggests that wearing-off tends to appear at
preferred times of day for each patient. Second, the heart rate features,
especially the maximum and standard deviation values, present high
importance values. Unlike the mean value, the maximum value describes the
highest heart rate within the 15-minute window, and the standard deviation
reflects the instability of the heart rate. Higher maximum and standard
deviation values may indicate abnormal symptoms in Parkinson’s patients.
Similarly, Ruonala et al. [13] demonstrated through heart rate variability
that the fluctuating heart rate is correlated with the Levodopa drug effect.
Stress score is also highly correlated with the wearing-off phenomenon.
Specifically, this factor possibly describes the patient’s discomfort, which
could correspond to some of Parkinson’s symptoms. Several works [3, 4] have
shown that mobility factors could be correlated with Parkinson’s symptoms
such as freezing of gait or tremor. However, for the step features in this
work, the mean step count shows only a minor correlation with the wearing-off
period. In conclusion, the forecasting performance could be improved by a
deeper analysis of heart rate and stress status, in both the data collection
and processing steps.
6 Conclusion
In this paper, a processing pipeline has been demonstrated for forecasting
Parkinson’s disease wearing-off by analyzing a dataset from a fitness
tracker. 31 features are extracted from 3 different time windows: from -45
minutes to -30 minutes, from -30 minutes to -15 minutes, and from -15 minutes
to the present. An XGBoost model with the focal loss technique has been used
and shows considerable performance on intra-patient forecasting tasks
on the class-imbalanced dataset. The feature analysis results show high
importance values for the extracted features, especially the timestamp-based,
heart-rate-based, and stress-score-based features. However, several
challenges, including the highly imbalanced classes, remain to be addressed.
Bibliography
[1] Aich, S., Youn, J., Chakraborty, S., Pradhan, P.M., Park, J.B., Park,
S.H.: A supervised machine learning approach to detect the on/off
state in parkinson’s disease using wearable based gait signals. Diag-
nostics 10(6), 421 (2020). DOI 10.3390/diagnostics10060421. URL
https://doi.org/10.3390/diagnostics10060421
[2] Antonini, A., Martinez-Martin, P., Chaudhuri, R.K., Merello, M., Hauser,
R.A., Katzenschlager, R., Odin, P., Stacy, M., Stocchi, F., Poewe,
W., Rascol, O., Sampaio, C., Schrag, A., Stebbins, G.T., Goetz, C.G.:
Wearing-off scales in parkinson’s disease: Critique and recommendations.
Movement Disorders 26(12), 2169–2175 (2011). DOI 10.1002/mds.23875.
URL https://doi.org/10.1002/mds.23875
[3] Camps, J., Samà, A., Martín, M., Rodríguez-Martín, D., Pérez-López, C.,
Arostegui, J.M.M., Cabestany, J., Català, A., Alcaine, S., Mestre, B.,
Prats, A., Crespo-Maraver, M.C., Counihan, T.J., Browne, P.J., Quinlan,
L.R., Laighin, G., Sweeney, D., Lewy, H., Vainstein, G., Costa, A.C.,
Annicchiarico, R., Bayés, N., Rodríguez-Molinero, A.: Deep learning for
freezing of gait detection in parkinson’s disease patients in their homes
using a waist-worn inertial measurement unit. Knowledge-Based Systems
139, 119–131 (2018). DOI 10.1016/j.knosys.2017.10.017. URL
https://doi.org/10.1016/j.knosys.2017.10.017
[4] De Lima, A.R., Evers, L.J., Hahn, T., Bataille, L., Hamilton, J.L., Little,
M.A., Okuma, Y., Bloem, B.R., Faber, M.J.: Freezing of gait and fall de-
tection in parkinson’s disease using wearable sensors: a systematic review.
Journal of Neurology 264(8), 1642–1654 (2017). DOI 10.1007/s00415-017-8424-0.
URL https://doi.org/10.1007/s00415-017-8424-0
[5] DeMaagd, G., Philip, A.: Parkinson’s disease and its management: Part
4: Treatment of motor complications. PubMed 40(11), 747–73 (2015).
URL https://pubmed.ncbi.nlm.nih.gov/26609209
[6] Fukae, J., Higuchi, M.A., Yanamoto, S., Fukuhara, K., Tsugawa,
J., Ouma, S., Hatano, T., Yoritaka, A., Okuma, Y., Kashihara, K.,
Hattori, N., Tsuboi, Y.: Utility of the japanese version of the 9-
item wearing-off questionnaire. Clinical Neurology and Neurosurgery
134, 110–115 (2015). DOI 10.1016/j.clineuro.2015.04.021. URL
https://doi.org/10.1016/j.clineuro.2015.04.021
[7] Kalia, L.V., Lang, A.E.: Parkinson’s disease. The Lancet 386(9996),
896–912 (2015). DOI 10.1016/s0140-6736(14)61393-3. URL
https://doi.org/10.1016/s0140-6736(14)61393-3
[8] Lu, R., Xu, Y., Zhang, H., Fan, Y., Zeng, W., Tan, Y., Ren, K.F.,
Chen, W., Cao, X.: Evaluation of wearable sensor devices in parkin-
son’s disease: A review of current status and future prospects. Parkin-
son’s Disease 2020, 1–8 (2020). DOI 10.1155/2020/4693019. URL
https://doi.org/10.1155/2020/4693019
[9] Mairittha, N., Mairittha, T., Inoue, S.: A mobile app for nursing
activity recognition (2018). DOI 10.1145/3267305.3267633. URL
https://doi.org/10.1145/3267305.3267633
[10] Murman, D.L.: Early treatment of parkinson’s disease: opportunities
for managed care. PubMed 18(7 Suppl), S183–8 (2012). URL
https://pubmed.ncbi.nlm.nih.gov/23039867
[11] Pursiainen, V., Korpelainen, J.T., Haapaniemi, T.H., Sotaniemi, K.A.,
Myllylä, V.V.: Blood pressure and heart rate in parkinsonian pa-
tients with and without wearing-off. European Journal of Neurology
14(4), 373–378 (2007). DOI 10.1111/j.1468-1331.2007.01672.x. URL
https://doi.org/10.1111/j.1468-1331.2007.01672.x
[12] Rigas, G., Tzallas, A.T., Tsipouras, M.G., Bougia, P., Tripoliti,
E.E., Baga, D., Fotiadis, D.I., Tsouli, S., Konitsiotis, S.: Assess-
ment of tremor activity in the parkinson’s disease using a set of
wearable sensors. IEEE Transactions on Information Technology in
Biomedicine 16(3), 478–487 (2012). DOI 10.1109/titb.2011.2182616.
URL https://doi.org/10.1109/titb.2011.2182616
[13] Ruonala, V., Tarvainen, M.P., Karjalainen, P.P., Pekkonen, E., Rissanen,
S.M.: Autonomic nervous system response to L-dopa in patients with
advanced Parkinson’s disease (2015). DOI 10.1109/embc.2015.7319799.
URL https://doi.org/10.1109/embc.2015.7319799
[14] Salmanpour, M.R., Shamsaei, M., Saberi, A., Setayeshi, S.,
Klyuzhin, I.S., Sossi, V., Rahmim, A.: Optimized machine learn-
ing methods for prediction of cognitive outcome in parkin-
son’s disease. Computers in Biology and Medicine 111,
103347 (2019). DOI 10.1016/j.compbiomed.2019.103347. URL
https://doi.org/10.1016/j.compbiomed.2019.103347
[15] Shima, K., Tsuji, T., Kan, E., Kandori, A., Yokoe, M., Sakoda,
S.: Measurement and evaluation of finger tapping movements using
magnetic sensors (2008). DOI 10.1109/iembs.2008.4650490. URL
https://doi.org/10.1109/iembs.2008.4650490
[16] Stacy, M.: The wearing-off phenomenon and the use of questionnaires
to facilitate its recognition in parkinson’s disease. Journal of Neural
Transmission 117(7), 837–846 (2010). DOI 10.1007/s00702-010-0424-5.
URL https://doi.org/10.1007/s00702-010-0424-5
[17] Stacy, M., Hauser, R.A., Oertel, W.H., Schapira, A.H., Sethi,
K.D., Stocchi, F., Tolosa, E.: End-of-dose wearing off in parkin-
son disease. Clinical Neuropharmacology 29(6), 312–321
(2006). DOI 10.1097/01.wnf.0000232277.68501.08. URL
https://doi.org/10.1097/01.wnf.0000232277.68501.08
[18] Sveinbjörnsdóttir, S.: The clinical symptoms of parkinson’s disease. Jour-
nal of Neurochemistry 139, 318–324 (2016). DOI 10.1111/jnc.13691.
URL https://doi.org/10.1111/jnc.13691
[19] Victorino, J.N.C., Kaneko, H., Fikry, M., Nahid, N., Hosain, T.,
Shibata, T., Inoue, S.: 5th abc challenge: Forecasting parkinson’s
disease patients’ wearing-off phenomenon datasets (2023). URL
https://ieee-dataport.org/competitions/5th-abc-challenge-forecasting-parkinsons-disease-patients-wearing-phenomenon-datasets
[20] Victorino, J.N.C., Shibata, Y., Inoue, S., Shibata, T.: Predicting wearing-
off of parkinson’s disease patients using a wrist-worn fitness tracker and
a smartphone: a case study. Applied Sciences 11(16), 7354 (2021). DOI
10.3390/app11167354. URL https://doi.org/10.3390/app11167354
[21] Wang, C., Deng, C., Wang, S.: Imbalance-xgboost: Leveraging weighted
and focal losses for binary label-imbalanced classification with xgboost
(2019)
7 Appendix
Programming language and libraries   Python (Pandas, NumPy, Sklearn, matplotlib, imxgboost)
Training time                        6.6 s (average per participant)
Hardware                             Google Colaboratory (Intel(R) Xeon(R) CPU @ 2.20GHz, RAM 12GB)
