M Phil-13f21

Prediction of Heatwave Damage Using Machine Learning (ML)
Algorithms in Pakistan
Proposed by: Samiya Akhtar (13F21)
Summary:
Extreme weather phenomena such as heatwaves have received much attention because of their
effects on human health. Rapid, preventive responses are essential for minimizing the impact
of such disasters. This study will build a random forest model (RFM) for the weekly
prediction of heat damage, based on 4 years (2017–2020) of meteorological, statistical, and
floating population data from Pakistan. We will evaluate the model by comparing its mean
absolute error (MAE), root mean square error (RMSE), root mean square logarithmic error
(RMSLE), and coefficient of determination (R²) with those of other conventional regression
models.
Keywords: Heatwave damage, Prediction, Random Forest Model, Machine learning (ML),
Pakistan
Introduction:
Definitions of a heatwave vary from region to region, but they generally depend on location
and time: a heatwave is considered to arise when there is a large deviation from the standard
climatic trend. Most definitions are based on human health outcomes, with a threshold defined
by human impacts. In Pakistan, a heat wave occurs when the daily maximum temperature
exceeds 45 °C for at least five consecutive days.
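The threshold definition above can be illustrated with a short sketch. It assumes a plain list of daily maximum temperatures in °C; the function name `find_heatwaves`, its parameters, and the sample values are hypothetical and are not taken from the study data.

```python
from typing import List, Tuple


def find_heatwaves(daily_tmax: List[float],
                   threshold_c: float = 45.0,
                   min_days: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) day indices (inclusive) of every run in which the
    daily maximum temperature stays above threshold_c for at least min_days."""
    events = []
    run_start = None
    for i, t in enumerate(daily_tmax):
        if t > threshold_c:
            if run_start is None:
                run_start = i          # a new hot spell begins
        else:
            if run_start is not None and i - run_start >= min_days:
                events.append((run_start, i - 1))
            run_start = None           # the spell (if any) has ended
    # Close a spell that lasts until the end of the series.
    if run_start is not None and len(daily_tmax) - run_start >= min_days:
        events.append((run_start, len(daily_tmax) - 1))
    return events


# Illustrative values only (not observations).
tmax = [44.0, 45.5, 46.0, 46.2, 45.1, 47.0, 44.9, 46.0, 46.5, 43.0]
events = find_heatwaves(tmax)
```

With these sample values, days 1 to 5 form one heat wave, while the two hot days that follow are too short a run to count.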
The random forest (RF) method is one of the ensemble learning methods in machine learning
(ML). RF is a variant of the bagging algorithm: it combines the results of several decision
trees, each trained on a different training set drawn from the same distribution, in order to
decrease the variance of the results. The decision tree (DT) serves as the base model for
random forest (RF).
Compared with other classification algorithms, RF is often more accurate, can handle missing
values present in the data, and works well with large data sets.
Background:
Schär, Vidale, Lüthi, et al. (2004) suggest that a regime with higher temperature variability
would be able to explain the European summer of 2003. To test this hypothesis, they run a
regional climate model simulation of the potential future climate of Europe under a scenario
with increased atmospheric concentrations of greenhouse gases.
Khan, N., et al. (2018) used a number of heat wave-related indices that consider both daily
maximum and minimum temperatures to assess changes in different characteristics of heat
waves in Pakistan, which is one of the countries most vulnerable to extreme temperatures.
Khan, N., et al. (2019) propose a statistical model known as Quantile Regression Forests (QRF)
for the prediction of heat waves in Pakistan at different time lags using synoptic climate
variables.
Khan, N., Shahid, S., Ismail, T.B., et al. (2021) used artificial neural networks, support
vector machines (SVM), and random forests as machine learning (ML) methods to forecast heat
waves across Pakistan.
Rationale of Research:
In the literature, studies of heatwave prediction have applied different machine learning
algorithms in the context of climate change. These studies focus on predicting the heat waves
themselves. In contrast, we will predict the damage caused by heatwaves in Pakistan using
random forest models.
Objective of Research:
• To develop a prediction model for heat-related health conditions, based on a machine
learning (ML) method, as an early detection tool.
• To help decision-makers act in time to prevent potential human and financial losses.
Methodology:
In this study, we will forecast heatwave damage in Pakistan using the Princeton Global
Forcing (PGF) dataset (1948–2020), which has a comparatively fine spatial resolution
(0.25° × 0.25°). The dataset is constructed from gauge data as well as NCEP/NCAR reanalysis
data.
To predict the damage caused by heat, relevant variables were chosen. Cardiovascular disease,
respiratory disease, and heatstroke are the most common conditions associated with heat.
Based on significant characteristics identified in past research, we selected the following
variables: temperature (Temp), humidity, wind speed (WS), the number of occupational groups
susceptible (NOGS), insurance premiums per person (IPP), personal income per person (PIP),
and registered population of residents (RPR).
Steps for prediction model:
Step 1: Draw k different sample sets from the original data using the bootstrap method.
Step 2: Construct a decision tree for each bootstrap training set.
Step 3: Combine the outputs of the decision trees. For a regression target, such as the one
in this study, voting takes place by averaging the predictions of the individual trees.
Step 4: Finally, take the combined result (the average for regression, or the most voted
class for classification) as the final prediction result.
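The four steps can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the model proposed in this study: each "tree" is a single-split regression stump on one randomly chosen feature, whereas a real random forest grows deeper trees and samples features at every split. The helper names and the toy data (temperature, humidity, patient counts) are hypothetical.

```python
import random
from statistics import mean


def bootstrap_sample(X, y, rng):
    """Step 1: draw len(X) rows with replacement from the original data."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, rng):
    """Step 2 (simplified): fit a one-split regression tree on a random feature."""
    feature = rng.randrange(len(X[0]))
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for threshold in sorted({row[feature] for row in X}):
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # no valid split: fall back to the sample mean
        m = mean(y)
        return lambda row: m
    _, threshold, lm, rm = best
    return lambda row: lm if row[feature] <= threshold else rm


def fit_forest(X, y, k=50, seed=0):
    """Steps 1 and 2 repeated k times, one stump per bootstrap sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(k):
        Xb, yb = bootstrap_sample(X, y, rng)
        trees.append(fit_stump(Xb, yb, rng))
    return trees


def predict(trees, row):
    """Steps 3 and 4: average the trees' predictions (regression 'voting')."""
    return mean(tree(row) for tree in trees)


# Toy data, made up for illustration: [temperature in °C, humidity in %].
X = [[38.0, 40.0], [40.0, 45.0], [42.0, 50.0],
     [44.0, 55.0], [46.0, 60.0], [48.0, 65.0]]
y = [2.0, 3.0, 5.0, 9.0, 14.0, 20.0]  # hypothetical weekly patient counts

trees = fit_forest(X, y, k=50, seed=0)
cool_week = predict(trees, [38.0, 40.0])
hot_week = predict(trees, [48.0, 65.0])
```

Because every stump only outputs means of observed targets, the averaged prediction always stays within the range of the training values, and the hotter week receives the higher estimate on this toy data.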
A loss function is used to calculate how closely a model's predicted values correspond to the real
values. The representative loss functions for detecting errors in regression models are mean
absolute error (MAE) and mean squared error (MSE).
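$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$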
Because MSE squares each error, it is more sensitive to outliers than MAE. MAE is therefore
used as the loss function in this study. To evaluate the regression models, the MAE, RMSE,
RMSLE, and coefficient of determination (R²), which are generally used to assess accuracy,
will be used to measure how well the predicted values match the actual data.
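The four evaluation metrics can be written out as a minimal sketch using their standard definitions. It assumes non-negative targets, as with patient counts, so that the logarithm in RMSLE is defined. The function names and the sample values are illustrative only.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error; uses log(1 + x), so inputs must be > -1."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred))
        / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
scores = {
    "MAE": mae(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "RMSLE": rmsle(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

Lower MAE, RMSE, and RMSLE indicate a better fit, whereas an R² closer to 1 indicates that the model explains more of the variance in the observed values.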
After constructing the RF model, we will estimate the number of patients who will have
heat-related illnesses induced by heat waves. We will gather socioeconomic, demographic, and
meteorological data and use them as the model's input variables. The RF model's variables
will be filtered using the Boruta algorithm. The maximum tree depth (max depth) and the
number of trees (ntree) are the two key parameters of the RF method.
References:
Schär, C., Vidale, P., Lüthi, D., et al. (2004). The role of increasing temperature
variability in European summer heatwaves. Nature, 427, 332–336.
https://doi.org/10.1038/nature02300
Park, M., Jung, D., Lee, S., & Park, S. (2020). Heatwave damage prediction using random
forest model in Korea. Applied Sciences, 10(22), 8237. https://doi.org/10.3390/app10228237
Khan, N., Shahid, S., Ismail, T.B., et al. (2021). Prediction of heat waves over Pakistan
using support vector machine algorithm in the context of climate change. 1335–1353.
https://doi.org/10.1007/s00477-020-01963-1
Khan, N., Shahid, S., Juneng, L., Ahmed, K., Ismail, T., & Nawaz, N. (2019). Prediction of
heat waves in Pakistan using quantile regression forests. Atmospheric Research, 221, 1–11.
https://doi.org/10.1016/j.atmosres.2019.01.024
Liu, Y., & Wu, H. (2017). Prediction of road traffic congestion based on random forest. 2017
10th International Symposium on Computational Intelligence and Design (ISCID), 361–364.
https://doi.org/10.1109/ISCID.2017.216
Yao, Z., Xu, X., & Yu, H. (2018). Floor heating customer prediction model based on random
forest. 2018 IEEE/ACIS 17th International Conference on Computer and Information Science
(ICIS), 573–578. https://doi.org/10.1109/ICIS.2018.8466420
Khan, N., Shahid, S., Ismail, T., Ahmed, K., & Nawaz, N. (2018). Trends in heat wave-related
indices in Pakistan. Stoch. Env. Res. Risk A. https://doi.org/10.1007/s00477-018-1605-2
