Modelling Optimization for Business Decisions

Submitted by

Group 7, DIVISION: A

 Priyesh Nema E038


 Paridhi Bansal E039
 Poorvi Hattarki E040
 Yash Raj E042

Submitted To

Dr. Vengalarao Pachava

(Assistant Professor, School of Business Management, NMIMS Hyderabad)

TABLE OF CONTENTS

Abstract
Introduction
Literature Review
Data Description
Methodology
Analysis
Results and Discussion
Conclusion
References

Abstract
This study compares several models for a temperature time series: the 2-day moving average, the 3-day
moving average, Double Exponential Smoothing, and the autoregressive integrated moving average (ARIMA)
model, fitted using JMP and Python. The effectiveness of each model is assessed using in-sample forecasts
and accuracy metrics, including mean absolute percentage error (MAPE), mean squared error (MSE),
root mean squared error (RMSE) and mean absolute deviation (MAD). Temperature in Egypt is then
forecast with the model whose fitted values are closest to the observed values, and the adequacy of
that model is checked with a residual analysis. The time series used for the study was found to be
stationary, so there was no need to transform it by differencing before the models were used for
analysis and prediction.

Introduction
The temperature dataset for Egypt covering the period 2014-2018 is a key source of information in the
field of environmental data analysis. Egypt's landscape and climate patterns are markedly varied, and
the resulting large swings in temperature affect agriculture, water resources, and socio-economic
pursuits. The dataset contains yearly data on temperatures over five years, capturing seasonal trends,
deviations, and indications of longer-term climate change.

A detailed understanding of temperature dynamics in Egypt over this period matters both to local
stakeholders and to international climate research. Identifying appropriate adaptation and mitigation
approaches to climate change requires investigating temperature datasets for particular locations
such as Egypt. This introduction sets the stage for examining the complexity of the temperature
dataset and for drawing conclusions about how climate patterns behave under Egyptian conditions.

Literature Review
Weather forecasting plays a crucial role in various aspects of daily life, influencing decisions ranging from outfit
choices to agricultural practices. In agriculture, where seasons and nature significantly impact outcomes,
accurate weather forecasts are essential for farmers. Previously reliant on estimates, farmers can now access
forecasts on their smartphones, thanks to technological advancements. This improves their ability to make
informed decisions about crop cultivation, leading to better yields and minimizing losses. Weather forecasting
also contributes to the transportation and storage of food grains, guiding cultural operations like harrowing and
hoeing, and facilitating livestock protection initiatives. The application of forecasting extends beyond immediate
decisions, aiding in long-term planning and climate change predictions. Understanding weather patterns helps
determine the likelihood of specific weather events, such as snow or hail, and assesses the thermal energy
exposure in a given region. Climatology, a branch of atmospheric sciences, delves into the scientific study of
climates, focusing on weather conditions over extended periods. It involves the analysis of variables and
averages of short-term and long-term weather conditions, providing valuable insights for environmental
activities. The development of efficient measures for environmental management is a key goal within
climatology, emphasizing the ongoing importance of accurate weather forecasting in diverse fields.

The systematic recording of weather data began in the 17th century with the advent of instruments for
measuring atmospheric conditions. Initially employed mainly in agriculture for better planning of planting and
harvesting, weather records gained significance over time. The foundations of the national weather services in
the United States were laid by Joseph Henry in 1849, establishing a network of volunteer weather observers.
The U.S. Army Signal Corps initiated the first national weather services in 1870, incorporating Henry's
volunteer observers by 1874, and later, the Department of Agriculture took over these operations in 1891.

Weather forecasting gained prominence in aviation during the 1920s and '30s, particularly under the leadership
of Francis W. Reichelderfer, chief of the U.S. Weather Bureau. World War II heightened the importance of
weather forecasting, with specific attention to challenges like the jet streams affecting aircraft speed. Notably,
weather forecasting played a crucial role in planning Operation Overlord, the D-Day invasion at Normandy in
1944. The latter half of the 20th century witnessed the reorganization of the U.S. Weather Bureau, leading to the
establishment of the National Weather Service in 1970. Simultaneously, the commercial weather-forecasting
sector experienced unprecedented growth, providing services to industries such as marketing, shipping, aviation,
and international trade of commodities. Weather forecasts became essential for industries to optimize sales, plan
shipping routes, and anticipate potential impacts on agriculture.

The economic implications of weather on commodities and services, such as agriculture and outdoor events,
underscored the importance of accurate weather information. International trading of foodstuffs, like wheat and
coffee, could be significantly affected by sudden weather changes, prompting organizations to seek advance
knowledge from weather-forecasting entities. Precise forecasts became essential for specific industries, such as
gas and electric utilities requiring temperature predictions within a few degrees or ski-resort operators needing
humidity forecasts for snowmaking.

In summary, the evolution of weather forecasting from its agricultural origins to its multifaceted applications
across various industries highlights its integral role in decision-making and planning in modern society.

Data Description
The dataset consists of yearly observations of temperature in Egypt for the years 2014-2018. The
dataset was sourced from GitHub.

Methodology
The study starts with a univariate technique to illustrate the time series component.
Figure 1 provides a time series plot of the variable. As shown in Fig. 1, the plot appears to be
non-stationary. However, whether the data are stationary must be checked with an appropriate
statistical test. The Dickey-Fuller test, which statisticians David Dickey and Wayne Fuller
created in 1979, was later extended into the Augmented Dickey–Fuller test
(ADF test). It is used to determine whether a time series has a unit root, that is, whether it is
non-stationary. The null (H0) and alternative (H1) hypotheses are: H0: The data have a unit root
(non-stationary). H1: The data are stationary. A p-value greater than 0.05 means that H0 cannot be
rejected and the series is treated as non-stationary. The estimated p-value of 8.675937480199653e-09
is far below the significance level of α=0.05, so the null hypothesis H0 can be rejected. This means
that the data are stationary at this stage. Hence, differencing, which would otherwise be used to
remove a trend in the absence of seasonality, is not employed.
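
The unit-root regression behind this test can be illustrated with a minimal sketch. It is not the procedure used in the study: it implements only the plain Dickey-Fuller regression with a constant (the ADF regression with zero lagged difference terms), uses a simulated series rather than the Egypt data, and compares the statistic with an approximate 5% critical value of about -2.89 taken from standard Dickey-Fuller tables for roughly 100 observations. The function name `dickey_fuller_stat` is hypothetical. An exact p-value, such as the one reported above, requires tabulated distributions that statistical libraries supply.

```python
import math
import random

# Approximate 5% critical value (constant, no trend, about 100 observations).
# This number is an assumption taken from standard Dickey-Fuller tables.
CRITICAL_5PCT = -2.89


def dickey_fuller_stat(series):
    """t-statistic on gamma in: diff(y)_t = a + gamma * y_(t-1) + e_t."""
    x = series[:-1]                                        # lagged levels y_(t-1)
    z = [b - a for a, b in zip(series[:-1], series[1:])]   # first differences
    m = len(x)
    x_bar = sum(x) / m
    z_bar = sum(z) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    gamma = sxz / sxx                                      # OLS slope
    intercept = z_bar - gamma * x_bar
    ssr = sum((zi - intercept - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se_gamma = math.sqrt(ssr / (m - 2) / sxx)              # standard error of slope
    return gamma / se_gamma


# Simulated stationary series (noise around 25), NOT the Egypt dataset.
rng = random.Random(42)
noise = [25.0 + rng.gauss(0.0, 3.0) for _ in range(120)]
stat = dickey_fuller_stat(noise)
is_stationary = stat < CRITICAL_5PCT   # reject the unit-root null hypothesis
```

A statistic more negative than the critical value leads to rejecting H0, which corresponds to the small p-value case described above.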

In the next phase of the analysis, the Avg. temp of Egypt is predicted using traditional
statistical methods: Holt’s Exponential Smoothing and Autoregressive Integrated
Moving Average (ARIMA).
Holt’s Exponential Smoothing is suited to data that has a trend component. This method
uses exponentially weighted moving averages to smooth the values in the time series data, which
allows for better forecasting of the target value. It is also referred to as double exponential
smoothing because it smooths two components, level and trend, each with its own smoothing
parameter. From the graph attached below, the trend component is present in
this data; thus, we perform an in-sample forecast for Holt exponential smoothing.

[Figure: time series plot of AvgTemp; vertical axis 0 to 60, horizontal axis observation number 1 to 136]
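
The level and trend updates of Holt's method can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the JMP or Python fit used in the study: the smoothing parameters `alpha` and `beta`, the start-up values, the function names and the sample numbers are all hypothetical, since the report does not state the fitted parameters.

```python
def holt_in_sample(series, alpha, beta):
    """Holt's linear (double) exponential smoothing.

    Returns (fitted, level, trend), where fitted[t] is the one-step-ahead
    forecast of series[t] made at time t-1 (fitted[0] is None).
    """
    level = series[0]
    trend = series[1] - series[0]      # simple start-up values (an assumption)
    fitted = [None]
    for y in series[1:]:
        fitted.append(level + trend)   # forecast before seeing y
        new_level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return fitted, level, trend


def holt_forecast(level, trend, steps):
    """Extrapolate the final level and trend 'steps' periods ahead."""
    return [level + h * trend for h in range(1, steps + 1)]


# Illustrative values only, not the Egypt dataset.
temps = [10.0, 12.0, 14.0, 16.0]
fitted, level, trend = holt_in_sample(temps, alpha=0.5, beta=0.5)
future = holt_forecast(level, trend, steps=2)
```

For this perfectly linear toy series the in-sample forecasts reproduce the data and the two-step forecast continues the line (18.0, 20.0). Real data would leave residuals, which are what the accuracy measures summarize.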

ARIMA is another statistical method that can predict the target variable based on the given time series
data. It has three components, namely Autoregression (AR), Integrated (I) and Moving Average (MA).
Autoregression forecasts by regressing the variable on itself, i.e., it assumes that the future values are
correlated to past values in the time series. The Integrated concept involves differencing the time
series data till it becomes stationary. The number of times differencing is done on the data is known as
the degree of differencing. The MA model, unlike the AR model, which uses past values of the
variable, uses errors from past forecasts to predict future values. The ARIMA model can be expressed
as (p, d, q) where p is the number of autoregressive terms, d is the degree of differencing, and q is the
number of lagged forecast errors (moving average terms).
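
The forecasting equation itself is not reproduced in the text. A standard way of writing it, offered here as an assumption, is given below, where y denotes the series after d rounds of differencing, e the past forecast errors, μ a constant, φ the autoregressive coefficients and θ the moving average coefficients (all symbols other than θ are notation introduced for this illustration):

```latex
\hat{y}_t = \mu + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p}
            - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}
```

With p = 1, d = 0 and q = 0, for example, this reduces to a first-order autoregression, \hat{y}_t = \mu + \phi_1 y_{t-1}.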

Here the moving average parameters (θ’s) are defined so that their signs are negative in the equation, following
the convention introduced by Box and Jenkins. In software output the parameters are often labelled AR(1),
AR(2), …, and MA(1), MA(2), ….

To identify the appropriate ARIMA model for Y, we begin by determining the order of differencing (d) needed
to stationarize the series and remove the gross features of seasonality, perhaps in conjunction with a
variance-stabilizing transformation such as logging or deflating. If we stop at this point and predict that the
differenced series is constant, we have merely fitted a random walk or random trend model. However, the
stationary series may still have autocorrelated errors, suggesting that some number of AR terms (p ≥ 1) and/or
some number of MA terms (q ≥ 1) are also needed in the forecasting equation.

In practice, the values of p, d, and q that suit a given time series are usually chosen by inspecting its
autocorrelation and partial autocorrelation plots, which are described in the Analysis section.

The accuracy measures are computed from the in-sample forecasts of the models discussed above. The accuracy
measures used in this study are Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean
Absolute Percentage Error (MAPE) and Mean Absolute Deviation (MAD). Next, the best model is chosen based
on these accuracy metrics and model assumptions. The Avg. Temp. in Egypt during the next five years is
forecast using the most accurate model.
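
The four accuracy measures can be computed from paired observed and fitted values as in the following minimal sketch. The function name `accuracy_metrics` and the sample numbers are illustrative assumptions, not values from the study.

```python
import math


def accuracy_metrics(actual, fitted):
    """Return MSE, RMSE, MAD and MAPE (in percent) for paired values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mad = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual values, so it is undefined if any is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": rmse, "MAD": mad, "MAPE": mape}


# Illustrative values only, not the Egypt dataset.
actual = [20.0, 22.0, 25.0, 24.0]
fitted = [21.0, 21.0, 23.0, 26.0]
metrics = accuracy_metrics(actual, fitted)
```

For these toy values the errors are -1, 1, 2 and -2, giving an MSE of 2.5 and a MAD of 1.5. RMSE is the square root of MSE, so the two always rank models in the same order.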

Analysis
Two traditional statistical models, Holt’s Exponential Smoothing and ARIMA, are used for predictions.
Both models permit the forecasting of a variable that is observed as a time series.

First, the Augmented Dickey–Fuller (ADF) test is applied to determine whether the time series
data are stationary. Since the calculated p-value is 8.675937480199653e-09, which is less than the
threshold significance level α=0.05, the null hypothesis (H0) can be rejected. Thus, the data are
found to be stationary.

This can also be seen in the figure, where the Autocorrelation Function (ACF) plot shows the correlation
of the variable with a lagged version of itself. A lag refers to a value of the series at an earlier
time.

The Partial Autocorrelation Function (PACF) plot depicts the partial correlation between time series
data and its lagged version. Partial correlation is the correlation between two variables when the
influence of other predictor variables is eliminated from the relationship. This ensures that the
correlation is not spurious as the effect of variance of other variables is removed.
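
The quantities plotted in ACF and PACF charts can be computed directly, as in the minimal sketch below. It uses the usual sample autocorrelation and derives partial autocorrelations with the Durbin-Levinson recursion, which is one common method and not necessarily the one used by JMP or by the Python library in the study. The function names and sample values are illustrative assumptions.

```python
def acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    phi_prev = []                       # coefficients of the order k-1 model
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den              # partial autocorrelation at lag k
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi
    return result


# Illustrative values only, not the Egypt dataset.
temps = [20.0, 22.0, 21.0, 25.0, 24.0, 26.0, 23.0, 27.0, 28.0, 26.0]
r = acf(temps, 3)
p = pacf(temps, 3)
```

At lag 1 the partial and ordinary autocorrelations coincide, because there are no intermediate lags whose influence needs to be removed.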

Residual analysis is performed to assess the adequacy of the fitted models. Residual terms refer
to the difference between the observed and the fitted or predicted values. The two main
assumptions checked in the residual analysis are:

• Residuals are uncorrelated.


• Residuals are normally distributed.

If the correlation between the residuals is not significant, the model is said to be a good fit. The
p-value is calculated, and if it is greater than 0.05, the residuals are said to be uncorrelated.
The hypotheses are: H0: residuals are uncorrelated. H1: residuals are correlated. We also check for
normality, i.e., whether the residuals are normally distributed or not, as it is one of the assumptions
of a linear model. The hypotheses are: H0: residuals are normally distributed. H1: residuals are not
normally distributed. The null hypothesis is rejected if the p-value is less than the significance level of
0.05. In this study, the Box–Ljung test is employed to check for residual autocorrelation and the
Shapiro–Wilk test is used for normality. First, Holt’s exponential smoothing model is applied to predict
the Avg. Temp of Egypt for the years 2014 to 2018. In the figure, the fitted values and observed
values are plotted in the same graph.

The results for the Box–Ljung test and the Shapiro–Wilk test are as follows:

The p-value for the Shapiro–Wilk normality test is calculated to be 0.0017. As this value is less
than 0.05, the null hypothesis of normality is rejected, so the residuals are not normally distributed.

The p-value calculated for the Box–Ljung test is 0.7163. As it is greater than 0.05, the null
hypothesis is not rejected and the residuals are treated as uncorrelated.
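
The Box–Ljung statistic can be computed from residual autocorrelations as in the minimal sketch below. It is an illustration under stated assumptions rather than the routine used in the study: the function names and residual values are hypothetical, the degrees of freedom are taken to equal the number of lags (for residuals of a fitted ARIMA model they are usually reduced by the number of estimated AR and MA terms), and the p-value uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def ljung_box_q(residuals, lags):
    """Q = n(n+2) * sum over k of rho_k^2 / (n-k), for k = 1..lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical residuals, not taken from the study.
residuals = [0.5, -0.3, 0.1, 0.4, -0.6, 0.2, -0.1, 0.3, -0.4, 0.0, 0.2, -0.2]
q_stat = ljung_box_q(residuals, lags=4)
p_value = chi2_sf_even(q_stat, df=4)
uncorrelated = p_value > 0.05   # fail to reject H0 of no autocorrelation
```

A p-value above 0.05, as with the 0.7163 reported above, means the hypothesis of uncorrelated residuals is not rejected.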

Results and Discussion


To forecast the Avg. temp for the next 60 days from a univariate analysis of the time series data,
we applied Holt’s (Double) Exponential Smoothing, the 2-Day Moving Average, the 3-Day Moving
Average, the Weighted Moving Average and ARIMA. The errors of the model for each of these methods
are as follows:

2 DAYS MOVING AVERAGE

MSE   15.148382
RMSE  3.8920922
MAD   2.4048272
MAPE  4.99%

3 DAYS MOVING AVERAGE

MSE   15.713197
RMSE  3.9639875
MAD   2.5819612
MAPE  5.38%

WEIGHTED MOVING AVERAGE

MSE   14.683511
RMSE  3.8319069
MAD   2.3261657
MAPE  4.82%

DOUBLE EXPONENTIAL SMOOTHING

MSE   6.868304
RMSE  2.620745
MAD   1.725199
MAPE  3.59%

ARIMA MODEL

As the MSE, RMSE, MAD and MAPE values show, Holt’s Double Exponential Smoothing has the lowest
errors and is the best method for forecasting the future values. The next best model is ARIMA,
which has the second-lowest error values.
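
The moving-average forecasts compared in this section can be sketched as follows. This is a minimal illustration under stated assumptions: each forecast is taken to be the (weighted) mean of the immediately preceding observations, and the weights 0.2, 0.3 and 0.5 are hypothetical, since the report does not state the weights or window used for the weighted moving average. The function name and sample values are also illustrative.

```python
def moving_average_forecast(series, weights):
    """One-step-ahead forecasts from a weighted mean of the previous values.

    'weights' are listed oldest to newest and should sum to 1. The result is
    aligned with 'series'; the first len(weights) entries are None because
    there is not yet enough history to form a forecast.
    """
    k = len(weights)
    forecasts = [None] * k
    for t in range(k, len(series)):
        window = series[t - k:t]
        forecasts.append(sum(w * x for w, x in zip(weights, window)))
    return forecasts


# Illustrative values only, not the Egypt dataset.
temps = [20.0, 22.0, 21.0, 25.0, 24.0, 26.0]
ma2 = moving_average_forecast(temps, [0.5, 0.5])           # 2-day average
ma3 = moving_average_forecast(temps, [1 / 3, 1 / 3, 1 / 3])  # 3-day average
wma3 = moving_average_forecast(temps, [0.2, 0.3, 0.5])     # hypothetical weights
```

Equal weights give the simple 2-day and 3-day averages, while unequal weights that favour recent observations give a weighted moving average. Comparing each forecast list with the observed series yields errors of the kind tabulated above.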

Conclusion
In this study, various time series models were employed to analyze temperature data for Egypt spanning the
years 2014-2018. The models considered include the 2-day moving average, the 3-day moving average, Double
Exponential Smoothing, and the autoregressive integrated moving average (ARIMA) model. The effectiveness of each
model was assessed using in-sample forecasts and accuracy metrics such as mean absolute percentage error
(MAPE), mean squared error (MSE), root mean squared error (RMSE), and Mean Absolute Deviation (MAD).
The dataset, obtained from GitHub, provided crucial information for environmental data analysis, particularly in
understanding the complex temperature dynamics in Egypt. The diverse landscape and climate patterns in Egypt
impact various sectors such as agriculture, water resources, and socio-economic activities. Analysing
temperature data is essential for both local stakeholders and international climate research.

The study began by testing the time series data for stationarity using the Augmented Dickey–Fuller
test, which confirmed that the data are stationary. This eliminated the need for differencing to make the data
stationary. Subsequently, traditional statistical models, Holt’s Exponential Smoothing and ARIMA, were applied
for forecasting the Avg. Temp of Egypt.

Holt’s Exponential Smoothing, suitable for data with a trend component, and ARIMA, incorporating
autoregressive, integrated, and moving average components, were used to predict the target variable. Accuracy
measures, including RMSE, MSE, MAPE, and MAD, were employed to evaluate the models. Residual analysis
was conducted to assess the adequacy of the fitted models, considering the assumptions of uncorrelated and
normally distributed residuals. The results indicated that Holt’s Double Exponential Smoothing outperformed
the other models in terms of forecasting accuracy, with the lowest error values. The ARIMA model also
demonstrated good performance but had slightly higher error values compared to Holt’s Exponential Smoothing.

In conclusion, based on the analysis and evaluation of various models, Holt’s Double Exponential Smoothing is
recommended as the most accurate model for forecasting temperature in Egypt. The findings contribute valuable
insights for stakeholders and climate researchers in understanding and addressing climate patterns in the region.

References
https://www.vedantu.com/geography/weather-forecasting
https://www.britannica.com/science/weather-forecasting/Numerical-weather-prediction-NWP-models
