
Demand Analysis

07.06.2019

Janani Prakash
PGPBABI-Online
GreatLearning, Great Lakes Institute of Management
Project Objective

Exploratory Data Analysis – Step by step approach


Install necessary Packages and Invoke Libraries
Set up working Directory
Import and Read the Dataset

Explanation and conclusion


Exploring the dataset
Conversion of the dataset into time series
Decomposition of the time series dataset
Decomposing the time series using STL
Divide data into test and train
Forecasting using the random walk with drift method
Forecast on train data of Item A
Forecast on train data of Item B
Forecasting using Holt Winters
Forecast using ARIMA model
Check for stationarity
ACF and PACF (to check stationarity and autocorrelation)
Model Comparison
Further forecasting using ARIMA model and conclusion

Appendix A – Source Code


1. Project Objective
The dataset contains monthly demand of two different types of consumable items in a certain
store from January 2002 to September 2017. The ultimate objective of this project is to
predict sales for the period October 2017 to December 2018.

1. The data should be read as time series objects in R and plotted. The major features
of each series should be identified and the differences between the two series
described.
2. Any seasonal changes in the two series should be examined, followed by a formal
extraction of the time series components. The two series should be compared to
check whether some seasons show more variability than others, whether the
seasonal variation changes across years, etc.
3. Each series should be decomposed to extract trend and seasonality, if present, and
the better seasonality type identified – additive or multiplicative. The seasonal
indices should be explained, the months with higher or lower sales identified, and
any difference in the nature of demand between the two items noted.
4. The residuals from the two decomposition exercises should be extracted and
checked for stationarity. A formal test for stationarity should be carried out, writing
down the null and alternative hypotheses, and a conclusion drawn for each case.
5. Before the final forecast is undertaken, a few models should be compared. The last
21 months should be used as a hold-out sample to fit a suitable exponential
smoothing model to the rest of the data, and the MAPE calculated. The values of α, β
and γ should be reported. For the same hold-out period, the forecast by
decomposition should be compared and its MAPE computed.
6. The ‘best’ model from the above should be used to forecast demand for the
period October 2017 to December 2018 for both items. The forecast values as well
as their upper and lower confidence limits should be provided. If you were the store
manager, what decisions would you make after looking at the demand for the two
items over the years?
2. Exploratory Data Analysis – Step by step
approach
2.1.1. Install necessary Packages and Invoke Libraries
The necessary packages were installed and the associated libraries were
invoked. Keeping all the package calls in one place improves code
readability.
Please refer Appendix A for Source Code.
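A minimal sketch of the setup block; the exact package list is an assumption
based on the functions used later in this report:

install.packages(c("readxl", "forecast", "tseries", "MLmetrics"))  # run once
library(readxl)     # read_excel, to import the .xlsx data
library(forecast)   # stl forecasts, HoltWinters helpers, auto.arima
library(tseries)    # adf.test for stationarity checks
library(MLmetrics)  # MAPE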

2.1.2. Set up working Directory


Setting a working directory at the start of the R session makes importing
and exporting data and code files easier. The working directory is simply
the location/folder on the PC that holds the data, code and other files
related to the project.
Please refer Appendix A for Source Code.
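A sketch with a hypothetical path (replace it with the actual project folder):

setwd("C:/Projects/DemandAnalysis")  # hypothetical location of the project files
getwd()                              # confirm the working directory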

2.1.3. Import and Read the Dataset


The given dataset is in .xlsx format; hence the function ‘read_excel’ (from
the readxl package) is used to import the file.
Please refer Appendix A for Source Code.
3. Explanation and conclusion

3.1. Exploring the dataset


In the given dataset we have demand data for Item A and B for the period January
2002 to July 2017 (as against Sep 2017 mentioned in the problem statement).

> demand <- read_excel("Demand.xlsx")


> names(demand)[1:4] <- c("Year","Month","ItemA","ItemB")
> dim(demand)
[1] 187 4
> head(demand)
# A tibble: 6 x 4
Year Month ItemA ItemB
<chr> <chr> <dbl> <dbl>
1 2002 1 1954 2585
2 2002 2 2302 3368
3 2002 3 3054 3210
4 2002 4 2414 3111
5 2002 5 2226 3756
6 2002 6 2725 4216
> summary(demand[,3:4])
ItemA ItemB
Min. :1954 Min. :1153
1st Qu.:2748 1st Qu.:2362
Median :3134 Median :2876
Mean :3263 Mean :2962
3rd Qu.:3741 3rd Qu.:3468
Max. :5725 Max. :5618
3.2. Conversion of the dataset into time series
The data is continuous monthly data for the whole period without any breaks. This
makes it suitable for a time series analysis of the demand for Items A and B, subject
to the other modelling assumptions holding.

> ItemAdemand <- ts(demand[,3], start=c(2002,1), end=c(2017,7), frequency=12)


> plot(ItemAdemand)
> ItemBdemand <- ts(demand[,4], start=c(2002,1), end=c(2017,7), frequency=12)
> plot(ItemBdemand)

From the above plots, we can see that Item A has increasing demand, whereas demand
for Item B is falling. Both series also show some seasonality and trend, but neither
appears to be cyclic in nature. The variation of Item A increases with time,
whereas that of Item B decreases.
3.3. Decomposition of the time series dataset
The original time series is commonly decomposed into 3 sub-time series:
1. Seasonal:​ patterns that repeat with a fixed period of time.
2. Trend:​ the underlying long-run movement of the series.
3. Random: (also called “noise”, “irregular” or “remainder”) the residual of the
time series after the seasonal and trend components have been removed.

Besides these three components, a ​Cyclic component may also be present, with
fluctuations that recur over long, non-fixed periods of time.

Additive or multiplicative decomposition? To get a successful decomposition, it is
important to choose between the ​additive and ​multiplicative ​models, which is done
by inspecting the time series.

a. The ​additive model is useful when the seasonal variation is relatively
constant over time.
b. The ​multiplicative model is useful when the seasonal variation increases
over time.

Both variants can be fitted and their remainders compared, as in the sketch below.
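A minimal sketch of that comparison, assuming ItemAdemand is the monthly ts object
created above (decompose() is used here purely for the additive/multiplicative
contrast; the analysis itself uses STL):

decompAdd  <- decompose(ItemAdemand[,1], type = "additive")
decompMult <- decompose(ItemAdemand[,1], type = "multiplicative")
plot(decompAdd)   # the remainder panel should look patternless if additive fits
plot(decompMult)  # compare the remainder behaviour of the two fits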

> monthplot(ItemAdemand)
> monthplot(ItemBdemand)

The seasonal variation looked to be about the same magnitude across time, so an
additive decomposition might give good results.

Decomposing the time series using STL


STL is an acronym for “Seasonal and Trend decomposition using Loess”. It performs
an additive decomposition, and the four panels of its plot show the original data,
the seasonal component, the trend component and the remainder.
> ItemAseasonality <- stl(ItemAdemand[,1], s.window="p") #constant seasonality
> plot(ItemAseasonality)

> ItemAseasonality
Call:
stl(x = ItemAdemand[, 1], s.window = "p")

Components
seasonal trend remainder
Jan 2002 -970.47187 2838.594 85.8774947
Feb 2002 -454.22689 2841.712 -85.4854465
Mar 2002 -124.79419 2844.830 333.9638914
Apr 2002 -364.03321 2850.118 -72.0843134
May 2002 -314.83443 2855.405 -314.5703082
Jun 2002 -343.58304 2861.823 206.7602840
> ItemBseasonality <- stl(ItemBdemand[,1], s.window="p") #constant seasonality
> plot(ItemBseasonality)

> ItemBseasonality
Call:
stl(x = ItemBdemand[, 1], s.window = "p")

Components
seasonal trend remainder
Jan 2002 -1222.0193 3777.014 30.0049071
Feb 2002 -860.5255 3773.067 455.4584776
Mar 2002 -438.6564 3769.120 -120.4632005
Apr 2002 -145.5874 3763.765 -507.1773823
May 2002 354.4193 3758.410 -356.8292167
Jun 2002 451.4649 3749.684 14.8509274

From the decomposed components above, we can see a continuous increase in demand
for Item A, while Item B shows a similar but opposite, declining pattern.
Decompose the time series and plot the deseasoned series

If the focus is on figuring out whether the general trend of demand is up, we
deseasonalize and can largely ignore the seasonal component. However, to forecast
demand in the next month, both the secular trend and the seasonality need to be
taken into account.

>ItemAdeseason <- (ItemAseasonality$time.series[,2] + ItemAseasonality$time.series[,3])


>ts.plot(ItemAdeseason, ItemAdemand, col=c( "blue", "red"), main="Comparison of
Demand and Deseasonalized Demand for Item A")
>ItemBdeseason <- (ItemBseasonality$time.series[,2]+ItemBseasonality$time.series[,3])
>ts.plot(ItemBdeseason, ItemBdemand, col=c( "blue", "red"), main="Comparison of
Demand and Deseasonalized Demand for Item B")

The above plots show demand in Red and de-seasoned demand in Blue; the de-seasoned
series makes the increasing trend for Item A and the decreasing trend for Item B
clearly visible.
Divide data into test and train
The data is divided into Training and test data for both Item A and B. The dataset is
divided in such a way that the older data is used as the training data whereas the
most recent data is used as the testing data.

>ATrain <- window(ItemAdemand, start=c(2002,1), end=c(2015,12), frequency=12)


>ATest <- window(ItemAdemand, start=c(2016,1), frequency=12)

>BTrain <- window(ItemBdemand, start=c(2002,1), end=c(2015,12), frequency=12)


>BTest <- window(ItemBdemand, start=c(2016,1), frequency=12)

The training data for both items is then decomposed into its seasonal, trend and
irregular components using STL.

>ItemATrain <- stl(ATrain[,1], s.window="p")


>ItemBTrain <- stl(BTrain[,1], s.window="p")
3.4. Forecasting using the random walk with drift method
Forecast on train data of Item A
The forecast function projects the training data 19 months ahead, covering the
hold-out period. The cbind function places the test data and the forecast data
side by side.

> stlForecastA <- forecast(ItemATrain, method="rwdrift", h=19)


> VecA<- cbind(ATest,stlForecastA$mean)

The ts plot function plots the test data and the forecasted data to check how well the
forecasted data matches the actual test data.
> ts.plot(VecA, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand A: Actual vs Forecast")

The mean absolute percentage error (MAPE) is then calculated between the forecasts
and the eventual outcomes.
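For a hold-out of n months this is

MAPE = (1/n) · Σ |actual_t − forecast_t| / actual_t

i.e. the average relative error, computed here directly on the two columns of VecA: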
> MAPEA <- mean(abs(VecA[,1]-VecA[,2])/VecA[,1])
> MAPEA
[1] 0.1408798

Box-Ljung Test
To check whether the residuals are independent:
H0:​ Residuals are independent
Ha:​ Residuals are not independent
>Box.test(stlForecastA$residuals, lag=10, type="Ljung-Box")
Box-Ljung test

data: stlForecastA$residuals
X-squared = 72.292, df = 10, p-value = 1.597e-11

Conclusion: Since the p-value (1.597e-11) is far below 0.05, we reject H0; the residuals are not independent.

Forecast on train data of Item B


The forecast function projects the training data 19 months ahead, covering the
hold-out period. The cbind function places the test data and the forecast data
side by side.
> stlForecastb <- forecast(ItemBTrain, method="rwdrift", h=19)
> VecB<- cbind(BTest,stlForecastb$mean)

The ts plot function plots the test data and the forecasted data to check how well the
forecasted data matches the actual test data.
> ts.plot(VecB, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand B: Actual vs Forecast")

The mean absolute percentage error (MAPE) is then calculated between the forecasts
and the eventual outcomes.
>MAPEB <- mean(abs(VecB[,1]-VecB[,2])/VecB[,1])
>MAPEB
[1] 0.1082608
Box-Ljung Test
To check whether the residuals are independent:
H0:​ Residuals are independent
Ha:​ Residuals are not independent

> Box.test(stlForecastb$residuals, lag=10, type="Ljung-Box")

Box-Ljung test

data: stlForecastb$residuals
X-squared = 63.798, df = 10, p-value = 6.878e-10

Conclusion: Since the p-value (6.878e-10) is far below 0.05, we reject H0; the residuals are not independent.

From the above MAPE results, the hold-out error of this method is about 14.1% for
Item A and 10.8% for Item B.
3.5. Forecasting using Holt Winters
The Holt-Winters seasonal method comprises the forecast equation and three
smoothing equations, one each for the level, the trend and the seasonal
component, with smoothing parameters alpha (α), beta (β) and gamma (γ).
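For monthly data (period 12), the additive form of these equations is:

Level:    l[t] = α·(y[t] − s[t−12]) + (1 − α)·(l[t−1] + b[t−1])
Trend:    b[t] = β·(l[t] − l[t−1]) + (1 − β)·b[t−1]
Seasonal: s[t] = γ·(y[t] − l[t−1] − b[t−1]) + (1 − γ)·s[t−12]
Forecast: y[t+h] = l[t] + h·b[t] + s[t+h−12k], where s[t+h−12k] is the most recent
seasonal index available for the forecast month.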

Forecasting for Item A

>hwItemA <- HoltWinters(as.ts(ATrain),seasonal="additive")


>hwItemA

Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = as.ts(ATrain), seasonal = "additive")

Smoothing parameters:
alpha: 0.1241357
beta : 0.03174654
gamma: 0.3636975

Coefficients:
[,1]
a 3753.348040
b 7.663395
s1 -1250.098605
s2 -438.592232
s3 -224.017731
s4 -407.395313
s5 -507.668223
s6 -667.267246
s7 63.659702
s8 197.909330
s9 -301.525945
s10 25.272325
s11 712.529546
s12 1545.291998
>plot(hwItemA)

>hwAForecast <- forecast(hwItemA, h=19)


>VecA1 <- cbind(ATest,hwAForecast)
>ts.plot(VecA1[,1],VecA1[,2], col=c("blue","red"),xlab="year", ylab="demand",
main="Demand A: Actual vs Forecast")

>Box.test(hwAForecast$residuals, lag=20, type="Ljung-Box")


Box-Ljung test

data: hwAForecast$residuals
X-squared = 14.227, df = 20, p-value = 0.8188

Conclusion: Since the p-value (0.8188) is well above 0.05, we do not reject H0; the residuals are independent.

>MAPE(VecA1[,1],VecA1[,2])
[1] 0.1160528
Forecasting for Item B

>hwItemB <- HoltWinters(as.ts(BTrain),seasonal="additive")


>hwItemB
Holt-Winters exponential smoothing with trend and additive seasonal component.

Call:
HoltWinters(x = as.ts(BTrain), seasonal = "additive")

Smoothing parameters:
alpha: 0.0166627
beta : 0.4878834
gamma: 0.5000132

Coefficients:
[,1]
a 2297.12724
b -15.29024
s1 -1222.01821
s2 -1012.34884
s3 -442.56913
s4 -307.95973
s5 79.56065
s6 258.33260
s7 697.64492
s8 241.68337
s9 -246.12729
s10 -465.09216
s11 120.77708
s12 412.50043
>plot(hwItemB)

>hwBForecast <- forecast(hwItemB, h=19)


>VecB1 <- cbind(BTest,hwBForecast)
>ts.plot(VecB1[,1],VecB1[,2], col=c("blue","red"),xlab="year", ylab="demand",
main="Demand B: Actual vs Forecast")

>Box.test(hwBForecast$residuals, lag=20, type="Ljung-Box")


Box-Ljung test

data: hwBForecast$residuals
X-squared = 13.101, df = 20, p-value = 0.873
Conclusion: Do not reject H0: Residuals are independent

>MAPE(VecB1[,1],VecB1[,2])
[1] 0.1867152

MAPE on the hold-out is 11.6% and 18.7% for Item A and Item B respectively.


3.6. Forecast using ARIMA model
Check for stationarity
Dickey-Fuller test

Statistical tests make strong assumptions about the data and can only inform the
degree to which a null hypothesis can be rejected or not. The result must be
interpreted for a given problem to be meaningful. Nevertheless, they provide a
quick check and confirmatory evidence that the time series is stationary or
non-stationary.

Null Hypothesis (H0):​ The time series has a unit root, meaning it is non-stationary
and has some time-dependent structure.

Alternate Hypothesis (H1):​ The time series does not have a unit root, meaning it is
stationary with no time-dependent structure.

p-value > 0.05:​ Fail to reject the null hypothesis (H0); the data has a unit root and is
non-stationary.

p-value <= 0.05:​ Reject the null hypothesis (H0); the data does not have a unit root
and is stationary.

Item A

>adf.test(ItemAdemand)
Augmented Dickey-Fuller Test

data: ItemAdemand
Dickey-Fuller = -7.8632, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

>ItemAdiff <- diff(ItemAdemand)


>plot(ItemAdiff)

>adf.test(diff(ItemAdemand))
Augmented Dickey-Fuller Test

data: diff(ItemAdemand)
Dickey-Fuller = -8.0907, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

Item B

>adf.test(ItemBdemand)
Augmented Dickey-Fuller Test

data: ItemBdemand
Dickey-Fuller = -12.967, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

>ItemBdiff <- diff(ItemBdemand)


>plot(ItemBdiff)

> adf.test(diff(ItemBdemand))

Augmented Dickey-Fuller Test

data: diff(ItemBdemand)
Dickey-Fuller = -9.8701, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary

In each of the above ADF tests the null hypothesis is rejected (p-value = 0.01), so both the original and the differenced series test as stationary.


The time series of differences (above) does appear to be stationary in mean and
variance, as the level of the series stays roughly constant over time, and the variance
of the series appears roughly constant over time.

ACF and PACF (to check stationarity and autocorrelation)
The acf function computes an estimate of the autocorrelation function of a (possibly
multivariate) time series. The pacf function computes an estimate of the partial
autocorrelation function of a (possibly multivariate) time series.
ACF

● Autocorrelations at different lags give insight into the time series; a sharp
cut-off in the ACF helps determine the order q of the MA part.
● Significant autocorrelations imply that observations from the distant past
influence the current observation.
● lag.max: the maximum lag at which to calculate the acf. The default is
10·log10(N/m), where N is the number of observations and m the number of
series; it is automatically limited to one less than the number of
observations in the series.

PACF

● Partial autocorrelation adjusts for the intervening lags; a sharp cut-off in
the PACF helps determine the order p of the AR part.

>acf(ItemAdemand,lag=15)
>acf(ItemAdiff, lag=15)

>acf(ItemBdemand,lag=15)
>acf(ItemBdiff, lag=15)

>acf(ItemAdemand,lag=50)
>acf(ItemAdiff, lag=50)

>pacf(ItemAdemand)
>pacf(ItemAdiff)

>acf(ItemBdemand,lag=50)
>acf(ItemBdiff, lag=50)

>pacf(ItemBdemand)
>pacf(ItemBdiff)

ARIMA model

ARMA models are commonly used in time series modeling; AR stands for
auto-regression and MA stands for moving average.

In the ACF and PACF plots above, the autocorrelations of the differenced series
alternate between positive and negative values and die out quickly (as expected for
stationary data). There is no clean cut-off that would point to a simple AR(2) or
MA(2) structure, so auto.arima is used to select the orders automatically.

Item A

> ItemA_ArimaTrain <- auto.arima(ATrain, seasonal=TRUE)


> ItemA_ArimaTrain
Series: ATrain
ARIMA(0,0,0)(0,1,1)[12] with drift

Coefficients:
sma1 drift
-0.6581 3.9132
s.e. 0.0798 0.9188

sigma^2 estimated as 116022: log likelihood=-1133.35


AIC=2272.71 AICc=2272.86 BIC=2281.86
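The specification selected by auto.arima can also be refit explicitly with Arima(),
which is useful for checking sensitivity to the chosen orders. A sketch, assuming
the forecast package is loaded and ATrain is the training window created above:

# Refit auto.arima's pick: no non-seasonal terms, one seasonal
# difference, one seasonal MA term, plus drift
manualA <- Arima(ATrain[,1], order = c(0, 0, 0),
                 seasonal = c(0, 1, 1), include.drift = TRUE)
summary(manualA)  # should match ItemA_ArimaTrain up to estimation detail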
> plot(ItemA_ArimaTrain$residuals)

> plot(ItemA_ArimaTrain$x,col="blue")
> lines(ItemA_ArimaTrain$fitted,col="red",main="Demand A: Actual vs Forecast")

> MAPE(ItemA_ArimaTrain$fitted,ItemA_ArimaTrain$x)
[1] 0.0733376

The MAPE is now reduced to about 7.3% with the ARIMA model.
>acf(ItemA_ArimaTrain$residuals)

>pacf(ItemA_ArimaTrain$residuals)
Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent
>Box.test(ItemA_ArimaTrain$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)
Box-Ljung test

data: ItemA_ArimaTrain$residuals
X-squared = 16.716, df = 10, p-value = 0.0809

Conclusion: Since the p-value (0.0809) is above 0.05, we do not reject H0; the residuals are independent.

Forecasting on hold out dataset

> arimaAforecast <- forecast(ItemA_ArimaTrain, h=19)


> VecA2 <- cbind(ATest,arimaAforecast)
> ts.plot(VecA2[,1],VecA2[,2], col=c("blue","red"),xlab="year", ylab="demand",
main="Demand A: Actual vs Forecast")

From the plot and the data, the forecast values follow the actual values quite
closely, with the two curves crossing around Jan 2016, May 2016, Dec 2016 and
Jan 2017.
Item B

> ItemB_ArimaTrain <- auto.arima(BTrain, seasonal=TRUE)


> ItemB_ArimaTrain
Series: BTrain
ARIMA(0,0,0)(2,1,1)[12] with drift

Coefficients:
sar1 sar2 sma1 drift
0.2141 0.0379 -0.7536 -10.1449
s.e. 0.1958 0.1382 0.1773 0.9005

sigma^2 estimated as 104121: log likelihood=-1123.42


AIC=2256.83 AICc=2257.23 BIC=2272.08

>plot(ItemB_ArimaTrain$residuals)
>plot(ItemB_ArimaTrain$x,col="blue")

>lines(ItemB_ArimaTrain$fitted,col="red",main="Demand B: Actual vs Forecast")


>MAPE(ItemB_ArimaTrain$fitted,ItemB_ArimaTrain$x)
[1] 0.07318176
The MAPE is now reduced to about 7.3% with the ARIMA model.

>acf(ItemB_ArimaTrain$residuals)

>pacf(ItemB_ArimaTrain$residuals)
Box-Ljung Test
To check whether the residuals are independent:
H0: Residuals are independent
Ha: Residuals are not independent

>Box.test(ItemB_ArimaTrain$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)


Box-Ljung test

data: ItemB_ArimaTrain$residuals
X-squared = 18.298, df = 10, p-value = 0.05014

Conclusion: The p-value (0.05014) is marginally above 0.05, so we do not reject H0 at the 5% level; the residuals are treated as independent.

Forecasting on hold out dataset


>arimaBforecast <- forecast(ItemB_ArimaTrain, h=19)
>VecB2 <- cbind(BTest,arimaBforecast)
>ts.plot(VecB2[,1],VecB2[,2], col=c("blue","red"),xlab="year", ylab="demand",
main="Demand B: Actual vs Forecast")

From the plot and the data, the forecast values follow the actual values to some
extent.
3.7. Model Comparison
For this time series forecasting problem, we observed both trend and seasonality in
the data.

We observed that Item A has an increasing trend, whereas the trend for Item B is
declining.

We also observed that, for both items, a few months show high seasonal variation,
and Item A has a few outliers.

Since the seasonal variation does not grow with the trend, “Additive” seasonality
was used. We fitted three models:

1. Random walk with drift,

2. Holt Winters and

3. ARIMA.

Below are the MAPE and Box-Ljung test observations for the three models.

Random walk with drift

Item A# 0.1408798 (14%), p-value < 2.2e-16

Item B# 0.1082608 (10.8%), p-value = 2.931e-13

Holt Winters

Item A# 0.1160528 (11.6%), p-value = 0.8188

Item B# 0.1867152 (18.6%), p-value = 0.873

ARIMA

Item A# 0.0733376 (7.3%), p-value = 0.0809

Item B# 0.07654621 (7.7%), p-value = 0.09177

Of the three, the ARIMA model gives the lowest MAPE for both items, so it was
selected for the final forecasting.
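A small sketch gathering these hold-out MAPEs (the values computed in the sections
above) into a single table for comparison:

comparison <- data.frame(
  Model      = c("Random walk with drift", "Holt-Winters", "ARIMA"),
  MAPE_ItemA = c(0.1409, 0.1161, 0.0733),
  MAPE_ItemB = c(0.1083, 0.1867, 0.0765)
)
comparison  # ARIMA shows the lowest MAPE for both items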
3.8. Further forecasting using ARIMA model and conclusion
Forecast A
>ItemA_arima <- auto.arima(ItemAdemand, seasonal=TRUE)
>ItemAforecast <- forecast(ItemA_arima, h=17)
>plot(ItemAforecast)
>ItemAforecast
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Aug 2017 4320.211 3879.991 4760.431 3646.953 4993.469
Sep 2017 4169.513 3725.551 4613.476 3490.531 4848.495
Oct 2017 4428.791 3981.385 4876.197 3744.542 5113.040
Nov 2017 5102.669 4652.091 5553.246 4413.570 5791.767
Dec 2017 5879.220 5425.721 6332.719 5185.653 6572.787
Jan 2018 2819.535 2363.343 3275.727 2121.849 3517.221
Feb 2018 3990.984 3532.307 4449.660 3289.498 4692.469
Mar 2018 4181.449 3720.480 4642.419 3476.458 4886.441
Apr 2018 4081.089 3618.003 4544.174 3372.860 4789.317
May 2018 3888.336 3423.296 4353.376 3177.118 4599.554
Jun 2018 4029.525 3562.679 4496.370 3315.545 4743.504
Jul 2018 4390.292 3921.777 4858.807 3673.760 5106.823
Aug 2018 4407.590 3900.778 4914.402 3632.487 5182.693
Sep 2018 4257.019 3747.019 4767.019 3477.041 5036.997
Oct 2018 4516.419 4003.480 5029.358 3731.946 5300.892
Nov 2018 5190.414 4674.763 5706.065 4401.794 5979.034
Dec 2018 5967.079 5448.925 6485.232 5174.632 6759.526
Forecast B
>ItemB_arima <- auto.arima(ItemBdemand, seasonal=TRUE)
>ItemBforecast <- forecast(ItemB_arima, h=17)
>plot(ItemBforecast)

>ItemBforecast
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Aug 2017 2356.360 1945.1156 2767.605 1727.4157 2985.305
Sep 2017 2082.947 1671.7024 2494.192 1454.0025 2711.892
Oct 2017 1784.795 1373.5500 2196.040 1155.8501 2413.740
Nov 2017 2436.402 2025.1570 2847.647 1807.4571 3065.347
Dec 2017 2429.861 2018.6162 2841.106 1800.9163 3058.806
Jan 2018 965.227 553.9834 1376.471 336.2842 1594.170
Feb 2018 1278.230 866.9865 1689.474 649.2873 1907.173
Mar 2018 1693.373 1282.1294 2104.617 1064.4302 2322.316
Apr 2018 2088.924 1677.6805 2500.168 1459.9813 2717.867
May 2018 2342.673 1931.4290 2753.916 1713.7298 2971.615
Jun 2018 2587.570 2176.3268 2998.814 1958.6276 3216.513
Jul 2018 2903.952 2492.7084 3315.196 2275.0092 3532.895
Aug 2018 2267.612 1814.7093 2720.515 1574.9570 2960.267
Sep 2018 1945.385 1492.4816 2398.287 1252.7293 2638.040
Oct 2018 1663.827 1210.9242 2116.730 971.1719 2356.482
Nov 2018 2293.259 1840.3561 2746.162 1600.6038 2985.914
Dec 2018 2325.067 1872.1643 2777.970 1632.4120 3017.722
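The point forecasts and confidence limits tabulated above are stored in the forecast
object's mean, lower and upper components; a sketch for pulling the 95% limits into
a data frame for reporting:

limitsA <- data.frame(
  Forecast = as.numeric(ItemAforecast$mean),
  Lo95     = as.numeric(ItemAforecast$lower[, "95%"]),
  Hi95     = as.numeric(ItemAforecast$upper[, "95%"])
)
head(limitsA)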
4. Appendix A – Source Code
4.1. The source code for the time series analysis and forecasting has been written below
library(readxl)   # needed for read_excel
demand <- read_excel("Demand.xlsx")
names(demand)[1:4] <- c("Year","Month","ItemA","ItemB")

dim(demand)
head(demand)
summary(demand[,3:4])

ItemAdemand <- ts(demand[,3], start=c(2002,1), end=c(2017,7), frequency=12)


plot(ItemAdemand)

ItemBdemand <- ts(demand[,4], start=c(2002,1), end=c(2017,7), frequency=12)


plot(ItemBdemand)

monthplot(ItemAdemand)
monthplot(ItemBdemand)

ItemAseasonality <- stl(ItemAdemand[,1], s.window="p") #constant seasonality


plot(ItemAseasonality)
ItemAseasonality <- stl(ItemAdemand[,1], s.window=7)
plot(ItemAseasonality)

ItemBseasonality <- stl(ItemBdemand[,1], s.window="p") #constant seasonality


plot(ItemBseasonality)
ItemAseasonality
ItemBseasonality <- stl(ItemBdemand[,1], s.window=7)
plot(ItemBseasonality)
ItemBseasonality

ItemAdeseason <- (ItemAseasonality$time.series[,2]+ItemAseasonality$time.series[,3])


ts.plot(ItemAdeseason, ItemAdemand, col=c("blue", "red"), main="Comparison of Demand and Deseasonalized Demand for Item A")

ItemBdeseason <- (ItemBseasonality$time.series[,2]+ItemBseasonality$time.series[,3])


ts.plot(ItemBdeseason, ItemBdemand, col=c("blue", "red"), main="Comparison of Demand and Deseasonalized Demand for Item B")

ATrain <- window(ItemAdemand, start=c(2002,1), end=c(2015,12), frequency=12)


ATest <- window(ItemAdemand, start=c(2016,1), frequency=12)

BTrain <- window(ItemBdemand, start=c(2002,1), end=c(2015,12), frequency=12)


BTest <- window(ItemBdemand, start=c(2016,1), frequency=12)

ItemATrain <- stl(ATrain[,1], s.window="p")


ItemBTrain <- stl(BTrain[,1], s.window="p")

library(forecast)
stlForecastA <- forecast(ItemATrain, method="rwdrift", h=19)
stlForecastb <- forecast(ItemBTrain, method="rwdrift", h=19)

VecA<- cbind(ATest,stlForecastA$mean)
VecB<- cbind(BTest,stlForecastb$mean)

ts.plot(VecA, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand A: Actual vs Forecast")
MAPEA <- mean(abs(VecA[,1]-VecA[,2])/VecA[,1])
MAPEA

Box.test(stlForecastA$residuals, lag=10, type="Ljung-Box")

ts.plot(VecB, col=c("blue", "red"), xlab="year", ylab="demand", main="Monthly Demand B: Actual vs Forecast")

MAPEB <- mean(abs(VecB[,1]-VecB[,2])/VecB[,1])


MAPEB

Box.test(stlForecastb$residuals, lag=10, type="Ljung-Box")

#Holt Winter method

hwItemA <- HoltWinters(as.ts(ATrain),seasonal="additive")


hwItemA
plot(hwItemA)

hwAForecast <- forecast(hwItemA, h=19)


VecA1 <- cbind(ATest,hwAForecast)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecA1[,1], VecA1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")

Box.test(hwAForecast$residuals, lag=20, type="Ljung-Box")

install.packages("MLmetrics")
library(MLmetrics)
MAPE(VecA1[,1],VecA1[,2])
MAPEA1 <- mean(abs(VecA1[,1]-VecA1[,2])/VecA1[,1])
MAPEA1

#B

hwItemB <- HoltWinters(as.ts(BTrain),seasonal="additive")


hwItemB
plot(hwItemB)

hwBForecast <- forecast(hwItemB, h=19)


VecB1 <- cbind(BTest,hwBForecast)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecB1[,1], VecB1[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")

Box.test(hwBForecast$residuals, lag=20, type="Ljung-Box")

#install.packages("MLmetrics")
library(MLmetrics)
MAPE(VecB1[,1],VecB1[,2])

#ARIMA

library(tseries)
adf.test(ItemAdemand)
ItemAdiff <- diff(ItemAdemand)
plot(ItemAdiff)

adf.test(diff(ItemAdemand))

#B
adf.test(ItemBdemand)

ItemBdiff <- diff(ItemBdemand)


plot(ItemBdiff)

adf.test(diff(ItemBdemand))

# ACF PACF

acf(ItemAdemand,lag=15)
acf(ItemAdiff, lag=15)

acf(ItemBdemand,lag=15)
acf(ItemBdiff, lag=15)

acf(ItemAdemand,lag=50)
acf(ItemAdiff, lag=50)

pacf(ItemAdemand)
pacf(ItemAdiff)

acf(ItemBdemand,lag=50)
acf(ItemBdiff, lag=50)
pacf(ItemBdemand)
pacf(ItemBdiff)

ItemA_ArimaTrain <- auto.arima(ATrain, seasonal=TRUE)


ItemA_ArimaTrain

plot(ItemA_ArimaTrain$residuals)
plot(ItemA_ArimaTrain$x,col="blue")
lines(ItemA_ArimaTrain$fitted,col="red",main="Demand A: Actual vs Forecast")

MAPE(ItemA_ArimaTrain$fitted,ItemA_ArimaTrain$x)

acf(ItemA_ArimaTrain$residuals)
pacf(ItemA_ArimaTrain$residuals)

Box.test(ItemA_ArimaTrain$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)

#Forecast on holdout

arimaAforecast <- forecast(ItemA_ArimaTrain, h=19)


VecA2 <- cbind(ATest,arimaAforecast)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecA2[,1], VecA2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand A: Actual vs Forecast")

#B
ItemB_ArimaTrain <- auto.arima(BTrain, seasonal=TRUE)
ItemB_ArimaTrain

plot(ItemB_ArimaTrain$residuals)
plot(ItemB_ArimaTrain$x,col="blue")
lines(ItemB_ArimaTrain$fitted,col="red",main="Demand B: Actual vs Forecast")

MAPE(ItemB_ArimaTrain$fitted,ItemB_ArimaTrain$x)

acf(ItemB_ArimaTrain$residuals)
pacf(ItemB_ArimaTrain$residuals)

Box.test(ItemB_ArimaTrain$residuals, lag = 10, type = c("Ljung-Box"), fitdf = 0)

#Forecast on holdout

arimaBforecast <- forecast(ItemB_ArimaTrain, h=19)


VecB2 <- cbind(BTest,arimaBforecast)

par(mfrow=c(1,1), mar=c(2, 2, 2, 2), mgp=c(3, 1, 0), las=0)

ts.plot(VecB2[,1], VecB2[,2], col=c("blue","red"), xlab="year", ylab="demand", main="Demand B: Actual vs Forecast")

#Forecast A
ItemA_arima <- auto.arima(ItemAdemand, seasonal=TRUE)
ItemAforecast <- forecast(ItemA_arima, h=17)

plot(ItemAforecast)
ItemAforecast
#Forecast B
ItemB_arima <- auto.arima(ItemBdemand, seasonal=TRUE)
ItemBforecast <- forecast(ItemB_arima, h=17)

plot(ItemBforecast)
ItemBforecast
