
Name: Lakshmi Harshitha Yechuri

PGID: 12110035

Subject: Forecasting Analytics

Topic: FA Individual Assignment

Q1. Consider the data set SouvenirSales.xls (Jan 1995 - Dec 2001), which gives the monthly souvenir sales at a shop in New York. Back in 2001, an analyst was appointed to forecast sales for the next 12 months (year 2002). The analyst partitioned the data by keeping the last 12 months (year 2001) as the validation set and the remaining data as the training set. Answer the following questions. Use R.

a) Plot the time series of the original data. Which time series components appear from the plot? (2+2=4 points)

The time series has been plotted using the plot.ts function.

The time series components visible in the plot are:

1. Level - the average value around which the series fluctuates.
2. Trend - an overall increase in the store's sales over time.
3. Seasonality - a slight rise in February and a pronounced peak at the end of every year.
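A minimal sketch of how such a plot could be produced (toy simulated values stand in here for the actual SouvenirSales.xls column, which would normally be read with a package such as readxl):

```r
# Toy stand-in for the 84 monthly sales values (Jan 1995 - Dec 2001).
set.seed(123)
sales <- exp(seq(7, 10, length.out = 84) + rnorm(84, sd = 0.2))

# Build a monthly ts object and plot it with plot.ts.
sales.ts <- ts(sales, start = c(1995, 1), frequency = 12)
plot.ts(sales.ts, xlab = "Year", ylab = "Sales", main = "Monthly Souvenir Sales")
```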

b) Fit a linear trend model with additive seasonality (Model A) and exponential trend model with
multiplicative seasonality (Model B). Consider January as the reference group for each model.
Produce the regression coefficients and the validation set errors. Remember to fit only the training
period. (5+5=10 points)

The tslm function has been used to fit the training data to the linear trend model with additive seasonality (Model A). For the exponential trend model with multiplicative seasonality (Model B), tslm is applied to the log of the training data.

The regression coefficients are obtained from the fitted model objects (e.g. via summary).
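A sketch of this fitting step, assuming the forecast package is available (toy simulated data stands in for the real series):

```r
library(forecast)

# Toy stand-in for the souvenir series (84 months, Jan 1995 - Dec 2001).
set.seed(1)
sales.ts <- ts(exp(seq(7, 10, length.out = 84) + rnorm(84, sd = 0.2)),
               start = c(1995, 1), frequency = 12)

# Partition: last 12 months (2001) as validation, the rest as training.
train.ts <- window(sales.ts, end = c(2000, 12))
valid.ts <- window(sales.ts, start = c(2001, 1))

# Model A: linear trend + additive seasonality (season1 = January is the reference).
modelA <- tslm(train.ts ~ trend + season)
# Model B: exponential trend + multiplicative seasonality, fitted on the log scale.
ltrain <- log(train.ts)
modelB <- tslm(ltrain ~ trend + season)

summary(modelA)$coefficients  # regression coefficients (Model B's are on the log scale)

# Validation-set errors: Model B forecasts are back-transformed with exp().
fcA <- forecast(modelA, h = 12)$mean
fcB <- exp(forecast(modelB, h = 12)$mean)
c(rmseA = sqrt(mean((valid.ts - fcA)^2)),
  rmseB = sqrt(mean((valid.ts - fcB)^2)))
```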


Validation set errors plotted as below –
c) Which model is the best model considering RMSE as the metric? Could you have understood this from the line chart? Explain. Produce the plot showing the forecasts from both models along with actual data. In a separate plot, present the residuals from both models (consider only the validation set residuals). (2+2+3+3 = 10 points)

Using RMSE as the metric, the multiplicative model has a much lower RMSE than the additive model, so the multiplicative model is the better fit. This could also have been understood from the line chart, where the series grows roughly exponentially - a shape the additive (linear-trend) model cannot track.

Plot of forecasts

Plot of residuals
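The two plots above could be produced along these lines (a sketch; in the assignment modelA and modelB come from part (b), while toy stand-in data is used here so the snippet runs on its own):

```r
library(forecast)

# Toy stand-in setup (these objects come from part (b) in the assignment).
set.seed(1)
sales.ts <- ts(exp(seq(7, 10, length.out = 84) + rnorm(84, sd = 0.2)),
               start = c(1995, 1), frequency = 12)
train.ts <- window(sales.ts, end = c(2000, 12))
valid.ts <- window(sales.ts, start = c(2001, 1))
modelA <- tslm(train.ts ~ trend + season)
ltrain <- log(train.ts)
modelB <- tslm(ltrain ~ trend + season)

fcA <- forecast(modelA, h = 12)$mean
fcB <- exp(forecast(modelB, h = 12)$mean)  # back-transform Model B

# Forecasts from both models along with the actual validation data.
plot(valid.ts, ylab = "Sales", main = "Forecasts vs actuals (2001)")
lines(fcA, col = "blue", lty = 2)
lines(fcB, col = "red", lty = 2)

# Validation-set residuals (actual minus forecast) from both models.
plot(valid.ts - fcA, col = "blue", ylab = "Residual", main = "Validation residuals")
lines(valid.ts - fcB, col = "red")
```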
d) Examine the additive model. Which month has the highest average sales during the year? What does the estimated trend coefficient in Model A mean? (2+2 = 4 points)

Additive model: highest sales and interpretation of the trend coefficient.

The highest monthly average sales occur in December - USD 50014.59.

The trend coefficient of the additive model is 245.4. Since it is positive, sales are increasing over time; its magnitude is the expected increase in sales per month, holding the seasonal component constant.

e) Examine the multiplicative model. What does the coefficient of October mean? What does the
estimated trend coefficient in the model B mean? (8 points)

Since Model B is fitted on the log scale, the exponent of the October coefficient is the multiplicative seasonal factor for October: the proportion by which October sales differ from the January (reference) level, after accounting for the trend. Similarly, the exponent of the estimated trend coefficient is the expected multiplicative growth in sales per month.
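As a sketch of reading these quantities off the fitted model (toy stand-in data; in the assignment modelB is the log-scale model from part (b), and tslm labels the seasonal dummies season2..season12, so October is "season10"):

```r
library(forecast)

# Toy stand-in for the training series and the log-scale Model B from part (b).
set.seed(1)
train.ts <- ts(exp(seq(7, 9.6, length.out = 72) + rnorm(72, sd = 0.2)),
               start = c(1995, 1), frequency = 12)
ltrain <- log(train.ts)
modelB <- tslm(ltrain ~ trend + season)

coefB <- coef(modelB)
exp(coefB["season10"])  # multiplicative October factor relative to January
exp(coefB["trend"])     # expected multiplicative growth in sales per month
```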

f) Use the best model type from part (c) to forecast the sales in January 2002. Think carefully which
data to use for model fitting in this case. (8 points)

The multiplicative model is the more accurate of the two, so it is used for the January prediction. Since all data up to December 2001 is available at the time of forecasting, the model is refitted on the entire series (training plus validation data) before forecasting.

The forecasted sales in January 2002 using the multiplicative method are 12601.02.
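A sketch of this refit-then-forecast step (toy stand-in data; the point is that the model is refitted on the full series, not just the training period):

```r
library(forecast)

# Toy stand-in for the full series (all 84 months in the assignment).
set.seed(1)
sales.ts <- ts(exp(seq(7, 10, length.out = 84) + rnorm(84, sd = 0.2)),
               start = c(1995, 1), frequency = 12)

# Refit the multiplicative model on the ENTIRE series (training + validation),
# since all data up to Dec 2001 is available when forecasting Jan 2002.
lsales <- log(sales.ts)
full.model <- tslm(lsales ~ trend + season)
fc.jan <- exp(forecast(full.model, h = 1)$mean)  # January 2002 point forecast
fc.jan
```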


g) Plot the ACF and PACF plot until lag 20 of the residuals obtained from training set of the best
model chosen. Comment on these plots and think what AR(p) model could be a good choice?
(2+2+4= 8 points)

The order of an AR model can be read from the number of significant spikes in the PACF plot. The PACF shows four significant spikes, suggesting that an AR(4) model could be a good choice.

However, the ACF shows only two significant spikes, pointing to a lower-order model. From the plots, an AR(2) on the residuals, i.e. ARIMA(2,0,0), seems to be the best fit.
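The diagnostic plots could be produced as follows (a sketch with toy stand-in data; in the assignment the residuals come from the log-scale model chosen in part (c)):

```r
library(forecast)

# Toy stand-in for the chosen (log-scale) model from part (b).
set.seed(1)
train.ts <- ts(exp(seq(7, 9.6, length.out = 72) + rnorm(72, sd = 0.2)),
               start = c(1995, 1), frequency = 12)
ltrain <- log(train.ts)
modelB <- tslm(ltrain ~ trend + season)

# ACF and PACF of the training-set residuals, up to lag 20.
res <- residuals(modelB)
Acf(res, lag.max = 20)
Pacf(res, lag.max = 20)
```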
h) Fit an AR(p) model as you think appropriate from part (g) to the training set residuals and produce
the regression coefficients. Was your intuition at part (g) correct? (4 + 4 = 8 points)

The plot of actual vs. forecasted errors, and of their differences, looks random (white noise), so we can be confident that the AR(2) model, i.e. ARIMA(2,0,0), is a good fit - consistent with the intuition from part (g).
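Fitting the AR model to the training residuals could be sketched like this (toy stand-in data again; `Arima` with order c(2, 0, 0) is an AR(2)):

```r
library(forecast)

# Toy stand-in: residuals of the chosen model from part (b).
set.seed(1)
train.ts <- ts(exp(seq(7, 9.6, length.out = 72) + rnorm(72, sd = 0.2)),
               start = c(1995, 1), frequency = 12)
ltrain <- log(train.ts)
modelB <- tslm(ltrain ~ trend + season)

# Fit an AR(2), i.e. ARIMA(2,0,0), to the training residuals.
ar.fit <- Arima(residuals(modelB), order = c(2, 0, 0))
ar.fit$coef                            # AR coefficients (plus mean term)
Acf(residuals(ar.fit), lag.max = 20)   # should now resemble white noise
```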
i) Now, using the best regression model and AR(p) model, forecast the sales in January 2002. Think
carefully which data to use for model fitting in this case. (6 points)

The predicted sales in January 2002, using the best regression model combined with the AR(p) model, are USD 59048.96.
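One way this combined forecast could be computed (a sketch with toy stand-in data: the regression is refitted on the full series, its residuals are modelled with an AR(2), and the residual forecast is added back on the log scale before exponentiating):

```r
library(forecast)

# Toy stand-in for the full series (all data up to Dec 2001).
set.seed(1)
sales.ts <- ts(exp(seq(7, 10, length.out = 84) + rnorm(84, sd = 0.2)),
               start = c(1995, 1), frequency = 12)

# Refit the regression on the full series and model its residuals with AR(2).
lsales <- log(sales.ts)
reg.model <- tslm(lsales ~ trend + season)
ar.model <- Arima(residuals(reg.model), order = c(2, 0, 0))

# Combined January 2002 forecast: regression forecast + residual forecast,
# both on the log scale, then back-transformed.
jan2002 <- exp(as.numeric(forecast(reg.model, h = 1)$mean) +
               as.numeric(forecast(ar.model, h = 1)$mean))
jan2002
```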

Q2. Short answer type questions: (9 x 2 points)

a. Explain the key difference between cross sectional and time series data.

Cross sectional data: observations on one or more variables for many units, collected at a single point in time. E.g. the maximum temperature of several cities on a single day is an example of cross sectional data.

Time series data: observations on a variable recorded over successive periods. E.g. the profit of an organization over a period of 5 years. Time series data is particularly useful in business forecasting applications.

b. Explain the difference between seasonality and cyclicality.

Seasonality: a pattern that is repeated after a regular, fixed time interval; the recurrence period is generally less than a year (e.g. monthly or quarterly effects).

Cyclicality: ups and downs that recur from time to time but whose rises and falls are not of fixed period, typically spanning more than a year (e.g. business cycles).

c. Explain why the centered moving average is not considered suitable for forecasting.

Answer: Moving average values are placed at the period in which they are calculated. When the moving averages are centered, each value is placed at the centre of its window rather than at its end, so each value needs observations from both before and after its period.

• It is not suitable for forecasting because a centered window extends past the forecast origin, requiring future observations that do not yet exist.

• It often overlooks complex relationships present in the data.

• It does not respond to fluctuations that take place for a reason, for example cyclical and seasonal effects.
d. Explain stationarity and why it is important for some time series forecasting methods.

Answer: Stationarity means that the statistical properties of a time series (its mean, variance, and autocorrelation) do not change over time. It is important because many useful analytical tools and forecasting methods, such as ARIMA models, rely on it; a stationary series is easier to model and forecast.

e. How does an ACF plot help to identify whether a time series is stationary or not?

Answer: The ACF plot of a stationary time series drops to zero relatively quickly, whereas the ACF of non-stationary data decreases slowly.
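This behaviour can be illustrated with a small simulation (white noise is stationary; a random walk, its cumulative sum, is not):

```r
# Illustration: ACF of white noise (stationary) vs a random walk (non-stationary).
set.seed(42)
wn <- rnorm(200)        # stationary white noise
rw <- cumsum(wn)        # non-stationary random walk
acf(wn, lag.max = 20)   # spikes drop inside the significance bounds immediately
acf(rw, lag.max = 20)   # autocorrelations decay very slowly across all lags
```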

f. Why is partitioning time series data into training, validation, and test sets not recommended? Briefly describe two considerations for choosing the width of the validation period.

Answer: The fast and powerful methods that we rely on in machine learning, such as random train-test splits and k-fold cross-validation, do not work for time series data, because they ignore the temporal ordering inherent in the problem: shuffling observations destroys the trend, seasonality, and autocorrelation structure.

Part b pending

g. Both smoothing and ARIMA method of forecasting can handle time series data with missing
value. True/False. Explain

Answer: False. Only one of the two can: ARIMA handles missing values, while smoothing methods do not.

The ARIMA method of forecasting can handle time series data with missing values because all ARIMA models are state space models, which deal with a missing value exactly by skipping the update phase at that point.

The smoothing family of models does not have a standard way to deal with missing data.

h. Additive and multiplicative decomposition differ in the way the trend is computed.

True /False. Explain.

Answer: True. Additive and multiplicative decomposition differ in the way the trend is handled: in multiplicative decomposition the detrended series is obtained by dividing the series by the trend value, whereas in additive decomposition it is obtained by subtracting the trend estimate from the series.

i. After accounting for trend and seasonality in a time series data, the analyst observes that
there is still correlation left amongst the residuals of the time series. Is that a good or bad news for
the analyst? Explain.
