
Question 1:

Periods of economic downturn (plotted as red lines in the curves below):


- March 2001 to November 2001
- December 2007 to June 2009
- February 2020 to April 2020

Both curves (Sales and logSales) have trends that closely mirror the economic downturns,
suggesting that furniture sales are correlated with macroeconomic conditions. There are 4
turns (marked with red circles), suggesting a trend that can be modeled with a polynomial of
degree 5.

Seasonality of period 12 is evident in both plots, with furniture sales consistently peaking in
December and dipping in January/February each year.
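
A plot of this kind can be sketched in R as follows. This is only an illustration on synthetic
data: the `sales` series, its assumed 1992 start, and all variable names are stand-ins, not the
original data. Only the downturn dates are taken from the list above.

```r
set.seed(5)
# Synthetic monthly series standing in for Sales (assumed start: Jan 1992)
sales <- ts(5000 + 10 * (1:348) + rnorm(348, sd = 200),
            start = c(1992, 1), frequency = 12)

# Downturn start/end months as decimal years (January of year Y is Y + 0/12)
downturns <- rbind(c(2001 + 2/12,  2001 + 10/12),  # Mar 2001 to Nov 2001
                   c(2007 + 11/12, 2009 + 5/12),   # Dec 2007 to Jun 2009
                   c(2020 + 1/12,  2020 + 3/12))   # Feb 2020 to Apr 2020

op <- par(mfrow = c(2, 1))
plot(sales, ylab = "Sales")
abline(v = as.vector(downturns), col = "red")
plot(log(sales), ylab = "logSales")
abline(v = as.vector(downturns), col = "red")
par(op)
```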

Also observed is a huge drop in Sales in the first half of 2020, which coincides with the onset of
COVID-19. This is expected, as the uncertainty and shop closures caused people to delay their
purchases, which is consistent with the higher-than-normal peaks in the months that followed.

As for the choice between an additive and a multiplicative model, the variance of the errors
around the trend line does not appear to change with time. This implies that an additive model
might suffice to model the time series. The logSales plot also shows seasonal variation that
decreases over time, which suggests that the multiplicative model overcorrects for the
variance as time progresses.

Question 2:

Part a)

Multiplicative model fit on years 2000 to 2019:

Call:
lm(formula = logSales ~ poly(Time, 5) + fMonth + c348 + s348,
data = furniture[97:336, ])

Residuals:
Min 1Q Median 3Q Max
-0.142912 -0.040866 0.005692 0.041562 0.119576

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.356699 0.012631 661.608 < 2e-16 ***
poly(Time, 5)1 0.695932 0.056410 12.337 < 2e-16 ***
poly(Time, 5)2 0.626769 0.056345 11.124 < 2e-16 ***
poly(Time, 5)3 0.729037 0.056504 12.902 < 2e-16 ***
poly(Time, 5)4 -0.469960 0.056353 -8.340 8.01e-15 ***
poly(Time, 5)5 -0.597400 0.056591 -10.556 < 2e-16 ***
fMonth2 0.015419 0.017838 0.864 0.388291
fMonth3 0.100289 0.017836 5.623 5.64e-08 ***
fMonth4 0.004955 0.017822 0.278 0.781232
fMonth5 0.068741 0.017849 3.851 0.000154 ***
fMonth6 0.042584 0.017844 2.386 0.017858 *
fMonth7 0.064094 0.017838 3.593 0.000403 ***
fMonth8 0.103645 0.017869 5.800 2.27e-08 ***
fMonth9 0.064089 0.017862 3.588 0.000410 ***
fMonth10 0.040166 0.017864 2.248 0.025536 *
fMonth11 0.108978 0.017897 6.089 4.97e-09 ***
fMonth12 0.115509 0.017889 6.457 6.69e-10 ***
c348 -0.009171 0.005165 -1.776 0.077181 .
s348 0.001928 0.005169 0.373 0.709512
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05634 on 221 degrees of freedom


Multiple R-squared: 0.7738, Adjusted R-squared: 0.7554
F-statistic: 42 on 18 and 221 DF, p-value: < 2.2e-16

The model is a multiplicative model with a 5th degree polynomial trend and static seasonal
indices. It also accounts for calendar structure through the cos/sin pair at frequency 0.348
cycles per month. The other cos/sin pair, at frequency 0.432, turned out to be insignificant
and is not included.
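
A minimal sketch of how the regressors for such a model could be built is given below. It runs on
synthetic data, and the way `Time`, `fMonth`, and the calendar pairs are constructed is an
assumption (the write-up does not show it); only the formula mirrors the `Call:` above.

```r
set.seed(1)
n <- 240                                     # 20 years of monthly data
Time <- 1:n
fMonth <- factor(rep(1:12, length.out = n))  # month-of-year factor, January = baseline

# Calendar pairs at the two frequencies named in the text (cycles per month)
c348 <- cos(2 * pi * 0.348 * Time); s348 <- sin(2 * pi * 0.348 * Time)
c432 <- cos(2 * pi * 0.432 * Time); s432 <- sin(2 * pi * 0.432 * Time)

# Synthetic log-sales: linear drift + December bump + noise
logSales <- 8 + 0.002 * Time + 0.1 * (fMonth == "12") + rnorm(n, sd = 0.05)
dat <- data.frame(logSales, Time, fMonth, c348, s348, c432, s432)

fit <- lm(logSales ~ poly(Time, 5) + fMonth + c348 + s348, data = dat)

# 1 intercept + 5 trend terms + 11 month dummies + 2 calendar terms
length(coef(fit))
df.residual(fit)
```

With 240 observations and 19 coefficients the residual degrees of freedom come out to 221, which
matches the summary above.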

Part b)

Month   fmonth          Seasonal Index
Jan     -0.060705879    0.9411000
Feb     -0.045286388    0.9557237
Mar      0.039583073    1.0403769
Apr     -0.055750512    0.9457751
May      0.008034932    1.0080673
Jun     -0.018122312    0.9820409
Jul      0.003388443    1.0033942
Aug      0.042939378    1.0438746
Sep      0.003383440    1.0033892
Oct     -0.020539717    0.9796698
Nov      0.048272102    1.0494562
Dec      0.054803440    1.0563330


The fmonth values need to be exponentiated in order to obtain the actual seasonal indices. For
example, the January index is 0.94, which implies that Sales are about 6% below trend in
January.

As expected, January has the lowest seasonal index (~6% below trend) and December has the
highest (~6% above trend). Furniture sales apparently pick up during the holiday season,
perhaps due to discounts or gift purchases.
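
The table can be reproduced from the fMonth estimates in Part a). The sketch below assumes the
fmonth column was obtained by centering the twelve monthly effects (January being the baseline
at 0) so that they sum to zero; that step is not shown in the write-up, but it matches the
printed values to about five decimal places.

```r
# fMonth2..fMonth12 estimates copied from the Part a) summary; January baseline = 0
b <- c(0, 0.015419, 0.100289, 0.004955, 0.068741, 0.042584,
       0.064094, 0.103645, 0.064089, 0.040166, 0.108978, 0.115509)
names(b) <- month.abb

fmonth <- b - mean(b)        # center so the log-scale effects sum to zero
seas_index <- exp(fmonth)    # back-transform to multiplicative indices

round(cbind(fmonth, seas_index), 4)
```

An index of 0.94 means sales run at roughly 94% of the trend level in that month.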

Part c)
The points in the qq-plot do not follow the straight line closely, indicating that the residuals
deviate from normality. Let’s test this using the Shapiro-Wilk test.

Shapiro-Wilk normality test

data: resid(modelq2)
W = 0.98378, p-value = 0.007725

The Shapiro-Wilk test confirms the lack of normality: the p-value (0.0077) is below 0.05, so the
null hypothesis of normality is rejected. Let’s now plot the residuals across time:

The residuals do seem to indicate some remaining trend that hasn’t been captured. Ideally,
the residuals should fluctuate randomly across time.

The ACF plot also confirms the presence of uncaptured structure. Ideally, only the lag 0
autocorrelation should be significant. Here, however, the ACF follows a cosine-like pattern,
indicating that there is trend remaining to be captured.

This is likely due to the actual data containing some dynamic seasonality. Our model is not able
to capture that sufficiently since we have used static estimates.

Plotting predicted against actual values, we see that the model is unable to capture the
changing amplitude of the seasonal structure during the recession of 2008 and hence is thrown
off.
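
The residual checks used in this part can be sketched as follows. The data are synthetic and the
names are illustrative; a slow wave is deliberately left out of the fitted model so that the
residuals keep some structure, loosely imitating the situation described above.

```r
set.seed(2)
n <- 240
Time <- 1:n
# Linear trend plus a slow wave that the model below does not include
y <- 0.01 * Time + 0.2 * sin(2 * pi * Time / 60) + rnorm(n, sd = 0.05)
fit <- lm(y ~ Time)
r <- resid(fit)

sw <- shapiro.test(r)        # H0: residuals are normally distributed
sw$p.value

qqnorm(r); qqline(r)         # points should follow the line under normality
plot(Time, r, type = "l")    # look for leftover trend across time
a <- acf(r, lag.max = 36)    # a wave-like ACF signals leftover structure
```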

Question 3:

Part a)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.358e+00 6.782e-03 1232.505 < 2e-16 ***
poly(Time, 5)1 6.998e-01 2.973e-02 23.541 < 2e-16 ***
poly(Time, 5)2 6.334e-01 2.985e-02 21.223 < 2e-16 ***
poly(Time, 5)3 7.351e-01 3.004e-02 24.476 < 2e-16 ***
poly(Time, 5)4 -4.609e-01 3.009e-02 -15.315 < 2e-16 ***
poly(Time, 5)5 -5.894e-01 3.029e-02 -19.457 < 2e-16 ***
fMonth2 1.424e-02 9.475e-03 1.503 0.134412
fMonth3 9.854e-02 9.473e-03 10.402 < 2e-16 ***
fMonth4 3.009e-03 9.458e-03 0.318 0.750646
fMonth5 6.765e-02 9.470e-03 7.144 1.36e-11 ***
fMonth6 4.000e-02 9.461e-03 4.228 3.47e-05 ***
fMonth7 6.331e-02 9.462e-03 6.691 1.86e-10 ***
fMonth8 1.010e-01 9.462e-03 10.674 < 2e-16 ***
fMonth9 6.290e-02 9.470e-03 6.642 2.45e-10 ***
fMonth10 3.810e-02 9.458e-03 4.029 7.75e-05 ***
fMonth11 1.069e-01 9.473e-03 11.286 < 2e-16 ***
fMonth12 1.142e-01 9.475e-03 12.049 < 2e-16 ***
c348 -9.410e-03 2.703e-03 -3.481 0.000603 ***
s348 2.212e-03 2.716e-03 0.815 0.416227
c432 4.937e-05 2.721e-03 0.018 0.985539
s432 1.220e-02 2.701e-03 4.515 1.04e-05 ***
lresid 8.554e-01 3.541e-02 24.158 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02948 on 217 degrees of freedom


(1 observation deleted due to missingness)
Multiple R-squared: 0.9387, Adjusted R-squared: 0.9328
F-statistic: 158.3 on 21 and 217 DF, p-value: < 2.2e-16

This model has a much higher R-squared (0.9387 versus 0.7738) than the first model. The lag-1
residual has gone a long way towards capturing structure that the first model missed. Note
also that each calendar pair now has a significant member (c348 and s432).
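
A minimal sketch of how a lag-1 residual regressor such as `lresid` can be constructed is shown
below, on synthetic data with AR(1) errors. The construction is an assumption about what was
done; the single NA it creates is consistent with the "1 observation deleted due to missingness"
note in the summary.

```r
set.seed(3)
n <- 240
Time <- 1:n
# Synthetic AR(1) errors around a linear trend
e <- as.numeric(arima.sim(list(ar = 0.8), n = n, sd = 0.03))
dat <- data.frame(y = 8 + 0.002 * Time + e, Time = Time)

fit0 <- lm(y ~ Time, data = dat)

# Residual from month t-1 as a regressor for month t; the first row has no predecessor
dat$lresid <- c(NA, head(resid(fit0), -1))

fit1 <- lm(y ~ Time + lresid, data = dat)   # lm() drops the NA row by default
c(nobs(fit0), nobs(fit1))
summary(fit1)$coefficients["lresid", ]
```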

Part b)
The residuals for this model look much more random and closer to normally distributed,
indicating that a lot of trend has been captured. Let’s verify this more thoroughly using the
qq plot and the Shapiro-Wilk test.

Shapiro-Wilk normality test

data: resid(modelq3)
W = 0.99558, p-value = 0.7277

The qq-plot and Shapiro-Wilk test are consistent with normal residuals. The residuals stick to
the straight line in the qq-plot, and the Shapiro-Wilk p-value (0.73) is not significant, so we
cannot reject the null hypothesis that the distribution is normal.

Now let’s examine the ACF plot to see if there is any uncaptured trend/seasonality.

There are spikes in the ACF plot at lags 6, 12, 18, 24, and so on. This aligns well with a
half-yearly cycle and indicates that there might be some dynamic seasonality at play which the
model was unable to capture.
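
Reading significant lags off an ACF plot can also be done numerically. The sketch below uses
synthetic residuals with a weak 6-month cycle (an assumption made purely for illustration) and
compares each autocorrelation with the approximate 95% limits that `acf()` draws.

```r
set.seed(4)
n <- 239
# Synthetic residuals containing a weak 6-month cycle
r <- 0.03 * cos(2 * pi * (1:n) / 6) + rnorm(n, sd = 0.02)

a <- acf(r, lag.max = 24, plot = FALSE)
bound <- qnorm(0.975) / sqrt(n)    # approximate 95% limits under white noise
rho <- a$acf[-1]                   # drop lag 0, which is always 1

sig_lags <- which(abs(rho) > bound)
sig_lags                           # lags whose autocorrelation exceeds the limits
```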

Part c)

Index   Without Lag 1 Residual   With Lag 1 Residual
S1      0.9411000                0.9425635
S2      0.9557237                0.9560780
S3      1.0403769                1.0401755
S4      0.9457751                0.9454042
S5      1.0080673                1.0085356
S6      0.9820409                0.9810335
S7      1.0033942                1.0041643
S8      1.0438746                1.0427258
S9      1.0033892                1.0037549
S10     0.9796698                0.9791717
S11     1.0494562                1.0489220
S12     1.0563330                1.0565553


The seasonality estimates look very similar, indicating that seasonality was well captured even
without the lag-1 residuals. This implies that most of the improvement in the model is coming
from better capture of trend using the lag-1 residual.
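
How close the two sets of estimates are can be quantified directly from the table. The values
below are copied from it; the variable names are illustrative.

```r
without_lag <- c(0.9411000, 0.9557237, 1.0403769, 0.9457751, 1.0080673, 0.9820409,
                 1.0033942, 1.0438746, 1.0033892, 0.9796698, 1.0494562, 1.0563330)
with_lag    <- c(0.9425635, 0.9560780, 1.0401755, 0.9454042, 1.0085356, 0.9810335,
                 1.0041643, 1.0427258, 1.0037549, 0.9791717, 1.0489220, 1.0565553)

d <- with_lag - without_lag
max(abs(d))          # largest change in any seasonal index
which.max(abs(d))    # month (S1..S12) where that change occurs
```

The largest change is below 0.002, i.e. less than 0.2 percentage points of trend.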

Question 4:

Part a)

(i)
Data for 1992 to 2006 is plotted below. Only 2 turns are observed, so I first try a polynomial
trend of degree 3 in addition to the seasonal components.
After eliminating insignificant variables, the final model, which retains a degree-2 trend, is
as follows:

Call:
lm(formula = logSales ~ poly(Time, 2) + fMonth + c348 + s348,
data = furniture_q4)

Residuals:
Min 1Q Median 3Q Max
-0.084872 -0.018746 0.000844 0.018414 0.106006

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.149736 0.008516 957.028 < 2e-16 ***
poly(Time, 2)1 2.462746 0.031877 77.257 < 2e-16 ***
poly(Time, 2)2 -0.297885 0.031804 -9.366 < 2e-16 ***
fMonth2 -0.013842 0.012047 -1.149 0.252389
fMonth3 0.072956 0.012043 6.058 1.04e-08 ***
fMonth4 0.006079 0.012020 0.506 0.613735
fMonth5 0.054895 0.012053 4.555 1.07e-05 ***
fMonth6 0.041219 0.012040 3.423 0.000795 ***
fMonth7 0.050007 0.012025 4.159 5.33e-05 ***
fMonth8 0.080306 0.012059 6.660 4.72e-10 ***
fMonth9 0.041785 0.012039 3.471 0.000676 ***
fMonth10 0.056578 0.012033 4.702 5.73e-06 ***
fMonth11 0.129003 0.012066 10.692 < 2e-16 ***
fMonth12 0.163770 0.012041 13.601 < 2e-16 ***
c348 -0.009896 0.003502 -2.826 0.005350 **
s348 0.002055 0.003502 0.587 0.558232
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0318 on 152 degrees of freedom
Multiple R-squared: 0.9777, Adjusted R-squared: 0.9755
F-statistic: 444.3 on 15 and 152 DF, p-value: < 2.2e-16

(ii)
The seasonal indices are as below:

Month   Seasonal Index (exp.seas4)
Jan     0.9446919
Feb     0.9317060
Mar     1.0161892
Apr     0.9504526
May     0.9980006
Jun     0.9844444
Jul     0.9931347
Aug     1.0236857
Sep     0.9850024
Oct     0.9996818
Nov     1.0747694
Dec     1.1127939

(iii)
Residuals are analyzed using the qq-plot, the Shapiro-Wilk test, and a simple plot:

Shapiro-Wilk normality test

data: resid(modelq4)
W = 0.99295, p-value = 0.5905

Although the Shapiro-Wilk test suggests that we cannot reject normality, the plots of residuals
suggest that there might be some uncaptured trend. The ACF plot suggests that there might be
correlations between errors up to lag 12. Thus, an autoregressive model might need to be
explored.

(iv)
After adding lag-1 residuals, the model looks as follows. The lagged residual is significant, as
is one member of each calendar trigonometric pair (c348 and s432). The adjusted R-squared has
also improved from 0.9755 to 0.9863, indicating a better fit.

Call:
lm(formula = logSales ~ poly(Time, 2) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4)

Residuals:
Min 1Q Median 3Q Max
-0.072362 -0.014521 0.003148 0.013160 0.069542

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.151136 0.006552 1244.125 < 2e-16 ***
poly(Time, 2)1 2.463449 0.023809 103.466 < 2e-16 ***
poly(Time, 2)2 -0.296226 0.023919 -12.385 < 2e-16 ***
fMonth2 -0.015781 0.009106 -1.733 0.085168 .
fMonth3 0.071715 0.009105 7.877 6.65e-13 ***
fMonth4 0.004953 0.009082 0.545 0.586306
fMonth5 0.052845 0.009108 5.802 3.84e-08 ***
fMonth6 0.040734 0.009091 4.481 1.48e-05 ***
fMonth7 0.047605 0.009097 5.233 5.60e-07 ***
fMonth8 0.079809 0.009091 8.779 3.73e-15 ***
fMonth9 0.039740 0.009108 4.363 2.39e-05 ***
fMonth10 0.055450 0.009082 6.106 8.59e-09 ***
fMonth11 0.127731 0.009105 14.029 < 2e-16 ***
fMonth12 0.161846 0.009106 17.774 < 2e-16 ***
c348 -0.009931 0.002596 -3.826 0.000192 ***
s348 0.001959 0.002604 0.753 0.452917
c432 0.001502 0.002617 0.574 0.566848
s432 0.007665 0.002595 2.954 0.003647 **
lresid 0.678298 0.060439 11.223 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02354 on 148 degrees of freedom


(1 observation deleted due to missingness)
Multiple R-squared: 0.9878, Adjusted R-squared: 0.9863
F-statistic: 664.3 on 18 and 148 DF, p-value: < 2.2e-16
(v)
The residuals seem to be much more randomly distributed. The qq-plot looks better, except for
a few outliers; these are likely what drives the Shapiro-Wilk test to reject normality
(p = 0.0058). The outliers probably fall in the business contraction of 2001. The ACF also
indicates that most of the trend has been captured, since almost no significant correlations
are observed. The one exception is a barely significant correlation at lag 4, which could
indicate some structure that is yet to be captured.

Shapiro-Wilk normality test

data: resid(modelq4_ar)
W = 0.97628, p-value = 0.005783

Part b)
There is a significant drop at the beginning due to the global recession. We’ll use a degree 3
polynomial to model the two turns (the peak at the beginning, and the drop during the
recession).

Call:
lm(formula = logSales ~ poly(Time, 3) + fMonth + c348 + s348,
data = furniture_q4b)

Residuals:
Min 1Q Median 3Q Max
-0.112934 -0.025478 -0.000323 0.024882 0.119608

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.360440 0.012107 690.535 < 2e-16 ***
poly(Time, 3)1 0.719853 0.045266 15.903 < 2e-16 ***
poly(Time, 3)2 1.052190 0.045160 23.299 < 2e-16 ***
poly(Time, 3)3 -0.618278 0.045417 -13.613 < 2e-16 ***
fMonth2 0.024465 0.017107 1.430 0.15475
fMonth3 0.113133 0.017102 6.615 6.06e-10 ***
fMonth4 0.011850 0.017071 0.694 0.48864
fMonth5 0.080950 0.017120 4.728 5.15e-06 ***
fMonth6 0.049626 0.017106 2.901 0.00427 **
fMonth7 0.074645 0.017087 4.368 2.31e-05 ***
fMonth8 0.115421 0.017140 6.734 3.24e-10 ***
fMonth9 0.085334 0.017118 4.985 1.68e-06 ***
fMonth10 0.043528 0.017115 2.543 0.01199 *
fMonth11 0.106062 0.017168 6.178 5.76e-09 ***
fMonth12 0.107858 0.017141 6.293 3.21e-09 ***
c348 -0.010251 0.004972 -2.062 0.04096 *
s348 0.003054 0.004972 0.614 0.53997
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04515 on 151 degrees of freedom


Multiple R-squared: 0.8805, Adjusted R-squared: 0.8679
F-statistic: 69.57 on 16 and 151 DF, p-value: < 2.2e-16

The second trigonometric calendar pair (c432/s432) was found to be insignificant and was left
out of the model. The first pair is retained because c348 is significant.

The seasonal static indices are as follows:

Month   Seasonal Index (exp.seas4b)
Jan     0.9345041
Feb     0.9576485
Mar     1.0464399
Apr     0.9456439
May     1.0132981
Jun     0.9820496
Jul     1.0069294
Aug     1.0488371
Sep     1.0177506
Oct     0.9760790
Nov     1.0390660
Dec     1.0409347

Let’s now analyze the residuals to detect any uncaptured trends:

Shapiro-Wilk normality test

data: resid(modelq4b)
W = 0.99136, p-value = 0.4069

Although the Shapiro-Wilk test does not reject normality (p = 0.41), the residual plots suggest
some departure from normality and some uncaptured trend. The ACF plot indicates that there
might be correlated errors. So let’s fit an AR model to see if that can be captured.

Call:
lm(formula = logSales ~ poly(Time, 3) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4b)

Residuals:
Min 1Q Median 3Q Max
-0.09651 -0.01455 0.00191 0.01694 0.07983

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.364599 0.007741 1080.505 < 2e-16 ***
poly(Time, 3)1 0.705535 0.028148 25.065 < 2e-16 ***
poly(Time, 3)2 1.065349 0.028279 37.673 < 2e-16 ***
poly(Time, 3)3 -0.639668 0.028552 -22.404 < 2e-16 ***
fMonth2 0.021057 0.010764 1.956 0.052344 .
fMonth3 0.108218 0.010762 10.056 < 2e-16 ***
fMonth4 0.007241 0.010733 0.675 0.500945
fMonth5 0.077646 0.010763 7.214 2.66e-11 ***
fMonth6 0.044051 0.010742 4.101 6.78e-05 ***
fMonth7 0.071987 0.010748 6.697 4.20e-10 ***
fMonth8 0.110117 0.010742 10.252 < 2e-16 ***
fMonth9 0.082282 0.010763 7.645 2.50e-12 ***
fMonth10 0.039107 0.010733 3.644 0.000372 ***
fMonth11 0.102304 0.010762 9.506 < 2e-16 ***
fMonth12 0.104553 0.010764 9.713 < 2e-16 ***
c348 -0.010039 0.003064 -3.276 0.001314 **
s348 0.001830 0.003080 0.594 0.553306
c432 -0.001875 0.003087 -0.607 0.544650
s432 0.011285 0.003072 3.674 0.000334 ***
lresid 0.775606 0.050477 15.366 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02781 on 147 degrees of freedom


(1 observation deleted due to missingness)
Multiple R-squared: 0.9558, Adjusted R-squared: 0.9501
F-statistic: 167.4 on 19 and 147 DF, p-value: < 2.2e-16

The adjusted R-squared has jumped from 0.87 to 0.95. One member of each calendar trigonometric
pair (c348 and s432) is now significant, and so is the lagged residual. This indicates a better
fit. Let’s assess the residuals now:

Shapiro-Wilk normality test

data: resid(modelq4b_ar)
W = 0.98816, p-value = 0.1736

The residuals look much better now! All plots indicate that most of the trend has been captured.
However, there is still some marginal correlation in the residuals at lags 6 and 12, which might
indicate some uncaptured dynamic seasonality.

Part c)

Index   Part 2      Part 4a     Part 4b
S1      0.9411000   0.9446919   0.9345041
S2      0.9557237   0.9317060   0.9576485
S3      1.0403769   1.0161892   1.0464399
S4      0.9457751   0.9504526   0.9456439
S5      1.0080673   0.9980006   1.0132981
S6      0.9820409   0.9844444   0.9820496
S7      1.0033942   0.9931347   1.0069294
S8      1.0438746   1.0236857   1.0488371
S9      1.0033892   0.9850024   1.0177506
S10     0.9796698   0.9996818   0.9760790
S11     1.0494562   1.0747694   1.0390660
S12     1.0563330   1.1127939   1.0409347

It can be seen that the seasonal indices for part 2 lie between the values of part 4a and part 4b
in every month except June (S6), where the part 2 and part 4b values agree to four decimal
places. This makes sense since part 2 gives more of an ‘average’ seasonal index: it models data
from 2000 to 2019, a span that overlaps both sub-periods, whereas 4a and 4b model the earlier
and later periods separately.
Also, the indices differ noticeably between part 4a and part 4b: December, for example, is
about 11% above trend in 4a but only about 4% above trend in 4b. This indicates that the
seasonality has changed with time, and it supports the observation from part 4b that there
might be some uncaptured dynamic seasonality.
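
The comparison can be checked directly from the table. The sketch below copies the three columns
(variable names are illustrative) and tests, month by month, whether the Part 2 index is
bracketed by the Part 4a and Part 4b indices.

```r
part2  <- c(0.9411000, 0.9557237, 1.0403769, 0.9457751, 1.0080673, 0.9820409,
            1.0033942, 1.0438746, 1.0033892, 0.9796698, 1.0494562, 1.0563330)
part4a <- c(0.9446919, 0.9317060, 1.0161892, 0.9504526, 0.9980006, 0.9844444,
            0.9931347, 1.0236857, 0.9850024, 0.9996818, 1.0747694, 1.1127939)
part4b <- c(0.9345041, 0.9576485, 1.0464399, 0.9456439, 1.0132981, 0.9820496,
            1.0069294, 1.0488371, 1.0177506, 0.9760790, 1.0390660, 1.0409347)

# Is the Part 2 index bracketed by the 4a and 4b indices?
between <- part2 >= pmin(part4a, part4b) & part2 <= pmax(part4a, part4b)
which(!between)      # months (S1..S12) that fall outside the bracket

# Peak-to-trough spread of each set of indices
spread <- sapply(list(part2 = part2, part4a = part4a, part4b = part4b),
                 function(s) diff(range(s)))
spread
```

Only S6 (June) falls outside the bracket, and there only in the sixth decimal place.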
