Both curves (Sales and logSales) have trends that closely mirror the economic downturns,
suggesting that furniture sales are correlated with the macroeconomic situation. There are 4
turns (marked in red circles). Since a polynomial of degree d has at most d - 1 turning points,
this suggests a trend that can be modeled with a polynomial of degree 5.
Seasonality of period 12 is evident in both plots, with furniture sales consistently peaking during
December and sinking during January/February each year.
Also observed is a huge drop in Sales in the first half of 2020, which coincides with the onset of
COVID-19. This is expected: the uncertainty and the closing down of shops caused people to
delay their purchases, which is consistent with the higher-than-normal peaks in the months that
followed.
As for the choice between an additive and a multiplicative model, the variance of the errors
around the trend line doesn’t seem to change with time. This implies that an additive model
might suffice to model the time series. The logSales plot also shows the seasonal variation
decreasing over time, which suggests that the multiplicative model (the log transform)
overcorrects for the variance as time progresses.
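For reference, the two candidate structures can be written out as below. This is a generic sketch of the standard decompositions, not notation taken from the assignment; the symbols T_t (trend), S_t (seasonal component) and \varepsilon_t (error) are assumed here for illustration.

```latex
% Additive model: seasonal swings have constant size in the original units
Y_t = T_t + S_t + \varepsilon_t

% Multiplicative model: seasonal swings scale with the level of the series;
% taking logs turns it into an additive model for \log Y_t
Y_t = T_t \, S_t \, \varepsilon_t
\quad\Longrightarrow\quad
\log Y_t = \log T_t + \log S_t + \log \varepsilon_t
```

Fitting a linear model to logSales, as in the questions that follow, corresponds to the second form.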
Question 2:
Part a)
Call:
lm(formula = logSales ~ poly(Time, 5) + fMonth + c348 + s348,
data = furniture[97:336, ])
Residuals:
Min 1Q Median 3Q Max
-0.142912 -0.040866 0.005692 0.041562 0.119576
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.356699 0.012631 661.608 < 2e-16 ***
poly(Time, 5)1 0.695932 0.056410 12.337 < 2e-16 ***
poly(Time, 5)2 0.626769 0.056345 11.124 < 2e-16 ***
poly(Time, 5)3 0.729037 0.056504 12.902 < 2e-16 ***
poly(Time, 5)4 -0.469960 0.056353 -8.340 8.01e-15 ***
poly(Time, 5)5 -0.597400 0.056591 -10.556 < 2e-16 ***
fMonth2 0.015419 0.017838 0.864 0.388291
fMonth3 0.100289 0.017836 5.623 5.64e-08 ***
fMonth4 0.004955 0.017822 0.278 0.781232
fMonth5 0.068741 0.017849 3.851 0.000154 ***
fMonth6 0.042584 0.017844 2.386 0.017858 *
fMonth7 0.064094 0.017838 3.593 0.000403 ***
fMonth8 0.103645 0.017869 5.800 2.27e-08 ***
fMonth9 0.064089 0.017862 3.588 0.000410 ***
fMonth10 0.040166 0.017864 2.248 0.025536 *
fMonth11 0.108978 0.017897 6.089 4.97e-09 ***
fMonth12 0.115509 0.017889 6.457 6.69e-10 ***
c348 -0.009171 0.005165 -1.776 0.077181 .
s348 0.001928 0.005169 0.373 0.709512
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The model is multiplicative (it is fitted to logSales), with a 5th-degree polynomial trend and
static seasonal indices. It also accounts for calendar structure through the cos/sin pair at a
frequency of 0.348 cycles per month. The other cos/sin pair, at a frequency of 0.432, turned out
to be insignificant.
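A minimal sketch of how a model of this shape could be set up in R is given below. The furniture data are not reproduced here, so the series is simulated. The construction of c348 and s348 as cos(2*pi*0.348*t) and sin(2*pi*0.348*t) is an assumption about how those variables were built, and the data frame name sim and the vector month_effect are made up for illustration.

```r
# Simulated monthly series; the real `furniture` data frame is not available here.
set.seed(1)
n      <- 240                                   # 20 years of monthly observations
Time   <- seq_len(n)
fMonth <- factor(rep(1:12, length.out = n))     # month-of-year factor, January = 1

# Calendar pair at 0.348 cycles per month (assumed construction)
c348 <- cos(2 * pi * 0.348 * Time)
s348 <- sin(2 * pi * 0.348 * Time)

# Synthetic log-sales: smooth trend + fixed monthly effects + calendar term + noise
month_effect <- c(0, 0.02, 0.10, 0.00, 0.07, 0.04, 0.06, 0.10, 0.06, 0.04, 0.11, 0.12)
logSales <- 8 + 0.002 * Time + month_effect[as.integer(fMonth)] +
  0.01 * c348 + rnorm(n, sd = 0.03)

sim <- data.frame(logSales, Time, fMonth, c348, s348)

# Same formula as in the Call above, applied to the simulated data
fit <- lm(logSales ~ poly(Time, 5) + fMonth + c348 + s348, data = sim)

# Estimates and p-values for the calendar pair
round(coef(summary(fit))[c("c348", "s348"), ], 4)
```

The fitted object has 19 coefficients (intercept, 5 trend terms, 11 month dummies and the calendar pair), matching the layout of the coefficient table above.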
Part b)
Part c)
The qq plot is not hugging the straight line, indicating deviation from normality for the residuals.
Let’s test this using the Shapiro-Wilk test.
data: resid(modelq2)
W = 0.98378, p-value = 0.007725
The Shapiro-Wilk test confirms the lack of normality: the p-value (0.0077) is below 0.05, so the
null hypothesis of normality is rejected. Let’s now plot the residuals across time:
The residuals do seem to show some remaining structure that hasn’t been captured. In an ideal
scenario, the residuals would fluctuate randomly across time.
The ACF plot also confirms the presence of uncaptured structure. Ideally, only the lag 0 ACF
should be significant and the rest insignificant. In this case, however, the ACF follows a
cosine-like pattern, indicating that there is structure remaining to be captured.
This is likely due to the actual data containing some dynamic seasonality. Our model is not able
to capture that sufficiently since we have used static estimates.
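The checks described above can be sketched in R as follows. This is an illustration on simulated values rather than on resid(modelq2): an AR(1) series stands in for residuals that still carry serial correlation, and the names e, sw, ac, band and sig_lags are hypothetical.

```r
set.seed(2)
n <- 240

# AR(1) noise stands in for residuals that still contain serial correlation
e <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 0.03))

# Shapiro-Wilk test; the null hypothesis is that the values are normally distributed
sw <- shapiro.test(e)

# Sample autocorrelations up to lag 36, without drawing the plot
ac <- acf(e, lag.max = 36, plot = FALSE)

# Approximate 95% band for white noise: +/- 1.96 / sqrt(n)
band <- qnorm(0.975) / sqrt(n)

# Lags whose autocorrelation falls outside the band (lag 0 dropped, it is always 1)
sig_lags <- which(abs(ac$acf[-1]) > band)

sw$p.value
sig_lags
```

Calling qqnorm(e) followed by qqline(e), and plot(ac), would draw the qq plot and the ACF plot for the same values. For well-behaved residuals, sig_lags should be empty or nearly so.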
Plotting predicted against actual values, we see that the model is unable to capture the
changing amplitude of the seasonal structure during the recession of 2008, and hence is thrown
off.
Question 3:
Part a)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.358e+00 6.782e-03 1232.505 < 2e-16 ***
poly(Time, 5)1 6.998e-01 2.973e-02 23.541 < 2e-16 ***
poly(Time, 5)2 6.334e-01 2.985e-02 21.223 < 2e-16 ***
poly(Time, 5)3 7.351e-01 3.004e-02 24.476 < 2e-16 ***
poly(Time, 5)4 -4.609e-01 3.009e-02 -15.315 < 2e-16 ***
poly(Time, 5)5 -5.894e-01 3.029e-02 -19.457 < 2e-16 ***
fMonth2 1.424e-02 9.475e-03 1.503 0.134412
fMonth3 9.854e-02 9.473e-03 10.402 < 2e-16 ***
fMonth4 3.009e-03 9.458e-03 0.318 0.750646
fMonth5 6.765e-02 9.470e-03 7.144 1.36e-11 ***
fMonth6 4.000e-02 9.461e-03 4.228 3.47e-05 ***
fMonth7 6.331e-02 9.462e-03 6.691 1.86e-10 ***
fMonth8 1.010e-01 9.462e-03 10.674 < 2e-16 ***
fMonth9 6.290e-02 9.470e-03 6.642 2.45e-10 ***
fMonth10 3.810e-02 9.458e-03 4.029 7.75e-05 ***
fMonth11 1.069e-01 9.473e-03 11.286 < 2e-16 ***
fMonth12 1.142e-01 9.475e-03 12.049 < 2e-16 ***
c348 -9.410e-03 2.703e-03 -3.481 0.000603 ***
s348 2.212e-03 2.716e-03 0.815 0.416227
c432 4.937e-05 2.721e-03 0.018 0.985539
s432 1.220e-02 2.701e-03 4.515 1.04e-05 ***
lresid 8.554e-01 3.541e-02 24.158 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This model has a much higher R-squared value than the first model. The lag 1 residuals have
gone a long way towards capturing the structure left over in the first model’s residuals. It is
also noticed that both calendar pairs now have a significant member (c348 and s432).
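One way the lagged-residual regressor could be built is sketched below on simulated data. It assumes that lresid is the previous month's residual from the static model, which is an interpretation of the variable name rather than something stated in the output. The objects sim, sim2, fit1 and fit2 are made up for illustration.

```r
set.seed(3)
n      <- 240
Time   <- seq_len(n)
fMonth <- factor(rep(1:12, length.out = n))

# Synthetic log-sales with AR(1) errors (simulated; not the furniture data)
err      <- as.numeric(arima.sim(model = list(ar = 0.8), n = n, sd = 0.03))
logSales <- 8 + 0.002 * Time + 0.05 * (as.integer(fMonth) == 12) + err
sim      <- data.frame(logSales, Time, fMonth)

# Step 1: static model without any lagged term
fit1 <- lm(logSales ~ poly(Time, 2) + fMonth, data = sim)

# Step 2: shift the residuals by one month; the first row has no predecessor
sim$lresid <- c(NA, head(resid(fit1), -1))
sim2       <- sim[-1, ]

# Step 3: refit with the lagged residual as an extra regressor
fit2 <- lm(logSales ~ poly(Time, 2) + fMonth + lresid, data = sim2)

# Compare the fit and inspect the lagged-residual coefficient
c(static = summary(fit1)$r.squared, with_lag = summary(fit2)$r.squared)
coef(summary(fit2))["lresid", ]
```

Note that the second fit uses one observation fewer than the first, because the first month has no lagged residual.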
Part b)
The residuals for this model look much closer to random, normally distributed noise, indicating
that much of the remaining structure has been captured. Let’s verify this more thoroughly using
the qq plot and the Shapiro-Wilk test.
data: resid(modelq3)
W = 0.99558, p-value = 0.7277
The qq-plot and the Shapiro-Wilk test are both consistent with normal residuals. The residuals
stick to the straight line in the qq-plot, and the Shapiro-Wilk p-value (0.73) is insignificant,
so we cannot reject the null hypothesis that the distribution is normal.
Now let’s examine the ACF plot to see if there is any uncaptured trend/seasonality.
There are spikes in the ACF plot at lags 6, 12, 18, 24, and so on. This aligns well with a
half-yearly cycle and indicates that there might be some dynamic seasonality at play which the
model was unable to capture.
Part c)
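Static seasonal indices (S1 = January), with the Question 2 model in the first column and the
Question 3 model in the second:
S1 0.9411000 0.9425635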
S2 0.9557237 0.9560780
S3 1.0403769 1.0401755
S4 0.9457751 0.9454042
S5 1.0080673 1.0085356
S6 0.9820409 0.9810335
S7 1.0033942 1.0041643
S8 1.0438746 1.0427258
S9 1.0033892 1.0037549
Question 4:
Part a)
(i)
Data for 1992 to 2006 is plotted below. Only 2 turns are observed, so I first try a polynomial
trend of degree 3 in addition to the seasonal components.
After eliminating insignificant variables (the cubic trend term among them, leaving a degree 2
polynomial), the final model is as follows:
Call:
lm(formula = logSales ~ poly(Time, 2) + fMonth + c348 + s348,
data = furniture_q4)
Residuals:
Min 1Q Median 3Q Max
-0.084872 -0.018746 0.000844 0.018414 0.106006
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.149736 0.008516 957.028 < 2e-16 ***
poly(Time, 2)1 2.462746 0.031877 77.257 < 2e-16 ***
poly(Time, 2)2 -0.297885 0.031804 -9.366 < 2e-16 ***
fMonth2 -0.013842 0.012047 -1.149 0.252389
fMonth3 0.072956 0.012043 6.058 1.04e-08 ***
fMonth4 0.006079 0.012020 0.506 0.613735
fMonth5 0.054895 0.012053 4.555 1.07e-05 ***
fMonth6 0.041219 0.012040 3.423 0.000795 ***
fMonth7 0.050007 0.012025 4.159 5.33e-05 ***
fMonth8 0.080306 0.012059 6.660 4.72e-10 ***
fMonth9 0.041785 0.012039 3.471 0.000676 ***
fMonth10 0.056578 0.012033 4.702 5.73e-06 ***
fMonth11 0.129003 0.012066 10.692 < 2e-16 ***
fMonth12 0.163770 0.012041 13.601 < 2e-16 ***
c348 -0.009896 0.003502 -2.826 0.005350 **
s348 0.002055 0.003502 0.587 0.558232
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0318 on 152 degrees of freedom
Multiple R-squared: 0.9777, Adjusted R-squared: 0.9755
F-statistic: 444.3 on 15 and 152 DF, p-value: < 2.2e-16
(ii)
The seasonal indices are as below (the row labelled (Intercept) is the January index):
exp.seas4.
<dbl>
(Intercept) 0.9446919
fMonth2 0.9317060
fMonth3 1.0161892
fMonth4 0.9504526
fMonth5 0.9980006
fMonth6 0.9844444
fMonth7 0.9931347
fMonth8 1.0236857
fMonth9 0.9850024
fMonth10 0.9996818
fMonth11 1.0747694
fMonth12 1.1127939
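The indices above can be reproduced from the fMonth estimates in the coefficient table of part (i). The sketch below assumes the usual construction for a log-scale model with a January baseline: treat January's effect as 0, centre the twelve effects so they average to zero, and exponentiate. The object names b and seas_index are made up for illustration.

```r
# fMonth2..fMonth12 estimates copied from the coefficient table in part (i);
# January is the baseline level, so its effect is 0 by construction.
b <- c(0, -0.013842, 0.072956, 0.006079, 0.054895, 0.041219,
       0.050007, 0.080306, 0.041785, 0.056578, 0.129003, 0.163770)
names(b) <- month.abb

# Centre the log-scale effects so they average to zero, then exponentiate
seas_index <- exp(b - mean(b))

round(seas_index, 4)
```

With the rounded coefficients as printed, this gives about 0.9447 for January and 1.1128 for December, in line with the table. By construction the geometric mean of the twelve indices is 1, so each index reads as a multiplicative deviation from the trend level.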
(iii)
Residuals are analyzed using the qq-plot, the Shapiro-Wilk test, and a simple plot:
Shapiro-Wilk normality test
data: resid(modelq4)
W = 0.99295, p-value = 0.5905
Although the Shapiro-Wilk test suggests that we cannot reject normality, the plots of residuals
suggest that there might be some uncaptured structure. The ACF plot suggests that there might
be correlations between errors up to lag 12. Thus, an autoregressive model might need to be
explored.
(iv)
After adding lag-1 residuals, the model looks as follows. The lagged residual is significant, as
are the calendar trigonometric pairs (through c348 and s432). The R-squared has also improved
from 0.975 to 0.986, indicating a better fit.
Call:
lm(formula = logSales ~ poly(Time, 2) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4)
Residuals:
Min 1Q Median 3Q Max
-0.072362 -0.014521 0.003148 0.013160 0.069542
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.151136 0.006552 1244.125 < 2e-16 ***
poly(Time, 2)1 2.463449 0.023809 103.466 < 2e-16 ***
poly(Time, 2)2 -0.296226 0.023919 -12.385 < 2e-16 ***
fMonth2 -0.015781 0.009106 -1.733 0.085168 .
fMonth3 0.071715 0.009105 7.877 6.65e-13 ***
fMonth4 0.004953 0.009082 0.545 0.586306
fMonth5 0.052845 0.009108 5.802 3.84e-08 ***
fMonth6 0.040734 0.009091 4.481 1.48e-05 ***
fMonth7 0.047605 0.009097 5.233 5.60e-07 ***
fMonth8 0.079809 0.009091 8.779 3.73e-15 ***
fMonth9 0.039740 0.009108 4.363 2.39e-05 ***
fMonth10 0.055450 0.009082 6.106 8.59e-09 ***
fMonth11 0.127731 0.009105 14.029 < 2e-16 ***
fMonth12 0.161846 0.009106 17.774 < 2e-16 ***
c348 -0.009931 0.002596 -3.826 0.000192 ***
s348 0.001959 0.002604 0.753 0.452917
c432 0.001502 0.002617 0.574 0.566848
s432 0.007665 0.002595 2.954 0.003647 **
lresid 0.678298 0.060439 11.223 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
data: resid(modelq4_ar)
W = 0.97628, p-value = 0.005783
Part b)
There is a significant drop at the beginning due to the global recession. We’ll use a degree 3
polynomial to model the two turns (the peak at the beginning, and the drop during the
recession).
Call:
lm(formula = logSales ~ poly(Time, 3) + fMonth + c348 + s348,
data = furniture_q4b)
Residuals:
Min 1Q Median 3Q Max
-0.112934 -0.025478 -0.000323 0.024882 0.119608
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.360440 0.012107 690.535 < 2e-16 ***
poly(Time, 3)1 0.719853 0.045266 15.903 < 2e-16 ***
poly(Time, 3)2 1.052190 0.045160 23.299 < 2e-16 ***
poly(Time, 3)3 -0.618278 0.045417 -13.613 < 2e-16 ***
fMonth2 0.024465 0.017107 1.430 0.15475
fMonth3 0.113133 0.017102 6.615 6.06e-10 ***
fMonth4 0.011850 0.017071 0.694 0.48864
fMonth5 0.080950 0.017120 4.728 5.15e-06 ***
fMonth6 0.049626 0.017106 2.901 0.00427 **
fMonth7 0.074645 0.017087 4.368 2.31e-05 ***
fMonth8 0.115421 0.017140 6.734 3.24e-10 ***
fMonth9 0.085334 0.017118 4.985 1.68e-06 ***
fMonth10 0.043528 0.017115 2.543 0.01199 *
fMonth11 0.106062 0.017168 6.178 5.76e-09 ***
fMonth12 0.107858 0.017141 6.293 3.21e-09 ***
c348 -0.010251 0.004972 -2.062 0.04096 *
s348 0.003054 0.004972 0.614 0.53997
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The second trigonometric calendar pair (frequency 0.432) was found to be insignificant and is
left out. The first pair is retained, with c348 significant at the 5% level.
exp.seas4b.
<dbl>
(Intercept) 0.9345041
fMonth2 0.9576485
fMonth3 1.0464399
fMonth4 0.9456439
fMonth5 1.0132981
fMonth6 0.9820496
fMonth7 1.0069294
fMonth8 1.0488371
fMonth9 1.0177506
fMonth10 0.9760790
fMonth11 1.0390660
fMonth12 1.0409347
Let’s now analyze the residuals to detect any uncaptured trends:
Shapiro-Wilk normality test
data: resid(modelq4b)
W = 0.99136, p-value = 0.4069
The Shapiro-Wilk test does not reject normality (p = 0.41), but the residual plots suggest
departures from normality and some uncaptured structure. The ACF plot indicates that there
might be correlated errors. So let’s fit an AR model to see if that can be captured.
Call:
lm(formula = logSales ~ poly(Time, 3) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4b)
Residuals:
Min 1Q Median 3Q Max
-0.09651 -0.01455 0.00191 0.01694 0.07983
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.364599 0.007741 1080.505 < 2e-16 ***
poly(Time, 3)1 0.705535 0.028148 25.065 < 2e-16 ***
poly(Time, 3)2 1.065349 0.028279 37.673 < 2e-16 ***
poly(Time, 3)3 -0.639668 0.028552 -22.404 < 2e-16 ***
fMonth2 0.021057 0.010764 1.956 0.052344 .
fMonth3 0.108218 0.010762 10.056 < 2e-16 ***
fMonth4 0.007241 0.010733 0.675 0.500945
fMonth5 0.077646 0.010763 7.214 2.66e-11 ***
fMonth6 0.044051 0.010742 4.101 6.78e-05 ***
fMonth7 0.071987 0.010748 6.697 4.20e-10 ***
fMonth8 0.110117 0.010742 10.252 < 2e-16 ***
fMonth9 0.082282 0.010763 7.645 2.50e-12 ***
fMonth10 0.039107 0.010733 3.644 0.000372 ***
fMonth11 0.102304 0.010762 9.506 < 2e-16 ***
fMonth12 0.104553 0.010764 9.713 < 2e-16 ***
c348 -0.010039 0.003064 -3.276 0.001314 **
s348 0.001830 0.003080 0.594 0.553306
c432 -0.001875 0.003087 -0.607 0.544650
s432 0.011285 0.003072 3.674 0.000334 ***
lresid 0.775606 0.050477 15.366 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The R-squared value has jumped from 0.87 to 0.95. Both calendar trigonometric pairs now have a
significant member (c348 and s432), and the lagged residual is significant too. This indicates a
better fit.
Let’s assess the residuals now:
Shapiro-Wilk normality test
data: resid(modelq4b_ar)
W = 0.98816, p-value = 0.1736
The residuals look much better now! All plots indicate that most of the structure has been
captured. However, there is still some marginal correlation in the residuals at lags 6 and 12,
which might indicate some uncaptured dynamic seasonality.
Part c)
The seasonal indices from Question 2 lie between the values from Part 4a and Part 4b in every
month except June, where the Question 2 index (0.98204) sits marginally below both (0.98444
and 0.98205). This makes sense, since Question 2 gives more of an ‘average’ seasonal index: it
models data from 1992 to 2019, whereas 4a and 4b model the first and second halves of the
duration.
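The month-by-month comparison can be checked from the fMonth estimates reported in the three coefficient tables (Question 2, Part 4a (i) and the first model of Part 4b). The sketch below assumes the indices are the exponentiated, mean-centred monthly effects with January counted as 0, which reproduces the printed index tables. The helper name seas_index and the other object names are made up for illustration.

```r
# Exponentiated, mean-centred monthly effects; January is the baseline (effect 0)
seas_index <- function(b) exp(c(0, b) - mean(c(0, b)))

# fMonth2..fMonth12 estimates copied from the coefficient tables above
q2  <- seas_index(c(0.015419, 0.100289, 0.004955, 0.068741, 0.042584, 0.064094,
                    0.103645, 0.064089, 0.040166, 0.108978, 0.115509))
q4a <- seas_index(c(-0.013842, 0.072956, 0.006079, 0.054895, 0.041219, 0.050007,
                    0.080306, 0.041785, 0.056578, 0.129003, 0.163770))
q4b <- seas_index(c(0.024465, 0.113133, 0.011850, 0.080950, 0.049626, 0.074645,
                    0.115421, 0.085334, 0.043528, 0.106062, 0.107858))

# Is the Question 2 index inside the interval spanned by 4a and 4b?
between <- q2 >= pmin(q4a, q4b) & q2 <= pmax(q4a, q4b)

data.frame(month = month.abb,
           q4a = round(q4a, 4), q2 = round(q2, 4), q4b = round(q4b, 4),
           between)
```

With the rounded coefficients as printed, 11 of the 12 months come out strictly between. June is a near tie, with the Question 2 value a hair below the other two.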
Also, indices in part 4b seem to be more volatile. This might indicate that the seasonality has
changed with time, becoming more volatile as we approach 2019. This supports the observation
from part 4b that there might be some unobserved dynamic seasonality.