Stat 5350 - HW 1 - Aje

Question 1:
Periods of economic downturn (plotted as red lines in the curves below):

- March 2001 to November 2001
- December 2007 to June 2009
- February 2020 to April 2020
Both curves (Sales and logSales) have trends that closely mirror the economic downturns,
suggesting that furniture sales are correlated with macroeconomic conditions. The trend has 4
turning points (marked with red circles), which suggests a polynomial of degree 5, since a
polynomial of degree d has at most d - 1 turning points.
Seasonality of period 12 is evident in both plots, with furniture sales consistently peaking during
December and sinking during January/February each year.
Also observed is a sharp drop in Sales in the first half of 2020, which coincides with the onset of
COVID-19. This is expected, as uncertainty and shop closures caused people to delay their
purchases, which shows up as higher-than-normal peaks in the months that followed.
As for the choice between an additive and a multiplicative model, the variance of the errors
around the trend line doesn’t seem to change with time. This implies that an additive model
might suffice for this time series. The logSales plot also shows seasonal variation that
decreases over time, which means that the multiplicative model overcorrects for the variance
as time progresses.
Question 2:
Part a)
Multiplicative model fit on years 2000 to 2019:
Call:
lm(formula = logSales ~ poly(Time, 5) + fMonth + c348 + s348,
data = furniture[97:336, ])
Residuals:
Min 1Q Median 3Q Max
-0.142912 -0.040866 0.005692 0.041562 0.119576
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.356699 0.012631 661.608 < 2e-16 ***
poly(Time, 5)1 0.695932 0.056410 12.337 < 2e-16 ***
poly(Time, 5)2 0.626769 0.056345 11.124 < 2e-16 ***
poly(Time, 5)3 0.729037 0.056504 12.902 < 2e-16 ***
poly(Time, 5)4 -0.469960 0.056353 -8.340 8.01e-15 ***
poly(Time, 5)5 -0.597400 0.056591 -10.556 < 2e-16 ***
fMonth2 0.015419 0.017838 0.864 0.388291
fMonth3 0.100289 0.017836 5.623 5.64e-08 ***
fMonth4 0.004955 0.017822 0.278 0.781232
fMonth5 0.068741 0.017849 3.851 0.000154 ***
fMonth6 0.042584 0.017844 2.386 0.017858 *
fMonth7 0.064094 0.017838 3.593 0.000403 ***
fMonth8 0.103645 0.017869 5.800 2.27e-08 ***
fMonth9 0.064089 0.017862 3.588 0.000410 ***
fMonth10 0.040166 0.017864 2.248 0.025536 *
fMonth11 0.108978 0.017897 6.089 4.97e-09 ***
fMonth12 0.115509 0.017889 6.457 6.69e-10 ***
c348 -0.009171 0.005165 -1.776 0.077181 .
s348 0.001928 0.005169 0.373 0.709512
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05634 on 221 degrees of freedom

Multiple R-squared: 0.7738, Adjusted R-squared: 0.7554
F-statistic: 42 on 18 and 221 DF, p-value: < 2.2e-16
The model is a multiplicative model with a 5th-degree polynomial trend and static seasonal
indices. It also accounts for calendar structure through a cos/sin pair (c348, s348) at a
frequency of 0.348 cycles per month. The other cos/sin pair, at a frequency of 0.432, turned
out to be insignificant and is not included.
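A minimal sketch of how the regressors for a model of this form could be built. The furniture data is not reproduced here, so the series is simulated. The data frame `sim`, the simulated coefficients, and the noise level are assumptions made for illustration; only the formula mirrors the Call shown above.

```r
# Hypothetical stand-in for the furniture data: 240 simulated monthly values.
set.seed(1)
n <- 240
Time <- seq_len(n)
fMonth <- factor(rep(1:12, length.out = n))

# Calendar pair at 0.348 cycles per month, as in the fitted model
c348 <- cos(2 * pi * 0.348 * Time)
s348 <- sin(2 * pi * 0.348 * Time)

# Assumed log-scale seasonal effects and trend, used only to generate data
season <- rep(c(-0.06, -0.05, 0.04, -0.06, 0.01, -0.02,
                0.00, 0.04, 0.00, -0.02, 0.05, 0.05), length.out = n)
logSales <- 8.3 + 0.001 * Time + season - 0.01 * c348 + rnorm(n, sd = 0.05)

sim <- data.frame(logSales, Time, fMonth, c348, s348)
fit <- lm(logSales ~ poly(Time, 5) + fMonth + c348 + s348, data = sim)

# 1 intercept + 5 trend terms + 11 month dummies + 2 calendar terms
length(coef(fit))
df.residual(fit)
```

With 240 observations and 19 coefficients, the residual degrees of freedom come to 221, which agrees with the summary above.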
Part b)
Month   fmonth (log scale)   Seasonal Index
Jan     -0.060705879         0.9411000
Feb     -0.045286388         0.9557237
Mar      0.039583073         1.0403769
Apr     -0.055750512         0.9457751
May      0.008034932         1.0080673
Jun     -0.018122312         0.9820409
Jul      0.003388443         1.0033942
Aug      0.042939378         1.0438746
Sep      0.003383440         1.0033892
Oct     -0.020539717         0.9796698
Nov      0.048272102         1.0494562
Dec      0.054803440         1.0563330

The fmonth values are on the log scale and need to be exponentiated to obtain the actual
seasonal indices. For example, the seasonal index for January is 0.94, which implies that Sales
in January are about 6% below trend.
As expected, January has the lowest seasonal index (~6% below trend), and December has the
highest (~6% above trend). Apparently, furniture sales pick up during the holiday season,
perhaps due to discounts or gift purchases.
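The arithmetic behind the table can be sketched as follows. The fMonth estimates are copied from the Question 2 summary, with January as the baseline (0). Centering by subtracting the mean is an assumption about how the fmonth column was computed, though it reproduces the tabulated values up to rounding.

```r
# fMonth estimates from the Question 2 summary; January is the baseline (0).
b <- c(0, 0.015419, 0.100289, 0.004955, 0.068741, 0.042584,
       0.064094, 0.103645, 0.064089, 0.040166, 0.108978, 0.115509)
names(b) <- month.abb

centered <- b - mean(b)          # log-scale seasonal effects, summing to zero
seasonal_index <- exp(centered)  # multiplicative seasonal indices

round(cbind(fmonth = centered, index = seasonal_index), 4)
```

An index of 0.9411 for January means that sales in that month are roughly 5.9% below trend, and 1.0563 for December means they are roughly 5.6% above trend.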
Part c)
The points in the qq-plot do not hug the straight line, indicating that the residuals deviate
from normality. Let’s test this using the Shapiro-Wilk test.
Shapiro-Wilk normality test
data: resid(modelq2)
W = 0.98378, p-value = 0.007725
The Shapiro-Wilk test rejects the null hypothesis of normality at the 5% level (p = 0.0077),
confirming the departure from normality. Let’s now plot the residuals across time:
The residuals do seem to contain some remaining trend that hasn’t been captured. Ideally, the
residuals would fluctuate randomly across time.
The ACF plot also confirms the presence of uncaptured trend. Ideally, only the lag-0
autocorrelation should be significant. In this case, however, the ACF shows a cosine-like
pattern, indicating that there is structure remaining to be captured.
This is likely due to the actual data containing some dynamic seasonality. Our model cannot
capture it sufficiently because it uses static seasonal estimates.
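The diagnostics described above can be sketched on simulated data. The series below is hypothetical: a linear trend plus AR(1) errors with an assumed coefficient of 0.8, fitted with a model that ignores the autocorrelation. It is not the homework's model or data.

```r
set.seed(2)
n <- 240
Time <- seq_len(n)

# Assumed autocorrelated errors (AR(1) with coefficient 0.8)
e <- as.numeric(arima.sim(list(ar = 0.8), n = n, sd = 0.03))
y <- 8 + 0.002 * Time + e

fit <- lm(y ~ Time)   # trend-only fit that ignores the autocorrelation
r <- resid(fit)

sw <- shapiro.test(r)                     # H0: the residuals are normal
a <- acf(r, lag.max = 24, plot = FALSE)   # a$acf[1] is lag 0
bound <- qnorm(0.975) / sqrt(n)           # approximate 95% band for white noise

sw$p.value
which(abs(a$acf[-1]) > bound)             # lags outside the band
```

Residuals from an adequate model would leave few or no lags outside the band. Here the ignored AR(1) structure produces clearly significant autocorrelation at the first lags.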
Plotting predicted against actual values, we see that the model is unable to capture the
changing amplitude of the seasonal structure during the 2008 recession, and its predictions
are thrown off as a result.
Question 3:
Part a)
Coefficients:
(Intercept) 8.358e+00 6.782e-03 1232.505 < 2e-16 ***
poly(Time, 5)1 6.998e-01 2.973e-02 23.541 < 2e-16 ***
poly(Time, 5)2 6.334e-01 2.985e-02 21.223 < 2e-16 ***
poly(Time, 5)3 7.351e-01 3.004e-02 24.476 < 2e-16 ***
poly(Time, 5)4 -4.609e-01 3.009e-02 -15.315 < 2e-16 ***
poly(Time, 5)5 -5.894e-01 3.029e-02 -19.457 < 2e-16 ***
fMonth2 1.424e-02 9.475e-03 1.503 0.134412
fMonth3 9.854e-02 9.473e-03 10.402 < 2e-16 ***
fMonth4 3.009e-03 9.458e-03 0.318 0.750646
fMonth5 6.765e-02 9.470e-03 7.144 1.36e-11 ***
fMonth6 4.000e-02 9.461e-03 4.228 3.47e-05 ***
fMonth7 6.331e-02 9.462e-03 6.691 1.86e-10 ***
fMonth8 1.010e-01 9.462e-03 10.674 < 2e-16 ***
fMonth9 6.290e-02 9.470e-03 6.642 2.45e-10 ***
fMonth10 3.810e-02 9.458e-03 4.029 7.75e-05 ***
fMonth11 1.069e-01 9.473e-03 11.286 < 2e-16 ***
fMonth12 1.142e-01 9.475e-03 12.049 < 2e-16 ***
c348 -9.410e-03 2.703e-03 -3.481 0.000603 ***
s348 2.212e-03 2.716e-03 0.815 0.416227
c432 4.937e-05 2.721e-03 0.018 0.985539
s432 1.220e-02 2.701e-03 4.515 1.04e-05 ***
lresid 8.554e-01 3.541e-02 24.158 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02948 on 217 degrees of freedom
(1 observation deleted due to missingness)
F-statistic: 158.3 on 21 and 217 DF, p-value: < 2.2e-16
This model has a considerably higher R-squared value than the first model. The lag-1
residual term goes a long way towards capturing the remaining trend. It is also noticed that
both calendar pairs now contribute a significant term (c348 and s432).
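A minimal sketch of how a lag-1 residual regressor can be constructed and added to a regression. The data is simulated with assumed AR(1) errors, and the exact construction used for `lresid` in the model above is an assumption. Shifting the residuals by one month leaves the first value missing, which is why lm reports one observation deleted.

```r
set.seed(3)
n <- 240
Time <- seq_len(n)

# Hypothetical series: linear trend plus AR(1) errors (coefficient 0.8 assumed)
e <- as.numeric(arima.sim(list(ar = 0.8), n = n, sd = 0.03))
y <- 8 + 0.002 * Time + e

base <- lm(y ~ Time)

# Residual from the previous month; the first month has no predecessor
lresid <- c(NA, unname(resid(base))[-n])

ar_fit <- lm(y ~ Time + lresid)   # lm drops the row with the missing lresid

nobs(ar_fit)                      # one fewer observation than the base model
coef(ar_fit)["lresid"]
c(base = sigma(base), with_lag = sigma(ar_fit))
```

The drop in residual standard error between the two fits plays the same role as the drop from 0.05634 to 0.02948 between the Question 2 and Question 3 models.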
Part b)
The residuals for this model look much closer to random noise, indicating that a lot of the
trend has been captured. Let’s verify this more thoroughly using the qq-plot and the
Shapiro-Wilk test.
W = 0.99558, p-value = 0.7277
The qq-plot and Shapiro-Wilk test are consistent with normal residuals. The residuals stick to
the straight line in the qq-plot, and the Shapiro-Wilk p-value of 0.73 is not significant, so we
cannot reject the null hypothesis that the distribution is normal.
Now let’s examine the ACF plot to see if there is any uncaptured trend/seasonality.
There are spikes in the ACF plot at lags 6, 12, 18, 24, and so on. This aligns well with a
half-yearly cycle and indicates that there might be some dynamic seasonality at play which the
model was unable to capture.
Part c)
      Without lag-1 residual   With lag-1 residual
S1    0.9411000                0.9425635
S2    0.9557237                0.9560780
S3    1.0403769                1.0401755
S4    0.9457751                0.9454042
S5    1.0080673                1.0085356
S6    0.9820409                0.9810335
S7    1.0033942                1.0041643
S8    1.0438746                1.0427258
S9    1.0033892                1.0037549
S10   0.9796698                0.9791717
S11   1.0494562                1.0489220
S12   1.0563330                1.0565553

The seasonality estimates look very similar, indicating that seasonality was well captured even
without the lag-1 residuals. This implies that most of the improvement in the model is coming
from better capture of trend using the lag-1 residual.
Question 4:
Part a)
(i)
Data for 1992 to 2006 is plotted below. Only 2 turns are observed, so I start with a polynomial
trend of degree 3 in addition to the seasonal components.
After eliminating insignificant variables, the final model, which retains a degree-2 polynomial
trend, is as follows:
Call:
data = furniture_q4)
Residuals:
-0.084872 -0.018746 0.000844 0.018414 0.106006
Coefficients:
(Intercept) 8.149736 0.008516 957.028 < 2e-16 ***
poly(Time, 2)1 2.462746 0.031877 77.257 < 2e-16 ***
poly(Time, 2)2 -0.297885 0.031804 -9.366 < 2e-16 ***
fMonth2 -0.013842 0.012047 -1.149 0.252389
fMonth3 0.072956 0.012043 6.058 1.04e-08 ***
fMonth4 0.006079 0.012020 0.506 0.613735
fMonth5 0.054895 0.012053 4.555 1.07e-05 ***
fMonth6 0.041219 0.012040 3.423 0.000795 ***
fMonth7 0.050007 0.012025 4.159 5.33e-05 ***
fMonth8 0.080306 0.012059 6.660 4.72e-10 ***
fMonth9 0.041785 0.012039 3.471 0.000676 ***
fMonth10 0.056578 0.012033 4.702 5.73e-06 ***
fMonth11 0.129003 0.012066 10.692 < 2e-16 ***
fMonth12 0.163770 0.012041 13.601 < 2e-16 ***
c348 -0.009896 0.003502 -2.826 0.005350 **
s348 0.002055 0.003502 0.587 0.558232
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(ii)
The seasonal indices are as below (the row labeled (Intercept) is the January index):
exp.seas4.
<dbl>
(Intercept) 0.9446919
fMonth2 0.9317060
fMonth3 1.0161892
fMonth4 0.9504526
fMonth5 0.9980006
fMonth6 0.9844444
fMonth7 0.9931347
fMonth8 1.0236857
fMonth9 0.9850024
fMonth10 0.9996818
fMonth11 1.0747694
fMonth12 1.1127939
(iii)
Residuals are analyzed using the qq-plot, the Shapiro-Wilk test, and a simple plot:
W = 0.99295, p-value = 0.5905
Although the Shapiro-Wilk test does not reject normality (p = 0.59), the residual plots
suggest that there might be some uncaptured trend. The ACF plot suggests that the errors
may be correlated up to lag 12. Thus, an autoregressive model might need to be explored.
(iv)
After adding lag-1 residuals, the model looks as follows. The lagged residual is significant, as
is one term from each calendar trigonometric pair (c348 and s432). The R-squared has also
improved from 0.975 to 0.986, indicating a better fit.
Call:
lm(formula = logSales ~ poly(Time, 2) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4)
Residuals:
-0.072362 -0.014521 0.003148 0.013160 0.069542
Coefficients:
(Intercept) 8.151136 0.006552 1244.125 < 2e-16 ***
poly(Time, 2)1 2.463449 0.023809 103.466 < 2e-16 ***
poly(Time, 2)2 -0.296226 0.023919 -12.385 < 2e-16 ***
fMonth2 -0.015781 0.009106 -1.733 0.085168 .
fMonth3 0.071715 0.009105 7.877 6.65e-13 ***
fMonth4 0.004953 0.009082 0.545 0.586306
fMonth5 0.052845 0.009108 5.802 3.84e-08 ***
fMonth6 0.040734 0.009091 4.481 1.48e-05 ***
fMonth7 0.047605 0.009097 5.233 5.60e-07 ***
fMonth8 0.079809 0.009091 8.779 3.73e-15 ***
fMonth9 0.039740 0.009108 4.363 2.39e-05 ***
fMonth10 0.055450 0.009082 6.106 8.59e-09 ***
fMonth11 0.127731 0.009105 14.029 < 2e-16 ***
fMonth12 0.161846 0.009106 17.774 < 2e-16 ***
c348 -0.009931 0.002596 -3.826 0.000192 ***
s348 0.001959 0.002604 0.753 0.452917
c432 0.001502 0.002617 0.574 0.566848
s432 0.007665 0.002595 2.954 0.003647 **
lresid 0.678298 0.060439 11.223 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02354 on 148 degrees of freedom

(v)
The residuals seem to be much more randomly distributed. The qq-plot looks better, except for
a few outliers, which are likely what leads the Shapiro-Wilk test to reject normality
(p = 0.0058). These outliers likely fall within the business contraction of 2001. The ACF also
indicates that most of the trend has been captured, since almost no significant correlations
are observed. The one exception is a barely significant correlation at lag 4, which could
indicate some structure that is yet to be captured.
data: resid(modelq4_ar)
W = 0.97628, p-value = 0.005783
Part b)
There is a significant drop at the beginning due to the global recession. We’ll use a degree 3
polynomial to model the two turns (the peak at the beginning, and the drop during the
recession).
Call:
data = furniture_q4b)
Residuals:
-0.112934 -0.025478 -0.000323 0.024882 0.119608
Coefficients:
(Intercept) 8.360440 0.012107 690.535 < 2e-16 ***
poly(Time, 3)1 0.719853 0.045266 15.903 < 2e-16 ***
poly(Time, 3)2 1.052190 0.045160 23.299 < 2e-16 ***
poly(Time, 3)3 -0.618278 0.045417 -13.613 < 2e-16 ***
fMonth2 0.024465 0.017107 1.430 0.15475
fMonth3 0.113133 0.017102 6.615 6.06e-10 ***
fMonth4 0.011850 0.017071 0.694 0.48864
fMonth5 0.080950 0.017120 4.728 5.15e-06 ***
fMonth6 0.049626 0.017106 2.901 0.00427 **
fMonth7 0.074645 0.017087 4.368 2.31e-05 ***
fMonth8 0.115421 0.017140 6.734 3.24e-10 ***
fMonth9 0.085334 0.017118 4.985 1.68e-06 ***
fMonth10 0.043528 0.017115 2.543 0.01199 *
fMonth11 0.106062 0.017168 6.178 5.76e-09 ***
fMonth12 0.107858 0.017141 6.293 3.21e-09 ***
c348 -0.010251 0.004972 -2.062 0.04096 *
s348 0.003054 0.004972 0.614 0.53997
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04515 on 151 degrees of freedom

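The second trigonometric calendar pair was found to be insignificant and is not included. The
trend terms, c348, and most of the monthly terms are significant.
The static seasonal indices are as follows (the row labeled (Intercept) is the January index):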
exp.seas4b.
<dbl>
(Intercept) 0.9345041
fMonth2 0.9576485
fMonth3 1.0464399
fMonth4 0.9456439
fMonth5 1.0132981
fMonth6 0.9820496
fMonth7 1.0069294
fMonth8 1.0488371
fMonth9 1.0177506
fMonth10 0.9760790
fMonth11 1.0390660
fMonth12 1.0409347
Let’s now analyze the residuals to detect any uncaptured trends:
data: resid(modelq4b)
W = 0.99136, p-value = 0.4069
The residual plots suggest departures from normality and some uncaptured trend, although
the Shapiro-Wilk test does not reject normality (p = 0.41). The ACF plot indicates that there
might be correlated errors. So let’s fit an AR model to see if that can be captured.
Call:
lm(formula = logSales ~ poly(Time, 3) + fMonth + c348 + s348 +
c432 + s432 + lresid, data = furniture_mini_q4b)
Residuals:
-0.09651 -0.01455 0.00191 0.01694 0.07983
Coefficients:
(Intercept) 8.364599 0.007741 1080.505 < 2e-16 ***
poly(Time, 3)1 0.705535 0.028148 25.065 < 2e-16 ***
poly(Time, 3)2 1.065349 0.028279 37.673 < 2e-16 ***
poly(Time, 3)3 -0.639668 0.028552 -22.404 < 2e-16 ***
fMonth2 0.021057 0.010764 1.956 0.052344 .
fMonth3 0.108218 0.010762 10.056 < 2e-16 ***
fMonth4 0.007241 0.010733 0.675 0.500945
fMonth5 0.077646 0.010763 7.214 2.66e-11 ***
fMonth6 0.044051 0.010742 4.101 6.78e-05 ***
fMonth7 0.071987 0.010748 6.697 4.20e-10 ***
fMonth8 0.110117 0.010742 10.252 < 2e-16 ***
fMonth9 0.082282 0.010763 7.645 2.50e-12 ***
fMonth10 0.039107 0.010733 3.644 0.000372 ***
fMonth11 0.102304 0.010762 9.506 < 2e-16 ***
fMonth12 0.104553 0.010764 9.713 < 2e-16 ***
c348 -0.010039 0.003064 -3.276 0.001314 **
s348 0.001830 0.003080 0.594 0.553306
c432 -0.001875 0.003087 -0.607 0.544650
s432 0.011285 0.003072 3.674 0.000334 ***
lresid 0.775606 0.050477 15.366 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02781 on 147 degrees of freedom

The R-squared value has jumped from 0.87 to 0.95. The lagged residual is significant, and each
calendar trigonometric pair now has a significant term (c348 and s432). This indicates a
better fit. Let’s assess the residuals now:
data: resid(modelq4b_ar)
W = 0.98816, p-value = 0.1736
The residuals look much better now. All plots indicate that most of the trend has been
captured. However, there is still some marginal correlation in the residuals at lags 6 and 12,
which might indicate some uncaptured dynamic seasonality.
Part c)
      Part 2      Part 4a     Part 4b
S1    0.9411000   0.9446919   0.9345041
S2    0.9557237   0.9317060   0.9576485
S3    1.0403769   1.0161892   1.0464399
S4    0.9457751   0.9504526   0.9456439
S5    1.0080673   0.9980006   1.0132981
S6    0.9820409   0.9844444   0.9820496
S7    1.0033942   0.9931347   1.0069294
S8    1.0438746   1.0236857   1.0488371
S9    1.0033892   0.9850024   1.0177506
S10   0.9796698   0.9996818   0.9760790
S11   1.0494562   1.0747694   1.0390660
S12   1.0563330   1.1127939   1.0409347
It can be seen that the seasonal indices for part 2 almost always lie between the values of part
4a and part 4b (for S6, the part 2 and part 4b values are essentially equal). This makes sense
since part 2 gives more of an ‘average’ seasonal index: it models data from 2000 to 2019, a
span that overlaps both windows, whereas 4a and 4b model the earlier and later parts of the
series separately.
Also, indices in part 4b seem to be more volatile. This might indicate that the seasonality has
changed with time, becoming more volatile as we approach 2019. This supports the observation
from part 4b that there might be some unobserved dynamic seasonality.
