
3/1/24, 10:23 AM 2 Time Series Regression and Exploratory Data Analysis 2.

2 Time Series Regression and Exploratory Data Analysis

2.2 Exploratory Data Analysis
Aaron Smith
2022-11-24
This code is modified from Time Series Analysis and Its Applications, by Robert H. Shumway and David S. Stoffer:
https://github.com/nickpoison/tsa4

The most recent version of the astsa package can be found at https://github.com/nickpoison/astsa/

You can find demonstrations of astsa capabilities at
https://github.com/nickpoison/astsa/blob/master/fun_with_astsa/fun_with_astsa.md

In addition, the News and ChangeLog files are at https://github.com/nickpoison/astsa/blob/master/NEWS.md.

The webpages for the texts and some help on using R for time series analysis can be found at
https://nickpoison.github.io/.

UCF students can download the textbook for free through the library.

Punchline of this video:

If we have a trend stationary time series, we use detrending to get the stationary component.
If we have a random walk time series, we use differencing to get a stationary time series.

Our time series needs to be stationary for averaging the values over time to make sense.

We use the sample autocorrelation to measure (estimate) the dependence between values of the series at different times.
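To make the estimate concrete, here is a minimal base-R sketch of the sample autocorrelation computed directly from its definition and compared with stats::acf(). The helper sample_acf() and the simulated AR(1) series are assumptions for illustration, not part of astsa.

```r
# Sample autocovariance: gamma_hat(h) = (1/n) * sum_{t=1}^{n-h} (x_{t+h} - xbar)(x_t - xbar)
# Sample autocorrelation: rho_hat(h) = gamma_hat(h) / gamma_hat(0)
# sample_acf() is a hypothetical helper written for illustration.
sample_acf <- function(x, max_lag) {
  n <- length(x)
  xbar <- mean(x)
  gamma_hat <- sapply(0:max_lag, function(h) {
    sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
  })
  gamma_hat / gamma_hat[1]
}

set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))  # simulated stationary AR(1)
by_hand <- sample_acf(x, max_lag = 10)
by_stats <- as.vector(acf(x, lag.max = 10, plot = FALSE)$acf)
all.equal(by_hand, by_stats)   # stats::acf() uses the same divisor n
```

Dividing by n rather than n - h is the usual convention; it keeps the estimated autocovariance function nonnegative definite.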

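When we use the autocorrelation, we are assuming that the dependence between values is the same over the whole time interval. This requires:

stationarity in mean (the mean does not change over time)
stationarity in autocorrelation (the correlation between two values depends only on the lag between them, not on when they occur)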

Often, this is not the case.

The Johnson & Johnson series has a mean that increases exponentially over time, and the increase in the
magnitude of the fluctuations around this trend causes changes in the covariance function; the variance of the
process, for example, clearly increases as one progresses over the length of the series.

Johnson and Johnson Quarterly Earnings Per Share

Johnson and Johnson quarterly earnings per share, 84 quarters (21 years) measured from the first quarter of
1960 to the last quarter of 1980.

Note the gradually increasing underlying trend and the rather regular variation superimposed on the trend that
seems to repeat over quarters.

data(
list = "jj",
package = "astsa"
)
astsa::tsplot(
x = jj,
col = 4,
type="o",
ylab = "Quarterly Earnings per Share"
)

The global temperature series shown contains some evidence of a trend over time.

data(
list = "globtemp",
package = "astsa"
)
astsa::tsplot(
x = globtemp,
col = 4,
type = "o",
ylab = "Global Temperature Deviations"
)

Trend stationary
The trend stationary model is the easiest form of nonstationarity to work with: the series has stationary behavior around a trend,

$$x_t = \mu_t + y_t,$$

where $\mu_t$ is the trend and $y_t$ is a stationary process.

Frequently we estimate the trend and then find the stationary process by working with the residuals,

$$\hat{y}_t = x_t - \hat{\mu}_t.$$
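A minimal sketch of detrending by regression on simulated data, assuming a linear trend and an AR(1) stationary component (both are illustrative choices, not from the text). The residuals from the fitted trend play the role of $\hat{y}_t$.

```r
# Simulate x_t = mu_t + y_t with mu_t = 2 + 0.05 t and y_t an AR(1) (illustrative values)
set.seed(2)
n <- 300
t_idx <- 1:n
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
x <- 2 + 0.05 * t_idx + y

# Estimate the trend by least squares, then subtract it
fit <- lm(x ~ t_idx)
mu_hat <- fitted(fit)
y_hat <- x - mu_hat          # identical to resid(fit)

coef(fit)                    # should be near the true values 2 and 0.05
```

Because the model has an intercept, the residuals $\hat{y}_t$ average exactly to zero, as a mean-zero stationary component should.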

Example 2.4 Detrending Chicken Prices


Let’s use a trend stationary model with a linear trend:

$$x_t = \mu_t + y_t, \qquad \mu_t = \beta_0 + \beta_1 t$$

load the data

data(
list = "chicken",
package = "astsa"
)

lm(
formula = chicken ~ time(chicken)
)

##
## Call:
## lm(formula = chicken ~ time(chicken))
##
## Coefficients:
## (Intercept) time(chicken)
## -7131.022 3.592

plot the time series

astsa::tsplot(
x = chicken,
main = "original time series"
)

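$$\hat{\mu}_t = -7131 + 3.59\,t$$

$$\hat{y}_t = x_t + 7131 - 3.59\,t$$

Here $t$ is measured in years, as returned by time(chicken), which is why the intercept is so large.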

# detrended series: residuals from the least-squares linear trend
plot(
x = time(chicken),
y = chicken - predict(
object = lm(
formula = chicken ~ time(chicken)
)
),
type = "l"
)

plot the detrended time series

# astsa now has a detrend script, so Figure 2.4 can be done as


#par(mfrow=2:1)
astsa::tsplot(
x = astsa::detrend(
series = chicken
),
main = "detrended"
)

plot the difference between observations as a time series

astsa::tsplot(
x = diff(
x = chicken
),
main = "first difference"
)

Random walk with drift model:

$$\mu_t = \delta + \mu_{t-1} + w_t,$$

where $\delta$ is the drift and $w_t$ is white noise.
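A short sketch, assuming $\mu_0 = 0$ and an illustrative drift $\delta = 0.2$, of how a random walk with drift can be simulated with cumsum() and checked against the recursion.

```r
# With mu_0 = 0 the recursion unrolls to mu_t = delta * t + (w_1 + ... + w_t),
# so the cumulative sum of (delta + w_t) generates the whole path.
set.seed(3)
n <- 200
delta <- 0.2
w <- rnorm(n)                 # Gaussian white noise
mu <- cumsum(delta + w)       # mu_1, ..., mu_n

# Check mu_t = delta + mu_{t-1} + w_t for t = 2, ..., n
all.equal(mu[-1], delta + mu[-n] + w[-1])
```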

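If $x_t = \mu_t + y_t$ as in the trend stationary model, but the trend $\mu_t$ is a random walk with drift, then

$$\begin{aligned}
x_t - x_{t-1} &= (\mu_t + y_t) - (\mu_{t-1} + y_{t-1}) \\
&= (\mu_t - \mu_{t-1}) + (y_t - y_{t-1}) \\
&= (\delta + w_t) + (y_t - y_{t-1}).
\end{aligned}$$

Since $\delta$ is constant, $E(w_t) = 0$, and $y_t$ is stationary, the difference of consecutive observations has constant expected value $\delta$.

Let $z_t = y_t - y_{t-1}$; then

$$\begin{aligned}
\gamma_z(h) &= \operatorname{cov}(z_{t+h}, z_t) \\
&= \operatorname{cov}(y_{t+h} - y_{t+h-1},\; y_t - y_{t-1}) \\
&= \operatorname{cov}(y_{t+h}, y_t) - \operatorname{cov}(y_{t+h}, y_{t-1}) - \operatorname{cov}(y_{t+h-1}, y_t) + \operatorname{cov}(y_{t+h-1}, y_{t-1}) \\
&= \gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1) + \gamma_y(h) \\
&= 2\gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1).
\end{aligned}$$

This depends only on the lag $h$, not on $t$, so $z_t$ is stationary.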
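A numerical sanity check of the autocovariance of $z_t = y_t - y_{t-1}$, assuming $y_t$ is an AR(1) process with $\phi = 0.5$ and unit noise variance (an illustrative choice), whose autocovariance $\gamma_y(h) = \phi^{|h|}/(1-\phi^2)$ is known. The expected result is $\gamma_z(h) = 2\gamma_y(h) - \gamma_y(h+1) - \gamma_y(h-1)$; the sample autocovariances from a long simulation should be close to it.

```r
phi <- 0.5
gamma_y <- function(h) phi^abs(h) / (1 - phi^2)                  # AR(1) autocovariance
gamma_z <- function(h) 2 * gamma_y(h) - gamma_y(h + 1) - gamma_y(h - 1)

set.seed(4)
y <- arima.sim(model = list(ar = phi), n = 1e5)
z <- diff(y)                                                      # z_t = y_t - y_{t-1}
emp <- acf(z, lag.max = 3, type = "covariance", plot = FALSE)$acf[, 1, 1]
theo <- gamma_z(0:3)
round(rbind(empirical = emp, theoretical = theo), 3)
```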

An advantage of differencing is that no parameter is estimated.

A disadvantage of differencing is that it does not provide an estimate of the stationary component $y_t$.

Use differencing when you want a stationary time series from a non-stationary time series.

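Use detrending if you want to estimate the stationary component $y_t$.

If $x_t = \mu_t + y_t$ and $\mu_t = \beta_0 + \beta_1 t$, then

$$\begin{aligned}
x_t - x_{t-1} &= (\mu_t + y_t) - (\mu_{t-1} + y_{t-1}) \\
&= (\beta_0 + \beta_1 t + y_t) - (\beta_0 + \beta_1 (t - 1) + y_{t-1}) \\
&= \beta_1 + y_t - y_{t-1},
\end{aligned}$$

so differencing removes the linear trend, leaving the constant $\beta_1$ plus a stationary process.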

Differencing notation:

$$\nabla x_t = x_t - x_{t-1}$$

We use first differences to eliminate a linear trend.

We use second differences to eliminate a quadratic trend.
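A quick check in base R that differencing removes polynomial trends; the trend coefficients are arbitrary illustrative values.

```r
t_idx <- 1:20
linear <- 3 + 2 * t_idx                        # beta0 + beta1 * t with beta1 = 2
quadratic <- 1 + 0.5 * t_idx + 0.25 * t_idx^2  # leading coefficient 0.25

diff(linear)                       # every value equals beta1 = 2
diff(quadratic, differences = 2)   # every value equals 2 * 0.25 = 0.5
```

The second difference of $a t^2$ is $a\left(t^2 - 2(t-1)^2 + (t-2)^2\right) = 2a$, which is why the quadratic trend collapses to a constant.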

The backshift operator:

$$\begin{aligned}
Bx_t &= x_{t-1} \\
B^2 x_t &= B(Bx_t) = B(x_{t-1}) = x_{t-2} \\
B^k x_t &= x_{t-k} \\
B^{-1}Bx_t &= x_t = BB^{-1}x_t \quad (B^{-1} \text{ is the forward-shift operator}) \\
B^0 x_t &= x_t \\
\nabla x_t &= (B^0 - B)x_t \\
\nabla^2 x_t &= (B^0 - B)^2 x_t = (B^0 - 2B + B^2)x_t = x_t - 2x_{t-1} + x_{t-2}
\end{aligned}$$

Equivalently, by differencing twice:

$$\nabla^2 x_t = \nabla(x_t - x_{t-1}) = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$$
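The backshift identities can be checked numerically in base R: diff() with differences = 2, differencing twice, and the filter weights $(1, -2, 1)$ from expanding $(1 - B)^2$ should all give $x_t - 2x_{t-1} + x_{t-2}$. The simulated series is only for illustration.

```r
set.seed(6)
x <- rnorm(50)
n <- length(x)
by_formula <- x[3:n] - 2 * x[2:(n - 1)] + x[1:(n - 2)]   # x_t - 2 x_{t-1} + x_{t-2}

all.equal(diff(x, differences = 2), by_formula)         # nabla^2 via diff()
all.equal(diff(diff(x)), by_formula)                    # nabla applied twice

# (1 - 2B + B^2) as weights on (x_t, x_{t-1}, x_{t-2}); the first two outputs are NA
by_filter <- as.numeric(stats::filter(x, filter = c(1, -2, 1), sides = 1))[-(1:2)]
all.equal(by_filter, by_formula)
```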

Definition 2.5 Differences of order d

$$\nabla^d = (B^0 - B)^d$$

The first difference is an example of a linear filter applied to eliminate a trend.

Other filters, formed by averaging values near x t , can produce adjusted series that eliminate other kinds of
unwanted fluctuations.
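As a sketch of the linear-filter view, stats::filter() can reproduce the first difference with weights $(1, -1)$, and a centered three-point moving average is one simple example of a filter that averages values near $x_t$. The moving-average weights are an illustrative choice, not from the text.

```r
set.seed(7)
x <- cumsum(rnorm(100))

# sides = 1 applies weights to current and past values: y_t = x_t - x_{t-1}
first_diff <- stats::filter(x, filter = c(1, -1), sides = 1)
all.equal(as.numeric(first_diff)[-1], diff(x))

# sides = 2 centers the weights: y_t = (x_{t-1} + x_t + x_{t+1}) / 3
smooth3 <- stats::filter(x, filter = rep(1 / 3, 3), sides = 2)
all.equal(as.numeric(smooth3)[2:99], (x[1:98] + x[2:99] + x[3:100]) / 3)
```

Both are linear filters; they differ only in their weights. The difference weights sum to zero, so they cancel a constant level, while the averaging weights sum to one, so they preserve the level and damp fast fluctuations.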

The differencing technique is an important component of the ARIMA model of Box and Jenkins.

Example 2.5 Differencing Chicken Prices


The first difference of the chicken prices series produces different results than removing trend by detrending via
regression.

The differenced series does not contain the long (five-year) cycle we observe in the detrended series.

The differenced series exhibits an annual cycle that was obscured in the original or detrended data.

plot the autocorrelation of the time series, detrended time series, and the differences

# and Figure 2.5 as


#dev.new()
#par(mfrow=c(3,1)) # plot ACFs
astsa::acf1(
series = chicken,
max.lag = 48,
main = "chicken"
)

## [1] 0.99 0.97 0.95 0.93 0.91 0.89 0.87 0.86 0.84 0.82 0.80 0.78 0.75 0.73 0.71
## [16] 0.68 0.66 0.63 0.61 0.59 0.57 0.55 0.53 0.50 0.48 0.46 0.44 0.43 0.41 0.40
## [31] 0.38 0.37 0.37 0.36 0.35 0.34 0.33 0.31 0.30 0.28 0.27 0.26 0.25 0.24 0.23
## [46] 0.22 0.21 0.20

astsa::acf1(
series = astsa::detrend(
series = chicken
),
max.lag = 48,
main = "detrended"
)

## [1] 0.97 0.91 0.83 0.75 0.68 0.61 0.56 0.51 0.48 0.46 0.43 0.39
## [13] 0.33 0.26 0.20 0.14 0.08 0.03 0.00 -0.03 -0.04 -0.05 -0.07 -0.10
## [25] -0.13 -0.18 -0.21 -0.24 -0.25 -0.25 -0.23 -0.20 -0.16 -0.13 -0.11 -0.10
## [37] -0.11 -0.13 -0.14 -0.16 -0.17 -0.16 -0.15 -0.13 -0.10 -0.08 -0.05 -0.04

astsa::acf1(
series = diff(
x = chicken
),
max.lag = 48,
main = "first difference"
)

## [1] 0.72 0.39 0.09 -0.07 -0.16 -0.20 -0.27 -0.23 -0.11 0.09 0.26 0.33
## [13] 0.20 0.07 -0.03 -0.10 -0.19 -0.25 -0.29 -0.20 -0.08 0.08 0.16 0.18
## [25] 0.08 -0.06 -0.21 -0.31 -0.40 -0.40 -0.33 -0.18 0.02 0.20 0.30 0.35
## [37] 0.26 0.13 -0.02 -0.14 -0.23 -0.21 -0.18 -0.11 -0.03 0.08 0.21 0.33

Example 2.6 Differencing Global Temperature


The global temperature series appears to behave more as a random walk than a trend stationary series.

Rather than detrend the data, it would be more appropriate to use differencing to coerce it into stationarity.

In this case it appears that the differenced process shows minimal autocorrelation, which may imply the global
temperature series is nearly a random walk with drift.

It is interesting to note that if the series is a random walk with drift, the mean of the differenced series, which is
an estimate of the drift, is about .008, or an increase of about one degree centigrade per 100 years.
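One consequence worth noting: the mean of the first differences telescopes, $\frac{1}{n-1}\sum_{t=2}^{n}(x_t - x_{t-1}) = \frac{x_n - x_1}{n-1}$, so the drift estimate depends only on the first and last observations. A sketch on a simulated random walk with drift (the drift 0.01 and noise level are illustrative assumptions):

```r
set.seed(8)
n <- 200
x <- cumsum(0.01 + rnorm(n, sd = 0.1))   # random walk with drift delta = 0.01
drift_hat <- mean(diff(x))
all.equal(drift_hat, (x[n] - x[1]) / (n - 1))
```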

load the data

data(
list = c("globtemp","gtemp"),
package = "astsa"
)

plot the time series

astsa::tsplot(
x = globtemp
)

astsa::tsplot(
x = gtemp
)

#par(mfrow=c(2,1))
astsa::tsplot(
x = diff(
x = globtemp
),
type = "o"
)

astsa::tsplot(
x = diff(
x = gtemp
),
type = "o"
)

mean(
x = diff(
x = globtemp
)
) # drift estimate = .008

## [1] 0.007925926

mean(
x = diff(
x = gtemp
)
) # drift estimate = .0066

## [1] 0.006589147

autocorrelation of the differences

astsa::acf1(
series = diff(
x = globtemp
),
max.lag = 48,
main = ""
)

## [1] -0.24 -0.19 -0.08 0.20 -0.15 -0.03 0.03 0.14 -0.16 0.11 -0.05 0.00
## [13] -0.13 0.14 -0.01 -0.08 0.00 0.19 -0.07 0.02 -0.02 0.08 -0.12 -0.07
## [25] 0.10 0.13 -0.15 -0.01 0.09 0.00 -0.09 0.07 -0.03 -0.13 0.06 -0.06
## [37] 0.09 0.01 0.09 -0.06 -0.12 0.00 0.13 -0.03 0.00 0.01 0.10 -0.06

astsa::acf1(
series = diff(
x = gtemp
),
max.lag = 48,
main = ""
)

## [1] -0.29 -0.16 -0.12 0.22 -0.15 0.02 0.03 0.11 -0.20 0.15 0.04 -0.07
## [13] -0.17 0.15 0.06 -0.08 0.00 0.14 -0.14 0.04 0.00 0.11 -0.13 -0.03
## [25] 0.08 0.10 -0.23 0.07 0.07 -0.01 -0.11 0.15 -0.05 -0.10 0.02 -0.03
## [37] 0.06 0.00 0.07 -0.05 -0.12 0.04 0.13 -0.03 -0.04 -0.01 0.11 -0.09

Log transformations
Frequently, a log transformation equalizes the variability over the length of a time series, especially if larger fluctuations tend to appear with larger observed values:

$$y_t = \log(x_t)$$
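A sketch of why the log helps when fluctuations grow with the level, using simulated data with multiplicative noise (the growth rate 0.01 and noise standard deviation 0.2 are assumptions). detrended_sd() is a hypothetical helper that measures the spread around a straight-line fit within a window.

```r
set.seed(9)
n <- 400
t_idx <- 1:n
x <- exp(0.01 * t_idx + rnorm(n, sd = 0.2))   # noise acts multiplicatively on the level

# standard deviation of residuals around a straight-line fit within a window
detrended_sd <- function(v, idx) sd(resid(lm(v[idx] ~ t_idx[idx])))
early <- 1:(n / 2)
late <- (n / 2 + 1):n

raw_sd <- c(early = detrended_sd(x, early), late = detrended_sd(x, late))
log_sd <- c(early = detrended_sd(log(x), early), late = detrended_sd(log(x), late))
raw_sd   # spread grows with the level of the series
log_sd   # spread roughly constant after taking logs
```

On the log scale the noise is additive, $\log(x_t) = 0.01\,t + \epsilon_t$, so its variance no longer depends on the level.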

Box-Cox transformation
Frequently we use the Box-Cox transformation to obtain a variable that looks closer to normally distributed, or to improve linearity when predicting the value of one series from another:

$$y_t = \begin{cases} \dfrac{x_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[4pt] \log(x_t), & \lambda = 0. \end{cases}$$
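A direct translation of the Box-Cox formula into R; box_cox() is a hypothetical helper written for this note. The last call illustrates that the $\lambda = 0$ case is the limit of the $\lambda \neq 0$ formula.

```r
# Box-Cox transform for positive x; lambda = 0 is defined as log(x)
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(0.5, 1, 2, 10)
box_cox(x, 1)        # x - 1: only a shift, the shape is unchanged
box_cox(x, 0.5)      # 2 * (sqrt(x) - 1)
box_cox(x, 0)        # log(x)
box_cox(x, 1e-6)     # very close to log(x)
```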

Example 2.7 Paleoclimatic Glacial Varves


Melting glaciers deposit yearly layers of sand and silt during the spring melting seasons, which can be
reconstructed yearly over a period ranging from the time deglaciation began in New England (about 12,600
years ago) to the time it ended (about 6,000 years ago). Such sedimentary deposits, called varves, can be
used as proxies for paleoclimatic parameters, such as temperature, because, in a warm year, more sand and
silt are deposited from the receding glacier.

The plot shows the thicknesses of the yearly varves collected from one location in Massachusetts for 634
years, beginning 11,834 years ago.

data(
list = "varve",
package = "astsa"
)

time series plot of the varve series

#layout(matrix(1:4,2), widths=c(2.5,1))
astsa::tsplot(
x = varve,
main = "",
ylab = "",
col = 4
)
mtext(
text = "varve",
side = 3,
line = 0.5,
cex = 1.2,
font = 2,
adj = 0
)

Because the variation in thicknesses increases in proportion to the amount deposited, a logarithmic
transformation could remove the nonstationarity observable in the variance as a function of time. It is clear that
this improvement has occurred.

time series plot of the log-transformed varve series

astsa::tsplot(
x = log(varve),
main = "",
ylab = "",
col = 4
)
mtext(
text = "log(varve)",
side = 3,
line = 0.5,
cex = 1.2,
font = 2,
adj = 0
)

We may also plot histograms and normal Q-Q plots of the original and transformed data to argue that the approximation to
normality is improved.

histograms and normal Q-Q plots of the time series and the log-transformed time series

hist(
x = varve
)

qqnorm(
y = varve,
main = "",
col = 4
)
qqline(
y = varve,
col = 2,
lwd = 2
)

hist(
x = log(varve)
)

qqnorm(
y = log(varve),
main = "",
col = 4
)
qqline(
y = log(varve),
col = 2,
lwd = 2
)

Scatterplot matrices for lagged data


We use scatterplot matrices to visualize the relationship between a time series and its lags.

The autocorrelation function tells us whether a substantial linear relation exists between the series and its own
lagged values. The ACF gives a profile of the linear correlation at all possible lags and shows which values of h
lead to the best predictability.

However, the ACF is restricted to linear predictability, so it can miss nonlinear relationships between a time series
and its lags.
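A small simulated example of what a correlation can miss (the construction is hypothetical, chosen only for illustration): if $y_t = x_{t-1}^2$ with $x_t$ iid $N(0,1)$, then $y_t$ is completely determined by $x_{t-1}$, yet their correlation is zero in theory because $E(x^3) = 0$. A lagged scatterplot with a lowess line would show the parabola immediately.

```r
set.seed(15)
n <- 1e5
x <- rnorm(n)
lagged_x <- x[1:(n - 1)]
y <- lagged_x^2                      # y_t built entirely from x_{t-1}

cor(y, lagged_x)                     # near 0: no linear relation
cor(y, lagged_x^2)                   # 1: perfect nonlinear relation
# plot(lagged_x, y); lines(lowess(lagged_x, y), col = 2) shows the curve
```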

Example 2.8 Scatterplot Matrices, SOI and Recruitment
To check for nonlinear relations of this form, it is convenient to display a lagged scatterplot matrix.

The sample autocorrelations are displayed in the upper right-hand corner and superimposed on the
scatterplots are locally weighted scatterplot smoothing (lowess) lines that can be used to help discover any
nonlinearities.

load the data

data(
list = c("soi","rec"),
package = "astsa"
)

We notice that lags 1, 2, 11, and 12 have the strongest correlations. SOI is measured monthly, so lag 12 corresponds to
the same month in the previous year.

lag plot of soi

astsa::lag1.plot(
series = soi,
max.lag = 12,
col = astsa::astsa.col(
col = 4,
alpha = 0.3
),
cex = 1.5,
pch = 20
)

In a previous video we established a relationship between SOI and the recruitment time series.

We see that there is a relationship between recruitment and SOI at lags 5, 6, 7, and 8.

The negative correlations indicate that increases (decreases) in SOI lead to decreases (increases) in
recruitment.

The curvature in the lowess lines leads us to conjecture that positive and negative values of SOI have different impacts on
recruitment.

lag plot of soi leading rec

astsa::lag2.plot(
series1 = soi,
series2 = rec,
max.lag = 8,
col = astsa::astsa.col(
col = 4,
alpha = 0.3
),
cex = 1.5,
pch = 20
)

Example 2.9 Regression with Lagged Variables

$$R_t = \beta_0 + \beta_1 S_{t-6} + w_t,$$

where $R_t$ is Recruitment and $S_t$ is SOI.

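Let’s expand this model with a dummy variable to incorporate the positive/negative findings for SOI:

$$R_t = \beta_0 + \beta_1 S_{t-6} + \beta_2 D_{t-6} + \beta_3 D_{t-6} S_{t-6} + w_t,$$

$$D_t = \begin{cases} 0 & \text{if } S_t < 0, \\ 1 & \text{if } S_t \ge 0, \end{cases}$$

so that

$$R_t = \begin{cases} \beta_0 + \beta_1 S_{t-6} + w_t & \text{if } S_{t-6} < 0, \\ (\beta_0 + \beta_2) + (\beta_1 + \beta_3) S_{t-6} + w_t & \text{if } S_{t-6} \ge 0. \end{cases}$$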
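To see what the interaction term does, here is a sketch on simulated data (all coefficient values are illustrative assumptions, not estimates from rec and soi). With a dummy and its interaction, least squares reproduces exactly the lines obtained by fitting each regime separately: intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$ when the dummy is 1.

```r
set.seed(11)
n <- 400
S <- runif(n, -1, 1)
D <- ifelse(S < 0, 0, 1)
R <- 70 - 15 * S - 50 * D * S + rnorm(n, sd = 5)

b <- coef(lm(R ~ S * D))                  # (Intercept), S, D, S:D
neg <- coef(lm(R ~ S, subset = D == 0))   # separate fit for S < 0
pos <- coef(lm(R ~ S, subset = D == 1))   # separate fit for S >= 0

# regime lines implied by the interaction model
rbind(
  negative = c(b[["(Intercept)"]], b[["S"]]),
  positive = c(b[["(Intercept)"]] + b[["D"]], b[["S"]] + b[["S:D"]])
)
```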

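# dummy variable: 0 when SOI is negative, 1 when SOI is nonnegative
dummy = ifelse(
test = soi < 0,
yes = 0,
no = 1
)
# align rec with SOI and the dummy lagged 6 months;
# stats::lag is written explicitly because dplyr::lag masks it when dplyr is loaded
fish = ts.intersect(
rec = rec,
soiL6 = stats::lag(
x = soi,
k = -6
),
dL6 = stats::lag(
x = dummy,
k = -6
),
dframe = TRUE
)
# soiL6*dL6 expands to soiL6 + dL6 + soiL6:dL6
lm_fish <- lm(
formula = rec ~ soiL6*dL6,
data = fish,
na.action = NULL
)
summary(
object = lm_fish
)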

##
## Call:
## lm(formula = rec ~ soiL6 * dL6, data = fish, na.action = NULL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -63.291 -15.821 2.224 15.791 61.788
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.479 2.865 25.998 < 2e-16 ***
## soiL6 -15.358 7.401 -2.075 0.0386 *
## dL6 -1.139 3.711 -0.307 0.7590
## soiL6:dL6 -51.244 9.523 -5.381 1.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.84 on 443 degrees of freedom
## Multiple R-squared: 0.4024, Adjusted R-squared: 0.3984
## F-statistic: 99.43 on 3 and 443 DF, p-value: < 2.2e-16
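Plugging the estimates from the output above into the two-regime form gives the fitted lines (arithmetic on the rounded coefficients):

$$\hat{R}_t = \begin{cases} 74.479 - 15.358\,S_{t-6}, & S_{t-6} < 0, \\ (74.479 - 1.139) + (-15.358 - 51.244)\,S_{t-6} = 73.340 - 66.602\,S_{t-6}, & S_{t-6} \ge 0. \end{cases}$$

So the fitted decrease in recruitment per unit of SOI is much steeper when SOI six months earlier is nonnegative.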

astsa::tsplot(
x = fish$soiL6,
y = fish$rec,
type = 'p',
col = 4,
ylab = 'rec',
xlab = 'soiL6'
)
lines(
x = lowess(
x = fish$soiL6,
y = fish$rec
),
col = 4,
lwd = 2
)
points(
x = fish$soiL6,
y = fitted(
object = lm_fish
),
pch = '+',
col = 6
)

time series plot and ACF of the residuals; the residuals are clearly autocorrelated, so they are not white noise

astsa::tsplot(
x = resid(
object = lm_fish
)
) # not shown ...

astsa::acf1(
series = resid(
object = lm_fish
)
) # ... but obviously not noise

## [1] 0.69 0.62 0.49 0.37 0.24 0.15 0.08 0.00 -0.03 -0.10 -0.13 -0.16
## [13] -0.17 -0.23 -0.24 -0.23 -0.23 -0.22 -0.17 -0.09 -0.05 0.01 0.05 0.06
## [25] 0.09 0.07 0.10 0.06 0.02 -0.02 -0.02 -0.02 -0.03 -0.02 0.00 0.01
## [37] -0.01 -0.04 -0.07 -0.05 -0.06 -0.03 -0.02 0.01 0.04 0.04 0.08 0.08

Example 2.10 Using Regression to Discover a Signal in Noise
Frequently we can statistically capture periodic behavior without knowing the full mathematical form of the
signal: if the frequency is known, regression can estimate the amplitude and phase.

The trigonometric identities and the orthogonality of the sine and cosine terms enable regression to estimate a periodic
signal.

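$$\cos(\alpha + \beta) = \cos(\alpha)\cos(\beta) - \sin(\alpha)\sin(\beta)$$

$$\cos\left(2\pi x + \tfrac{3\pi}{5}\right) = \cos(2\pi x)\cos\left(\tfrac{3\pi}{5}\right) - \sin(2\pi x)\sin\left(\tfrac{3\pi}{5}\right)$$

$$2\cos\left(2\pi x + \tfrac{3\pi}{5}\right) = 2\cos\left(\tfrac{3\pi}{5}\right)\cos(2\pi x) - 2\sin\left(\tfrac{3\pi}{5}\right)\sin(2\pi x)$$

$$2\cos\left(2\pi x + \tfrac{3\pi}{5}\right) \approx -0.618034\cos(2\pi x) - 1.902113\sin(2\pi x)$$

True coefficients: $-0.618034$ and $-1.902113$. (In the simulation below, $x = t/50$, so the signal has period 50, and the phase $0.6\pi$ equals $3\pi/5$.)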
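The true coefficients can be computed directly in R; the last line also confirms the identity numerically on a grid of points. The variable names are illustrative.

```r
A <- 2
phase <- 3 * pi / 5           # the 0.6 * pi used in the simulation code
beta_cos <- A * cos(phase)    # coefficient of cos(2*pi*x)
beta_sin <- -A * sin(phase)   # coefficient of sin(2*pi*x)
c(beta_cos, beta_sin)

# the single-cosine form and the cos/sin form agree everywhere
xx <- seq(0, 1, by = 0.01)
all.equal(A * cos(2 * pi * xx + phase),
          beta_cos * cos(2 * pi * xx) + beta_sin * sin(2 * pi * xx))
```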

set.seed(
seed = 823
) # so you can reproduce these results
x = 2*cos(x = 2*pi*(1:500)/50 + 0.6*pi) + rnorm(n = 500,mean = 0,sd = 5)
z1 = cos(
x = 2*pi*(1:500)/50
)
z2 = sin(
x = 2*pi*(1:500)/50
)
M_trig <- data.frame(
x = x,
z1 = z1,
z2 = z2
)
lm_trig <- lm(
formula = x ~ 0 + z1 + z2, # zero to exclude the intercept
data = M_trig
)
summary(
object = lm_trig
)

##
## Call:
## lm(formula = x ~ 0 + z1 + z2, data = M_trig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1836 -2.9692 -0.0714 3.4311 14.0427
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z1 -0.6126 0.2986 -2.052 0.0407 *
## z2 -1.6664 0.2986 -5.581 3.94e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.721 on 498 degrees of freedom
## Multiple R-squared: 0.06629, Adjusted R-squared: 0.06254
## F-statistic: 17.68 on 2 and 498 DF, p-value: 3.828e-08

astsa::tsplot(
x = x,
col = 4
)

astsa::tsplot(
x = x,
col = astsa::astsa.col(
col = 4,
alpha = 0.7
),
ylab = expression(hat(x))
)
lines(
x = fitted(
object = lm_trig
),
col = 2,
lwd = 2
)

increase the sample size to show convergence of the coefficients

set.seed(
seed = 823
) # so you can reproduce these results
x = 2*cos(x = 2*pi*(1:(1e6))/(1e5) + 0.6*pi) + rnorm(n = (1e6),mean = 0,sd = 5)
z1 = cos(
x = 2*pi*(1:(1e6))/(1e5)
)
z2 = sin(
x = 2*pi*(1:(1e6))/(1e5)
)
M_trig <- data.frame(
x = x,
z1 = z1,
z2 = z2
)
lm_trig <- lm(
formula = x ~ 0 + z1 + z2, # zero to exclude the intercept
data = M_trig
)
summary(
object = lm_trig
)

##
## Call:
## lm(formula = x ~ 0 + z1 + z2, data = M_trig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.8745 -3.3805 -0.0108 3.3583 25.4331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z1 -0.617281 0.007065 -87.38 <2e-16 ***
## z2 -1.908289 0.007065 -270.12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.995 on 999998 degrees of freedom
## Multiple R-squared: 0.07459, Adjusted R-squared: 0.07458
## F-statistic: 4.03e+04 on 2 and 999998 DF, p-value: < 2.2e-16

Write the estimated model as a single trigonometric function. In general we write a sinusoid using sine, but
since the original function is written as a cosine, we use cosine (sine and cosine are shifts of each other).
Since every trigonometric term uses the same frequency, we take the point of view that the period/frequency is
known (below, $t$ is measured in units of one period); the amplitude and phase shift are unknown. Since there is no intercept, there is no vertical shift: the fitted signal oscillates around zero.

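$$\begin{aligned}
-0.6172813\cos(2\pi t) - 1.9082887\sin(2\pi t) &= \hat{A}\cos(2\pi t + \hat{\theta}) \\
&= \hat{A}\cos(2\pi t)\cos(\hat{\theta}) - \hat{A}\sin(2\pi t)\sin(\hat{\theta})
\end{aligned}$$

Matching the coefficients of $\cos(2\pi t)$ and $\sin(2\pi t)$:

$$\hat{A}\cos(\hat{\theta}) = -0.6172813, \qquad -\hat{A}\sin(\hat{\theta}) = -1.9082887$$

Squaring and adding:

$$\hat{A}^2\cos^2(\hat{\theta}) + \hat{A}^2\sin^2(\hat{\theta}) = \hat{A}^2 = (-0.6172813)^2 + (-1.9082887)^2 = 4.022602$$

$$|\hat{A}| = 2.005643$$

Taking $\hat{A} > 0$:

$$\cos(\hat{\theta}) = -0.3077723, \qquad \sin(\hat{\theta}) = 0.9514600$$

Since $\sin(\hat{\theta}) > 0$, $\hat{\theta}$ lies in $(0, \pi)$, which is the range of $\cos^{-1}$, so

$$\hat{\theta} = \cos^{-1}(-0.3077723) = 1.883647$$

The estimated and true signals are

$$\hat{A}\cos(2\pi t + \hat{\theta}) = 2.005643\cos(2\pi t + 1.883647)$$

$$A\cos(2\pi t + \theta) = 2\cos\left(2\pi t + \tfrac{3}{5}\pi\right), \qquad \tfrac{3}{5}\pi \approx 1.884956$$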
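The same inversion in R, sketched with atan2() instead of $\cos^{-1}$: atan2() returns the angle in the correct quadrant from the signs of both components, so no separate check of $\sin(\hat{\theta})$ is needed. The coefficients are the estimates printed in the regression output above.

```r
b1 <- -0.6172813    # coefficient of cos(2*pi*t) = A * cos(theta)
b2 <- -1.9082887    # coefficient of sin(2*pi*t) = -A * sin(theta)
A_hat <- sqrt(b1^2 + b2^2)
theta_hat <- atan2(-b2, b1)   # atan2(A * sin(theta), A * cos(theta))
c(A_hat = A_hat, theta_hat = theta_hat, true_phase = 3 * pi / 5)
```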

library(ggplot2)
# evaluate the true and estimated signals over one period on a fine grid
t0 <- seq(
from = 0,
to = 1,
length.out = 10000
)
x_correct <- 2*cos(2*pi*t0 + 3*pi/5)
x_estimated <- 2.005643*cos(2*pi*t0 + 1.883647)
M <- data.frame(
t0 = t0,
correct_model = x_correct,
estimated_model = x_estimated
)
# reshape to long format: one row per (t0, model) pair
M <- tidyr::pivot_longer(
data = M,
cols = -t0,
names_to = "model",
values_to = "x"
)
ggplot(M) +
aes(x = t0, y = x, group = model, color = model) +
geom_line() +
theme_bw()
