
Environmetrics 00, 1–26

DOI: 10.1002/env.XXXX

Testing for Seasonal Means in Time Series Data

Gang Liu^a, Qin Shao^b, Robert Lund^c, and Jonathan Woody^d

Summary: The statistician often needs to test whether or not a time series has a seasonal first moment.
The problem often arises in environmental series, where most time-ordered data display some type of
periodic structure. This paper reviews the problem, proposing new statistics in both the time and frequency
domains. Our new time domain statistic has an ANOVA form that is based on the one-step-ahead
prediction errors of the series. This statistic inherits the classic traits of the F distribution arising in
one-way ANOVA tests, is easy to use, and is asymptotically equivalent to the likelihood ratio test. The
statistic’s asymptotic distribution is quantified when time series parameters are estimated. In the frequency
domain, a statistic modifying Fisher’s classical test for a sinusoidal mean superimposed on independent and
identically distributed Gaussian noise is devised. The performance and comparison of these statistics are
studied via simulation. Implementation of the methods merely requires sample means, autocovariances, and
periodograms of the series. Application to a data set of monthly temperatures from Tuscaloosa, Alabama is
given.

Keywords: Autoregression, One-way ANOVA, Periodicity, Seasonal means, Time series.

a Department of Mathematics and Statistics, The University of Toledo, 2801 W. Bancroft, Toledo, OH, 43606, USA.
b Department of Mathematics and Statistics, The University of Toledo, 2801 W. Bancroft, Toledo, OH, 43606, USA.
c Department of Mathematical Sciences, Clemson University, 101 Calhoun Dr, Clemson, SC 29634, USA.
d Department of Mathematics and Statistics, Mississippi State University, 75 B.S. Hood Dr., Mississippi State, MS 39762, USA.
Correspondence to: qin.shao@utoledo.edu

This is the author manuscript accepted for publication and has undergone full peer review but has
not been through the copyediting, typesetting, pagination and proofreading process, which may
lead to differences between this version and the Version of Record. Please cite this article as doi:
10.1002/env.2383

This paper has been submitted for consideration for publication in Environmetrics

This article is protected by copyright. All rights reserved.

1. INTRODUCTION

Seasonal structures (periodicities) arise in many environmental time series (see Lund et
al. 1995 for climatological pursuits, Hipel and McLeod 1994 for hydrological applications,
and Barnett and Dobson 2010 for health sciences work). Arguably, periodic data is more
commonly encountered in environmental settings than non-periodic data. For example, the
most prominent feature of monthly temperature series from non-tropical stations is a monthly
mean cycle. Although seasonal time series analysis is addressed in many classical time series
texts (Brockwell and Davis 1991, Fuller 1996, and Shumway and Stoffer 2010 to name a
few), the focus is usually on the second moment (autocovariance) structure of the series —
first moment properties are typically assumed estimated and removed; that is, the resulting
series has a zero mean. It is not always obvious whether a series has a seasonal mean. Later,
we give an example in climatology where researchers hope to remove a seasonal mean in
a monthly temperature series by subtracting a temperature record from a geographically
nearby station. Ignoring a seasonal mean can degrade other inferences when it exists.
This paper revisits seasonal mean testing in time series data. We review classic approaches
to the problem and propose some new statistics in both the time and frequency domains. The
majority of existing methods work in the spectral domain, leaving time domain methods
comparatively underdeveloped. This paper helps fill this gap; indeed, as we show, time domain
approaches seem to perform better.
Our time series $\{X_t\}$ is written seasonally as

$$X_{iT+\nu} = \mu_\nu + \epsilon_{iT+\nu}, \tag{1}$$

where T is the fundamental period of the data, t = iT + ν is the time corresponding to the
νth season of the ith cycle and ν ∈ {1, . . . , T }, and µν is the mean of the series during season
ν. Here, {ϵt } is zero-mean random error. The period T is assumed known. For daily series,
we suggest omitting leap year data; this enforces T = 365 with minimal loss of precision (the
alternative for pseudo-periodic phenomena involves models with an almost periodic structure
as in Hurd (1991) and Lii and Rosenblatt (2006)). The notations {ϵt } and {ϵiT +ν } are used
interchangeably, the latter when emphasis on seasonality is needed.

Our model for a stationary $\{\epsilon_t\}$ involves a pth order autoregression (AR(p)) satisfying the
causal difference equation

$$\epsilon_t = \sum_{k=1}^{p} \phi_k \epsilon_{t-k} + Z_t, \tag{2}$$

where {Zt } is an independent and identically distributed (IID) zero-mean sequence with
variance $\sigma_Z^2$. Here, p is the autoregressive order and $\phi_1, \ldots, \phi_p$ are real-valued coefficients;
see Brockwell and Davis (1991) for background on autoregressions. Autoregressive models are
parsimoniously dense in all short-memory stationary structures and have some convenient
properties that will aid our future mathematical analyses.
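
As a concrete illustration of the model in (1) and (2), the following Python sketch simulates a series with seasonal means and AR(p) errors. It is only a sketch under stated assumptions: the function name `simulate_seasonal_ar`, the Gaussian choice for $Z_t$ (the model itself only requires IID zero-mean noise), and the burn-in length used to approximate stationarity are illustrative choices that do not come from the paper.

```python
import random

def simulate_seasonal_ar(mu, phi, d, sigma=1.0, burn_in=200, seed=None):
    """Simulate X_{iT+nu} = mu_nu + eps_{iT+nu} with AR(p) errors eps_t.

    mu    : list of T seasonal means (mu_1, ..., mu_T)
    phi   : list of p autoregressive coefficients (phi_1, ..., phi_p)
    d     : number of cycles, so the series has N = d * T observations
    sigma : standard deviation of the white noise Z_t (Gaussian here by assumption)
    """
    rng = random.Random(seed)
    T, p = len(mu), len(phi)
    eps = [0.0] * p                      # start-up values, discarded with the burn-in
    for _ in range(burn_in + d * T):
        ar_part = sum(phi[k] * eps[-1 - k] for k in range(p))
        eps.append(ar_part + rng.gauss(0.0, sigma))
    eps = eps[-d * T:]                   # keep the last N error terms
    return [mu[t % T] + eps[t] for t in range(d * T)]
```

For instance, `simulate_seasonal_ar([1.0] * 4, [0.5], d=50, seed=1)` produces a null-hypothesis series with T = 4 and AR(1) errors.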
Our null hypothesis is that the seasonal means are identical: µ1 = µ2 = · · · = µT . The
alternative is that two or more of the seasonal means do not coincide. When {ϵt } is
IID, perhaps the most classic way of assessing the null hypothesis from a data sequence
$X_1, \ldots, X_N$ involves the ANOVA F-type statistic

$$F = \frac{d \sum_{\nu=1}^{T} (\bar{X}_\nu - \bar{\bar{X}})^2 / (T-1)}{\sum_{\nu=1}^{T} \sum_{i=0}^{d-1} (X_{iT+\nu} - \bar{X}_\nu)^2 / (N-T)}. \tag{3}$$

Here, d = N/T is the number of cycles of observed data and is assumed here to be a whole
number to avoid trite work. The sample means are $\bar{X}_\nu = d^{-1} \sum_{i=0}^{d-1} X_{iT+\nu}$ for season ν and
$\bar{\bar{X}} = N^{-1} \sum_{i=0}^{d-1} \sum_{\nu=1}^{T} X_{iT+\nu}$ for the grand mean. The cycle i = 0 contains the data points
$X_1, \ldots, X_T$. Of course, F is a time-domain statistic.


It is well known that if $\{\epsilon_t\}$ is IID and Gaussian, then the test statistic in (3) has
an $F(df_1, df_2)$-type distribution with $df_1 = T - 1$ numerator degrees of freedom and $df_2 = N - T$
denominator degrees of freedom. While this distributional claim is reasonably robust
against moderate departures from normality, it degrades quickly when $\{\epsilon_t\}$ is correlated (see
Glass et al. 1972; this will become apparent later).
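
The classical statistic in (3) needs only seasonal and grand sample means. The Python sketch below computes it for a series stored as a list `x = [X_1, ..., X_N]`; the function name `seasonal_anova_f` is a hypothetical label introduced here for illustration, and the code assumes, as the text does, that N is a whole number of cycles.

```python
from statistics import fmean

def seasonal_anova_f(x, T):
    """One-way ANOVA F statistic (3) for x = [X_1, ..., X_N] with period T."""
    N = len(x)
    if N % T != 0:
        raise ValueError("N must be a whole number of cycles (N = d * T)")
    d = N // T
    # x[nu::T] collects the d observations that fall in one season
    season_means = [fmean(x[nu::T]) for nu in range(T)]
    grand_mean = fmean(x)
    # numerator: between-season variability, T - 1 degrees of freedom
    between = d * sum((m - grand_mean) ** 2 for m in season_means) / (T - 1)
    # denominator: within-season variability, N - T degrees of freedom
    within = sum((x[t] - season_means[t % T]) ** 2 for t in range(N)) / (N - T)
    return between / within
```

Under IID Gaussian errors the returned value would be compared with a quantile of the F(T − 1, N − T) distribution; the comparison is not valid when the errors are correlated.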


Classical approaches for seasonal mean detection have largely worked in the spectral
domain: see Fisher (1929), Grenander and Rosenblatt (1957), Bloomfield (1976), Chiu
(1989), Novack (1994), and Artis et al. (2004) and the references within. Almost all of
this literature assumes that {ϵt } is IID. Specifically, for {ϵt } IID, one computes the discrete
Fourier transform at frequency $\omega_j = 2\pi j/N$ via

$$D_j = N^{-1/2} \sum_{t=1}^{N} X_t e^{-it\omega_j}, \quad j = 1, 2, \ldots, q,$$

where $q = \lfloor (N-1)/2 \rfloor$, and the periodogram $I_j = |D_j|^2$, $j = 1, 2, \ldots, q$. Fisher’s test (Fisher,
1929) examines the statistic

$$S = \frac{\max_{1 \le j \le q} I_j}{q^{-1} \sum_{j=1}^{q} I_j}, \tag{4}$$

rejecting the null of equal seasonal means when S is too large to be explained by chance
variation. Fisher’s test was designed to be powerful against an alternative where the seasonal
means have a simple sinusoidal form over each cycle. The intuition is that a large value of
Ij implies that a sinusoid with frequency ωj is present in the series. Because every periodic
mean cycle with period T has some sinusoidal component (in particular, those in its Fourier
expansion), S can also be applied in settings with a general periodic mean. This said, it is
not expected to be as powerful in such cases.
The null hypothesis distribution of S in the Gaussian case is given in Brockwell and Davis
(1991):

$$P(S > x) = 1 - \sum_{j=0}^{q} (-1)^j \binom{q}{j} \left(1 - \frac{jx}{q}\right)_+^{q-1}, \tag{5}$$

where $x_+ = \max(x, 0)$.
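
The periodogram, Fisher's statistic in (4), and the tail probability in (5) can be sketched in a few lines of Python. The function names below are hypothetical, and the direct O(N^2) evaluation of the discrete Fourier transform is chosen for transparency rather than speed. The alternating sum in (5) can lose floating point accuracy when q is large, so this is an illustration rather than a production routine.

```python
import cmath
import math

def periodogram(x):
    """I_j = |D_j|^2 at the Fourier frequencies w_j = 2*pi*j/N, j = 1, ..., q."""
    N = len(x)
    q = (N - 1) // 2
    values = []
    for j in range(1, q + 1):
        w = 2 * math.pi * j / N
        # D_j = N^{-1/2} * sum_{t=1}^{N} X_t * exp(-i t w_j); x[t - 1] holds X_t
        D = sum(x[t - 1] * cmath.exp(-1j * t * w) for t in range(1, N + 1)) / math.sqrt(N)
        values.append(abs(D) ** 2)
    return values

def fisher_s(I):
    """Fisher's statistic (4): largest periodogram ordinate over the average ordinate."""
    return max(I) / (sum(I) / len(I))

def fisher_tail(s, q):
    """P(S > s) from (5); terms with 1 - j*s/q <= 0 contribute nothing."""
    total = 0.0
    for j in range(q + 1):
        base = max(1.0 - j * s / q, 0.0)
        total += (-1) ** j * math.comb(q, j) * base ** (q - 1)
    return 1.0 - total
```

A p-value for the IID Gaussian case would then be `fisher_tail(fisher_s(I), len(I))` with `I = periodogram(x)`.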


Moving to cases where {ϵt } is correlated, previous time domain research has attempted to
adjust the numerator and/or denominator in the F statistic in (3). Sutradhar et al. (1995)
and Shao and Ni (2004) narrate these efforts. Often, computations for such approaches
become unwieldy. We are not aware of literature attempting to modify S for autocorrelation,
but will show how to do this in the next section. Also, a Gaussian likelihood ratio test can
be devised — see the next section.
In the time domain, a simple prediction residual idea is used here to devise a new statistic
mimicking (3). The idea is motivated by Lund et al. (2015), where a test for a common
mean among m independent time series is devised, but can also be used in this scenario.
This statistic retains the classic one-way ANOVA F -type distribution (exactly) for Gaussian
data when the time series parameters are known.
In practice, the time series parameters need to be estimated. For autoregressive {ϵt },
the asymptotic distribution of our time domain statistic is shown to be chi-squared with
T − 1 degrees of freedom divided by T − 1: $\chi^2_{T-1}/(T-1)$. While formal proofs for general
autoregressive moving-average (ARMA) errors {ϵt } (that is, models with a moving-average
component) are involved and are not pursued here, the results are expected to apply to this
case. The prediction residual statistic is easy to implement and outperforms one-way ANOVA
tests that ignore correlation for all sample sizes. It is also shown to be slightly better than
the Gaussian likelihood ratio test for small sample sizes in the simulations of Section 3. While all
calculations are carried out in R, an open access environment for statistical computing and
graphics developed by the R Core Team (2014), computation of our statistics only requires
basic functions or subroutines that are included in most statistical software packages.

The rest of our paper proceeds as follows. Section 2 reviews past work on the problem
and produces several new test statistics. It also quantifies the asymptotic distribution
of the prediction residual ANOVA test and states the asymptotic equivalence of the
Gaussian likelihood ratio and the prediction residual ANOVA tests. Section 3 compares
the performance of all tests via simulation. Section 4 applies the methods to a monthly
temperature series from Tuscaloosa, Alabama from 1930-1959. In this setting, climatologists
hope to remove seasonal mean cycles from temperature records by subtracting the record of
some neighboring station. As we show, such a subtraction may not completely eliminate the
seasonal mean. Section 5 concludes with some comments and remarks. An Appendix proves
our main theoretical results.

2. SEASONAL MEAN TEST STATISTICS FOR TIME SERIES DATA

For AR(p) $\{\epsilon_t\}$, the one-step-ahead linear prediction $\hat{X}_t = P(X_t \mid 1, X_1, \ldots, X_{t-1})$ has form

$$\hat{X}_t = \mu + \sum_{k=1}^{p} \phi_k (X_{t-k} - \mu), \quad t > p, \tag{6}$$

under the null hypothesis, where $\mu \equiv E[X_t]$ denotes the common mean. The one-step-ahead
prediction errors, denoted by $\{R_t\}_{t=p+1}^{N}$, are defined pointwise by $R_t = X_t - \hat{X}_t$. To avoid
busy work that accounts for edge-effects (the fact that (6) does not hold for t = 1, . . . , p),
we start our residual sequence at t = p + 1. Also observe that the mean squared prediction
error $v_t := E[R_t^2] = \sigma_Z^2$ does not depend on t when t > p.

2.1. Time Domain Tests

Define $i_\nu$, for $\nu \in \{1, 2, \ldots, T\}$, as the smallest integer such that $i_\nu T + \nu > p$. The seasonal
sample and overall means of $\{R_t\}_{t=p+1}^{N}$ are

$$\bar{R}_\nu = \frac{1}{d - i_\nu} \sum_{i=i_\nu}^{d-1} R_{iT+\nu}, \qquad \bar{\bar{R}} = \frac{1}{N-p} \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} R_{iT+\nu}. \tag{7}$$

The key technical innovation of this paper is simple: if the time series parameters $\phi_1, \ldots, \phi_p$
are known and $\{X_t\}$ is Gaussian, then an ANOVA F test statistic based on $\{R_t\}_{t=p+1}^{N}$,

$$F_R = \frac{d \sum_{\nu=1}^{T} (\bar{R}_\nu - \bar{\bar{R}})^2 / (T-1)}{\sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} (R_{iT+\nu} - \bar{R}_\nu)^2 / (N-p-T)}, \tag{8}$$

has exactly an F-type distribution with T − 1 numerator degrees of freedom and N − p − T
denominator degrees of freedom. This is because $R_t = Z_t$ for t > p by (6) and $\{Z_t\}$ is IID
Gaussian. As N → ∞ (equivalently, d → ∞), the denominator in (8), divided by $\sigma_Z^2$, converges to unity
under the null hypothesis (use infinite divisibility of the $\chi^2$ distribution and the law of large
numbers). The numerator, likewise divided by $\sigma_Z^2$, is a $\chi^2$ variate with T − 1 degrees of freedom
divided by T − 1. Hence, we have the following result.

Remark 1 The asymptotic distribution of $F_R$ is $\chi^2_{T-1}/(T-1)$ as N → ∞.

While our proofs focus on autoregressive errors, the idea also applies to any non-degenerate
stationary Gaussian series and even to some non-stationary series like random walks. This
is because the one-step-ahead prediction residuals from a Gaussian series always have a zero
mean and are independent. Hence, if these prediction residuals have a constant variance,
one can construct an exact F statistic. This simple observation relieves one from adjusting
quadratic forms for autocorrelation or altering critical values in F tables. With $F_R$, the
one-step-ahead prediction residuals “do the work”.
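
To make the construction of $F_R$ concrete, the sketch below forms the residuals $R_t = X_t - \hat{X}_t$ from (6) with known autoregressive coefficients and then evaluates (8). The function name `prediction_residual_f` and its argument layout are assumptions made for illustration. Because every residual contains the same constant $\mu(1 - \phi_1 - \cdots - \phi_p)$, the value passed for `mu` cancels in (8) and does not affect the result.

```python
from statistics import fmean

def prediction_residual_f(x, T, phi, mu=0.0):
    """F_R in (8) for x = [X_1, ..., X_N], period T, and known AR coefficients phi."""
    N, p = len(x), len(phi)
    d = N // T
    # residuals R_t for times t = p+1, ..., N (0-based indices p, ..., N-1)
    resid = [
        x[t] - mu - sum(phi[k] * (x[t - 1 - k] - mu) for k in range(p))
        for t in range(p, N)
    ]
    # group the residuals by season; index t belongs to season (t % T) + 1
    groups = [[] for _ in range(T)]
    for t, r in zip(range(p, N), resid):
        groups[t % T].append(r)
    season_means = [fmean(g) for g in groups]
    grand_mean = fmean(resid)
    num = d * sum((m - grand_mean) ** 2 for m in season_means) / (T - 1)
    den = sum((r - m) ** 2 for g, m in zip(groups, season_means) for r in g) / (N - p - T)
    return num / den
```

With `phi=[]` (no autocorrelation) the function reduces to the classical statistic in (3).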
There is a practical issue in the above: the autoregressive parameters $\phi_1, \ldots, \phi_p$ are typically
unknown. Hence, we simply plug in $\sqrt{N}$-consistent estimators of these quantities and show
that the resulting statistic works asymptotically as N → ∞. One can select the autoregressive
order p via any of the standard methods in Brockwell and Davis (1991, Chapter 8).
Elaborating on the above, $\phi = (\phi_1, \ldots, \phi_p)'$ is replaced by the Yule-Walker estimator $\hat{\phi}$
satisfying $\hat{\phi} = \hat{\Gamma}_p^{-1} \hat{\gamma}_p$, where $\hat{\Gamma}_p = [\hat{\gamma}(i-j)]_{i,j=1}^{p}$ (this matrix is always invertible if the data
are non-constant; see the result of Problem 7.11 in Brockwell and Davis, 1991) and
$\hat{\gamma}_p = (\hat{\gamma}(1), \hat{\gamma}(2), \ldots, \hat{\gamma}(p))'$. Here, a bias-adjusted estimate of the covariances is employed
under the null:

$$\hat{\gamma}(h) = \frac{1}{N-h} \sum_{t=1}^{N-h} (X_t - \bar{\bar{X}})(X_{t+h} - \bar{\bar{X}}), \tag{9}$$

where $\bar{\bar{X}} = (N-p)^{-1} \sum_{t=p+1}^{N} X_t$. While the bias-adjusted estimator of γ(h) (using N − h
instead of N in the denominator of the above sample average) may not result in a true non-
negative definite autocovariance estimator, this is usually of little consequence, and correcting
for this bias affords one better small-sample performance. Bias issues in Yule-Walker AR(p)
parameter estimation methods are discussed in Tjøstheim and Paulsen (1983).
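
The bias-adjusted Yule-Walker step can be sketched as follows. The helper names `autocov`, `solve`, and `yule_walker` are hypothetical, and the small Gaussian elimination routine merely stands in for whatever linear solver is available in one's statistical software. The centring value follows the text: the mean of $X_{p+1}, \ldots, X_N$. The returned white noise variance uses the relation $\hat{\sigma}_Z^2 = \hat{\gamma}(0) - \sum_{\ell} \hat{\phi}_\ell \hat{\gamma}(\ell)$.

```python
from statistics import fmean

def autocov(x, h, centre):
    """Bias-adjusted autocovariance (9): divide by N - h rather than N."""
    N = len(x)
    return sum((x[t] - centre) * (x[t + h] - centre) for t in range(N - h)) / (N - h)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def yule_walker(x, p):
    """Return (phi_hat, sigma2_hat) for an AR(p) fit with p >= 1 under the null."""
    centre = fmean(x[p:])                       # mean of X_{p+1}, ..., X_N
    g = [autocov(x, h, centre) for h in range(p + 1)]
    Gamma = [[g[abs(i - j)] for j in range(p)] for i in range(p)]
    phi_hat = solve(Gamma, g[1:])
    sigma2_hat = g[0] - sum(phi_hat[k] * g[k + 1] for k in range(p))
    return phi_hat, sigma2_hat
```

For p = 1 the estimator reduces to the ratio of the lag-one to the lag-zero autocovariance estimate.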
The estimated one-step-ahead predictions, denoted by $\{\hat{\tilde{X}}_t\}_{t=p+1}^{N}$, are hence

$$\hat{\tilde{X}}_t = \bar{\bar{X}} + \sum_{k=1}^{p} \hat{\phi}_k (X_{t-k} - \bar{\bar{X}}), \quad t > p. \tag{10}$$

This uses $\bar{\bar{X}}$ to estimate µ under the null hypothesis. To avoid edge-effects, only the
observations at times p + 1, . . . , N are used in (10). For example, seasonal sample means
are $\bar{\tilde{X}}_\nu = (d - i_\nu)^{-1} \sum_{i=i_\nu}^{d-1} \tilde{X}_{iT+\nu}$. The one-step-ahead prediction errors $\{\tilde{R}_t\}_{t=p+1}^{N}$ are defined
pointwise by $\tilde{R}_t = X_t - \hat{\tilde{X}}_t$. The sample averages $\bar{\tilde{R}}_\nu$ and $\bar{\bar{\tilde{R}}}$ are

$$\bar{\tilde{R}}_\nu = \frac{1}{d - i_\nu} \sum_{i=i_\nu}^{d-1} \tilde{R}_{iT+\nu}, \qquad \bar{\bar{\tilde{R}}} = \frac{1}{N-p} \sum_{t=p+1}^{N} \tilde{R}_t.$$

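The proposed test statistic is hence

$$\tilde{F}_R = \frac{d \sum_{\nu=1}^{T} (\bar{\tilde{R}}_\nu - \bar{\bar{\tilde{R}}})^2 / (T-1)}{\sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} (\tilde{R}_{iT+\nu} - \bar{\tilde{R}}_\nu)^2 / (N-p-T)}. \tag{11}$$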

Our primary result is the following and is proven in the Appendix.

Theorem 1 Suppose that the time series $\{X_t\}_{t=1}^{N}$ satisfies (1) and that $\{\epsilon_t\}$ is a causal
autoregression satisfying (2). Then under the null hypothesis of equal seasonal means,
$\tilde{F}_R \xrightarrow{D} \chi^2_{T-1}/(T-1)$ as N → ∞.
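
As an illustration of how (9), (10), and (11) fit together, the sketch below computes the estimated statistic for the simplest case p = 1, where the Yule-Walker estimate is $\hat{\gamma}(1)/\hat{\gamma}(0)$. Restricting to AR(1) errors is a simplification made here to keep the example short, and the function name `estimated_prediction_residual_f` is hypothetical.

```python
from statistics import fmean

def estimated_prediction_residual_f(x, T):
    """Return (F_R tilde, phi_hat) from (11) for x = [X_1, ..., X_N], assuming p = 1."""
    N, p = len(x), 1
    d = N // T
    xbar = fmean(x[p:])                          # mean of X_{p+1}, ..., X_N

    def gamma(h):                                # bias-adjusted autocovariance (9)
        return sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(N - h)) / (N - h)

    phi_hat = gamma(1) / gamma(0)                # Yule-Walker estimate when p = 1
    # estimated residuals from (10): X_t minus its estimated one-step prediction
    resid = [x[t] - xbar - phi_hat * (x[t - 1] - xbar) for t in range(p, N)]
    groups = [[] for _ in range(T)]
    for t, r in zip(range(p, N), resid):
        groups[t % T].append(r)
    season_means = [fmean(g) for g in groups]
    grand_mean = fmean(resid)
    num = d * sum((m - grand_mean) ** 2 for m in season_means) / (T - 1)
    den = sum((r - m) ** 2 for g, m in zip(groups, season_means) for r in g) / (N - p - T)
    return num / den, phi_hat
```

By Theorem 1, the first returned value would be compared with a quantile of $\chi^2_{T-1}/(T-1)$; for T = 4 and a 5% level this threshold is roughly 7.815/3 ≈ 2.605.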

A Gaussian likelihood ratio test statistic can also be constructed from the one-step-
ahead prediction residuals. The Innovations form of the Gaussian likelihood L of the data
$X_{p+1}, \ldots, X_N$ expresses the likelihood in terms of the one-step-ahead predictions

$$L = (2\pi)^{-(N-p)/2} \left( \prod_{t=p+1}^{N} v_t \right)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{t=p+1}^{N} \frac{(X_t - \hat{X}_t)^2}{v_t} \right\},$$

where $v_t \equiv \sigma_Z^2 = E[(X_t - \hat{X}_t)^2]$ is the unconditional mean squared prediction error, which
is constant for t > p. Hence, the likelihood ratio of the null hypothesis model where
$\mu = \mu_1 = \cdots = \mu_T$ (call this $L_0$) to the full model with T seasonal means (call this $L_A$)
is

$$\frac{L_0}{L_A} = \frac{(2\pi)^{-(N-p)/2} \left( \prod_{t=p+1}^{N} \sigma_Z^2 \right)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{t=p+1}^{N} \frac{(X_t - \hat{X}_t^0)^2}{\sigma_Z^2} \right\}}{(2\pi)^{-(N-p)/2} \left( \prod_{t=p+1}^{N} \sigma_Z^2 \right)^{-1/2} \exp\left\{ -\frac{1}{2} \sum_{t=p+1}^{N} \frac{(X_t - \hat{X}_t^A)^2}{\sigma_Z^2} \right\}}. \tag{12}$$

Equation (12) uses $\hat{X}_t^0$ to denote the best linear prediction of $X_t$ under the null hypothesis:

$$\hat{X}_t^0 = \hat{\mu}_0^{ML} + \sum_{k=1}^{p} \hat{\phi}_{0,k}^{ML} (X_{t-k} - \hat{\mu}_0^{ML}), \quad t > p,$$

where $\hat{\mu}_0^{ML}$ and $\hat{\phi}_{0,k}^{ML}$ are the maximum likelihood estimates (MLE). Likewise, $\hat{X}_t^A$ denotes
the one-step-ahead prediction without restriction:

$$\hat{X}_{iT+\nu}^A = \hat{\mu}_\nu^{ML} + \sum_{k=1}^{p} \hat{\phi}_k^{ML} (X_{iT+\nu-k} - \hat{\mu}_{\nu-k}^{ML}), \quad iT + \nu > p,$$

where $\hat{\mu}_\nu^{ML}$ and $\hat{\phi}_k^{ML}$ are the maximum likelihood estimates (MLE), and $\hat{\mu}_\nu^{ML}$ is interpreted
periodically with period T.
Taking logarithms in (12) and multiplying by $-2\sigma_Z^2$ produces a statistic with the
appealing form of a difference of sums of squares:

$$lrat = \sum_{t=p+1}^{N} (X_t - \hat{X}_t^0)^2 - \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} (X_{iT+\nu} - \hat{X}_{iT+\nu}^A)^2.$$

While $\bar{\bar{X}}$ is not the exact likelihood estimate of µ under the null hypothesis (because of
the correlated errors), both it and the maximum likelihood estimator of µ converge almost
surely to µ. Likewise, $\bar{X}_\nu$ is close to the likelihood estimator of $\mu_\nu$ under the alternative
hypothesis. Because of this, we use the predictions

$$\hat{\tilde{X}}_t^0 = \bar{\bar{X}} + \sum_{k=1}^{p} \hat{\phi}_k (X_{t-k} - \bar{\bar{X}}), \quad t > p,$$

$$\hat{\tilde{X}}_{iT+\nu}^A = \bar{X}_\nu + \sum_{k=1}^{p} \hat{\phi}_k (X_{iT+\nu-k} - \bar{X}_{\nu-k}), \quad iT + \nu > p.$$

This produces a likelihood ratio of

$$\widetilde{lrat} = \sum_{t=p+1}^{N} \left( X_t - \hat{\tilde{X}}_t^0 \right)^2 - \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \left( X_{iT+\nu} - \hat{\tilde{X}}_{iT+\nu}^A \right)^2.$$

The Appendix proves the following result.

Theorem 2 Suppose that the time series $\{X_t\}_{t=1}^{N}$ satisfies (1) and that $\{\epsilon_t\}$ is a causal
autoregression satisfying (2). Then $\widetilde{lrat}/(T-1) - \tilde{F}_R \xrightarrow{P} 0$ as N → ∞.

2.2. Frequency Domain Tests

Should $\{\epsilon_t\}$ have known spectral density f(λ) at frequency λ ∈ (−π, π], our suggestion is
to apply the S test in (4) to

$$\frac{I_j}{2\pi f(\omega_j)}, \quad j = 1, \ldots, q.$$

This is because $\{I_j\}_{j=1}^{q}$ is asymptotically independent and exponentially distributed
with $E[I_j] \to 2\pi f(\omega)$ for a sequence of Fourier frequencies $\omega_j \to \omega$ (see Theorem 10.3.2
in Brockwell and Davis, 1991, for a precise statement of this convergence). Hence,
$\{I_j/(2\pi f(\omega_j))\}$ is asymptotically IID with a unit exponential density and Fisher’s (1929)
original derivation of (5) from unit exponential variates applies (Section 10.2 in Brockwell
and Davis, 1991, also gives a proof). As a caveat, the stated independence only holds
asymptotically, even when f is known and the series is Gaussian. Brockwell and Davis
1991, Chapter 10, Theorem 10.3.2, and Shumway and Stoffer 2011, Theorem C.4, provide
technicalities: $\mathrm{Cov}(I_j, I_k) = O(N^{-1})$ when $\{Z_t\}$ is IID with a finite second moment.
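In practice, one needs to replace f with some estimator. For a causal AR(p) $\{\epsilon_t\}$, we
propose the spectral density estimator

$$\hat{f}(\omega) = \frac{\hat{\sigma}_Z^2}{2\pi} \left| 1 - \hat{\phi}_1 e^{-i\omega} - \cdots - \hat{\phi}_p e^{-ip\omega} \right|^{-2},$$

where $\hat{\phi}_1, \ldots, \hat{\phi}_p$ are the bias corrected Yule-Walker estimators above and
$\hat{\sigma}_Z^2 = \hat{\gamma}(0) - \sum_{\ell=1}^{p} \hat{\phi}_\ell \hat{\gamma}(\ell)$.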

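Summarizing, we suggest the frequency domain statistic

$$\tilde{S} = \frac{\max_{1 \le j \le q} \tilde{M}_j}{q^{-1} \sum_{j=1}^{q} \tilde{M}_j}, \tag{13}$$

where

$$\tilde{M}_j = \frac{I_j \left| 1 - \hat{\phi}_1 e^{-i\omega_j} - \cdots - \hat{\phi}_p e^{-ip\omega_j} \right|^2}{\hat{\sigma}_Z^2}.$$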

The same percentiles in (5) apply asymptotically; the value of $\hat{\sigma}_Z^2$ does not change the
statistic. Lin and Liu (2009) prove this result in their Remark 2.4 when the errors are a
causal autoregression and $\{Z_t\}$ is IID with a finite (4 + δ)th moment for some δ > 0. It is
worthwhile noting that the spectral density of any causal autoregression is bounded away
from zero, implying that $\hat{f}(\omega)/f(\omega)$ converges uniformly to unity (almost surely Lebesgue).
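
The modified statistic in (13) can be sketched as below. The function name `modified_fisher_s` is hypothetical, and the estimated coefficients `phi_hat` are assumed to be supplied by the caller (for example, from a Yule-Walker fit). Since $\hat{\sigma}_Z^2$ appears in every $\tilde{M}_j$ and cancels in the ratio, it is omitted from the computation.

```python
import cmath
import math

def modified_fisher_s(x, phi_hat):
    """S tilde in (13) for x = [X_1, ..., X_N] and estimated AR coefficients phi_hat."""
    N = len(x)
    q = (N - 1) // 2
    M = []
    for j in range(1, q + 1):
        w = 2 * math.pi * j / N
        # periodogram ordinate I_j = |D_j|^2 at frequency w_j
        D = sum(x[t - 1] * cmath.exp(-1j * t * w) for t in range(1, N + 1)) / math.sqrt(N)
        # AR polynomial 1 - phi_1 e^{-iw} - ... - phi_p e^{-ipw} evaluated at w_j
        ar_poly = 1 - sum(
            phi_hat[k] * cmath.exp(-1j * (k + 1) * w) for k in range(len(phi_hat))
        )
        M.append(abs(D) ** 2 * abs(ar_poly) ** 2)
    return max(M) / (sum(M) / q)
```

With an empty `phi_hat` the function returns Fisher's original S from (4); in either case the value would be referred to the percentiles in (5).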

3. SIMULATIONS

This section studies the performance of the test statistics in the last section via simulation.
Throughout, we take α = 0.05 and study empirical significance levels and empirical powers
under various Gaussian autoregressive structures.
Each empirical significance level or empirical power reported below is aggregated from 3000
independent time series paths generated via (1). In all cases, the period is T = 4. In empirical
significance level simulations, the four null hypothesis means are µ1 = µ2 = µ3 = µ4 = 1.
Various AR orders and autoregressive parameters are considered to reflect a variety of
covariance structures. The white noise variance is always taken as σZ2 = 1. The d-values
10, 20, 50, and 200 are studied to see how the asymptotics “kick in”.
Table 1 reports empirical significance levels for AR(1) models with a variety of
autoregressive coefficients $\phi_1$. When $\phi_1 \in (0, 1)$, the errors are positively correlated; when
$\phi_1 \in (-1, 0)$, the errors are negatively correlated at odd lags. The column $\hat{\alpha}_{raw}$ reports the
empirical significance levels of the F statistic in (3) that ignores all correlations in the series.
The column $\hat{\alpha}_{pe}$ refers to the empirical significance levels of our F statistic in (11) that uses
the estimated one-step-ahead prediction errors; $\hat{\alpha}_{lrt}$ is the empirical significance level of the
likelihood ratio test. With $\widetilde{lrat}$, accept/reject decisions are made according to the classical
$\chi^2_{T-1}$ asymptotic distribution. The columns $\hat{\alpha}_S$ and $\hat{\alpha}_{\tilde{S}}$ refer to the empirical significance
levels of our spectral tests, $\hat{\alpha}_S$ ignoring autocorrelation and $\hat{\alpha}_{\tilde{S}}$ accounting for it via (13).
AR model order selection was not considered.
The empirical significance levels in Table 1 exhibit significant structure. First, as expected,
all methods improve as d increases. Second, the empirical significance levels for $\hat{\alpha}_{raw}$ and $\hat{\alpha}_S$,
statistics that ignore autocorrelation, can be far from 0.05, even for large d. The inference is
that one should not ignore autocorrelation in seasonal mean equality tests when it is present.
Third, the empirical significance levels for $\hat{\alpha}_{\tilde{S}}$ are further from 0.05 than those for $\hat{\alpha}_{pe}$ or $\hat{\alpha}_{lrt}$.
The winners are clearly the prediction residual based F test and the likelihood ratio test.
Scrutinizing these two tests further, the empirical significance levels of the likelihood ratio
test for small d appear a little too high. Sutradhar and Bartlett (1993) also noted this aspect
in linear regression models with correlated errors for small sample sizes. The prediction
residual based F test performs well for all d and for both positively and negatively correlated
model errors. As Theorem 2 implies, the prediction error and likelihood ratio tests
are virtually indistinguishable for large d.

[Table 1 about here.]

Table 2 reports empirical rejection powers for the AR(1) series when the unequal seasonal
means are µ1 = 0, µ2 = −0.5, µ3 = 0, and µ4 = 0.5. Powers are denoted by β, with subscripts
as in Table 1 used to distinguish the methods. In the power calculations, we replace the overall
mean $\bar{\bar{X}}$ by the seasonal means $\bar{X}_\nu$ in (9) for the Yule-Walker estimates of $\phi = (\phi_1, \ldots, \phi_p)'$,

$$\hat{\gamma}(h) = \frac{1}{N-h} \sum_{t=1}^{N-h} (X_t - \bar{X}_{\nu(t)})(X_{t+h} - \bar{X}_{\nu(t+h)}),$$

where $\bar{X}_{\nu(t)}$ is the seasonal sample mean of the data for the season in use at time t.
As the F and S tests unaltered for correlation had unacceptable empirical significance
levels, we do not comment on the power of these tests. Of the other three tests, the powers
for the spectral test are the worst. This is in spite of the fact that the alternative means
form a perfect sine wave: $\mu_t = \sin(2\pi(1-t)/4)/2$. The powers for the prediction error F
test are slightly worse than those for the likelihood ratio test, but this discrepancy is slight
and expected since the likelihood ratio test rejects the null hypothesis slightly more than it
should. As d gets larger, all tests perform better. When d = 200, the prediction error and
likelihood ratio tests correctly reject the null hypothesis in all 3000 runs for all $\phi_1$ considered.

[Table 2 about here.]

Table 3 reports empirical significance levels of aggregated type I errors for AR(2) models.
Here, a variety of ϕ1 and ϕ2 coefficients were selected to vary the autocorrelation structure.
The geometric decay rate of the autocovariance function to zero (with increasing lag) is
shown — this is the reciprocal of the absolute value of the smallest root of the autoregressive
polynomial (see Brockwell and Davis, 1991). Tests that ignore autocorrelation again do not
perform well. The other three tests show a pattern similar to the empirical significance
levels in Table 1: the spectral test is the worst, the prediction error and likelihood ratio
tests are roughly comparable, and the likelihood ratio test is slightly overaggressive in
rejecting the null hypothesis for small d. The likelihood ratio test’s empirical significance
levels move further from 0.05 as the decay rate of the autocovariances moves closer to
unity. Table 4 shows the empirical powers when the alternative hypothesis means are
µ1 = 0.8, µ2 = 1.2, µ3 = −0.1, and µ4 = 0.5 (µt is not a single sine wave here). These powers
behave similarly to those reported in Table 2.
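
The geometric decay rate quoted for the AR(2) models above can be computed directly from $\phi_1$ and $\phi_2$. The sketch below uses the fact that the roots w of $w^2 - \phi_1 w - \phi_2 = 0$ are the reciprocals of the roots of the autoregressive polynomial $1 - \phi_1 z - \phi_2 z^2$, so the largest |w| equals the reciprocal of the smallest root modulus. The function name `ar2_decay_rate` is a hypothetical label for illustration.

```python
import cmath

def ar2_decay_rate(phi1, phi2):
    """Reciprocal of the smallest root modulus of 1 - phi1*z - phi2*z^2."""
    # roots of the reciprocal polynomial w^2 - phi1*w - phi2 (possibly complex)
    disc = cmath.sqrt(phi1 ** 2 + 4 * phi2)
    w_plus = (phi1 + disc) / 2
    w_minus = (phi1 - disc) / 2
    return max(abs(w_plus), abs(w_minus))
```

A causal AR(2) model yields a value strictly below one, and values closer to one correspond to more slowly decaying autocovariances.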

[Table 3 about here.]

[Table 4 about here.]

Finally, Table 5 shows the empirical significance levels for AR(4) models with various
autoregressive parameter settings. The interpretation of these results is akin to those in
Tables 1 and 3. We omit a power simulation for this case.

[Table 5 about here.]

Our conclusions are that one should not ignore autocorrelation in ANOVA tests when it
is present, that spectral tests seem inferior to time domain tests, and that the prediction
residual test appears very effective, even for modest sample sizes.

4. AN APPLICATION

This section applies our results to a monthly averaged temperature series (T = 12). The
top panel in Figure 1 shows monthly average high temperatures observed at Aberdeen,
Mississippi over the d = 30-year period January 1930 — December 1959. The second panel
in this graph depicts the monthly average high temperatures at nearby Tuscaloosa, Alabama
over the same time period. The third panel of this graphic shows monthly boxplots of the
differences of the first two series, and the last panel displays the differences of the series
themselves. Neither series has any missing data.

[Figure 1 about here.]

While the top two series in Figure 1 are clearly seasonal (with summer temperatures
warmer than winter temperatures), the question of interest here is whether the differenced
series in the bottom panel still has a seasonal mean. The bottom two panels in Figure 1 do
not give compelling evidence either way. Hence, a closer examination of the statistical
properties of this series is warranted.
Before doing this, we motivate our interest in this question. In climatology, a “reference
series” is often used to identify changepoint times in a “target series”. Changepoints are times
at which station instrumentation, location, observer, or time of observation (among other
things) change. Changepoints often induce series mean shifts and are important features
that should be taken into consideration when making climate change assessments (Lund
et al. 2001). In United States stations, changepoints are estimated to occur once every 16
years on average (Mitchell, 1953). In changepoint pursuits, climatologists often subtract a
reference station record from a target station record. The hope is that the target minus
reference difference will remove any long term trends and seasonal cycles. The reasoning
is that stations in close proximity should experience similar weather and hence long-term
trends and seasonal cycles. The target minus reference subtraction should graphically help
illuminate any changepoints. Menne and Williams Jr. (2005, 2009) discuss target minus
reference temperature comparisons and how they are used to detect changepoints.
The question explored here is whether the target minus reference comparison has truly
removed the seasonal mean. Let {Xt } denote the Aberdeen series and {Yt } denote the
Tuscaloosa series. We view Aberdeen as the target series and Tuscaloosa as the reference
series and define {∆t } point-wise by ∆t = Xt − Yt . If the seasonal means have been removed,
then {∆t } should have a constant mean (which need not be zero). That the seasonal mean has
been removed in such comparisons is a fundamental assumption made by many changepoint
analysts. If {∆t } in fact has a periodic mean, a changepoint detection scheme might flag many
erroneous changepoints in an attempt to follow the seasonal cycle, a situation to avoid. This
could be especially problematic should daily temperatures be considered.
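As a concrete illustration of the target minus reference comparison, the sketch below forms ∆t = Xt − Yt and averages it season by season. The data are hypothetical toy values with period T = 4 (not the Aberdeen or Tuscaloosa records), and `seasonal_means` is an illustrative helper name.

```python
def seasonal_means(series, period):
    """Return the sample mean of each season nu = 1, ..., period."""
    if len(series) % period != 0:
        raise ValueError("series length must be a whole number of cycles")
    d = len(series) // period  # number of complete cycles
    return [sum(series[i * period + nu] for i in range(d)) / d
            for nu in range(period)]


# Hypothetical toy records: two cycles of a period-4 series.
target = [10.0, 14.0, 20.0, 13.0, 11.0, 15.0, 21.0, 12.0]
reference = [9.0, 13.0, 21.0, 13.0, 10.0, 14.0, 22.0, 12.0]

# Point-wise target minus reference difference.
delta = [x - y for x, y in zip(target, reference)]
print(seasonal_means(delta, period=4))
```

In this toy case the seasonal means of the differences are visibly unequal, so the subtraction has not removed the seasonal mean. With real data the sample means always differ somewhat, which is why a formal test that accounts for autocorrelation is needed.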

[Table 6 about here.]

When an AR(p) model was fitted to the Aberdeen minus Tuscaloosa series, the order
p = 1 was selected as optimal. The estimated parameters are $\hat{\phi}_1 = 0.5736$ (with a
standard error of 0.04317) and $\hat{\sigma}_Z^2 = 1.021$. Clearly, significant non-zero
autocorrelation exists in {∆t }. The statistics for the test of equal seasonal means are
$\tilde{F}_R = 2.501$ (p-value 0.005), $\tilde{S} = 8.828$ (p-value 0.022), and
$\widetilde{\mathrm{lrat}} = 27.977$ (p-value 0.003). All three tests reject equality of seasonal
means at the 5% significance level. Table 6 shows the estimated seasonal means of {∆t }. It appears
that Aberdeen is colder during winter months and warmer during summer months. This
is physically explainable: Aberdeen’s sandier soils might induce more radiational cooling.
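The sketch below shows how a prediction residual F statistic of this kind could be computed when p = 1. It follows the between-season and within-season mean squares written out in Lemmas 1 and 2 of the Appendix, uses the lag-one Yule-Walker estimate, and runs on simulated data because the temperature records are not reproduced here. The function names, the simulation settings, and the choice to average all N − 1 residuals for the overall mean are assumptions made for illustration.

```python
import math
import random


def ar1_prediction_residual_f(x, period):
    """Return (phi_hat, F) from AR(1) one-step-ahead prediction residuals."""
    n = len(x)
    if n % period != 0:
        raise ValueError("series length must be a whole number of cycles")
    d = n // period
    xbar = sum(x) / n
    # Yule-Walker estimate: lag-1 over lag-0 sample autocovariance.
    gamma0 = sum((v - xbar) ** 2 for v in x) / n
    gamma1 = sum((x[t] - xbar) * (x[t - 1] - xbar) for t in range(1, n)) / n
    phi_hat = gamma1 / gamma0
    # Residuals exist for all but the first observation; group them by season.
    groups = [[] for _ in range(period)]
    for t in range(1, n):
        resid = (x[t] - xbar) - phi_hat * (x[t - 1] - xbar)
        groups[t % period].append(resid)
    season_means = [sum(g) / len(g) for g in groups]
    n_resid = n - 1
    grand_mean = sum(sum(g) for g in groups) / n_resid
    # Between-season mean square (U) and within-season mean square (V).
    u = d * sum((m - grand_mean) ** 2 for m in season_means) / (period - 1)
    v = sum((r - m) ** 2 for g, m in zip(groups, season_means) for r in g)
    v /= n_resid - period  # N - p - T with p = 1
    return phi_hat, u / v


def simulate(seasonal_amplitude, d=30, period=12, phi=0.5, seed=1):
    """Sinusoidal seasonal mean plus AR(1) noise (illustrative settings)."""
    rng = random.Random(seed)
    eps, series = 0.0, []
    for t in range(d * period):
        eps = phi * eps + rng.gauss(0.0, 1.0)
        mean = seasonal_amplitude * math.sin(2.0 * math.pi * t / period)
        series.append(mean + eps)
    return series


for amp in (0.0, 2.0):
    phi_hat, f_stat = ar1_prediction_residual_f(simulate(amp), period=12)
    print(f"amplitude={amp}: phi_hat={phi_hat:.3f}, F={f_stat:.2f}")
```

The resulting statistic would then be compared with an upper percentile of an F distribution whose degrees of freedom match the divisors above, T − 1 and N − p − T; the percentile lookup is omitted here to keep the sketch free of external libraries.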
Before leaving this application, the normality and constant variance assumptions made in
the above analyses were scrutinized. Monthly temperature series tend to be Gaussian — they
average daily temperatures over all days in a month, which induces approximate normality.
Formal tests for normality are passed. Bartlett’s equal variance test over the months has a
p-value of 0.790. Hence, a seasonally constant variance appears reasonable.
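For readers who wish to repeat the equal variance check, the following sketch computes Bartlett's statistic for k groups using its standard textbook form. The grouping of values by month and the toy numbers are assumptions for illustration; the source does not say which series (differences or residuals) the test was applied to.

```python
import math


def bartlett_statistic(groups):
    """Bartlett's statistic for equality of variances across groups.

    Under normality and equal variances it is approximately chi-square
    distributed with k - 1 degrees of freedom. Every group needs at least
    two observations and a strictly positive sample variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    if k < 2 or min(sizes) < 2:
        raise ValueError("need at least two groups with two or more values")
    n_total = sum(sizes)
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((v - m) ** 2 for v in g) / (len(g) - 1))
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    correction = 1.0 + (
        sum(1.0 / (n - 1) for n in sizes) - 1.0 / (n_total - k)
    ) / (3.0 * (k - 1))
    return numerator / correction


# Toy groups (hypothetical): similar spread versus very different spread.
print(bartlett_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))
print(bartlett_statistic([[0.0, 1.0, 2.0], [0.0, 10.0, 20.0]]))
```

With monthly data one would pass T = 12 groups, one per calendar month, and compare the statistic with a chi-square percentile on 11 degrees of freedom.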

5. SUMMARY AND COMMENTS

In summary, this paper presents some new time and frequency domain statistics for testing
for a seasonal mean with stationary time series errors. The time domain ANOVA and
likelihood ratio statistics appear more powerful than their frequency domain counterparts.
The new ANOVA-type statistic constructed from prediction residuals allows the researcher
to retain the classical F-distribution ANOVA setup and its percentiles without altering any
of the quadratic forms in it.
Any robustness issues for independent ANOVA analyses also apply here (see Glass et
al. 1972; Schmider et al. 2010). Briefly, normality is not overly critical, although it becomes
more important with heavy-tailed errors. As shown here, correlation aspects are critical:
do not expect ANOVA methods to work well for correlated data when the correlation is ignored.
The equal variance assumed with the stationary {ϵt } merits further exploration. ANOVA
methods are not especially robust to heteroscedastic errors. Eschewing ANOVA F methods,
it would be possible to construct an asymptotic χ2 test for the equal mean hypothesis
for a general error process whenever the central limit theorem holds. Elaborating, let
$X_i = (X_{iT+1}, \ldots, X_{iT+T})'$ contain the ith cycle of data. With
$\mu = (\mu_1, \ldots, \mu_T)'$, the equal means null hypothesis is simply
$H_0 : A\mu = 0_{(T-1) \times 1}$, where A is the $(T-1) \times T$ dimensional matrix

\[
A = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1 & -1
\end{pmatrix}.
\]

Letting $\hat{\mu}_\nu = d^{-1} \sum_{i=0}^{d-1} X_{iT+\nu}$ and
$\hat{\mu} = (\hat{\mu}_1, \ldots, \hat{\mu}_T)'$, the statistic

\[
d \, \hat{\mu}' A' \bigl[ A \hat{\Lambda} A' \bigr]^{-1} A \hat{\mu},
\]

with $\hat{\Lambda}$ denoting an estimate of Λ, would have an asymptotic χ2 distribution with
T − 1 degrees of freedom should one be able to establish the central limit theorem
$\hat{\mu} \sim AN(\mu, \Lambda/d)$ as d → ∞. Such asymptotic normality
holds for T -variate stationary {Xn } under minor regularity conditions (see Brockwell and
Davis (1991), Proposition 7.1.2 for the univariate case). Such methods would handle the case
of a periodic autoregression and should also apply to some locally stationary error settings
(Dahlhaus, 1997); however, there is work to do in identifying and estimating Λ.
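To make the quadratic form concrete, the sketch below evaluates the statistic for a user-supplied estimate of Λ. It does not address how Λ should be estimated, which is left open above; the identity matrix in the example is a placeholder chosen only so that the result can be checked by hand. The helper names `solve` and `equal_means_wald` are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - tail) / aug[r][r]
    return y


def equal_means_wald(mu_hat, lam_hat, d):
    """Evaluate d * mu' A' [A Lam A']^{-1} A mu for the contrast matrix A."""
    period = len(mu_hat)
    # Row j of A has +1 in column j and -1 in column j + 1.
    a_mu = [mu_hat[j] - mu_hat[j + 1] for j in range(period - 1)]
    a_lam_at = [
        [lam_hat[j][k] - lam_hat[j][k + 1] - lam_hat[j + 1][k] + lam_hat[j + 1][k + 1]
         for k in range(period - 1)]
        for j in range(period - 1)
    ]
    y = solve(a_lam_at, a_mu)  # y = [A Lam A']^{-1} A mu
    return d * sum(u * v for u, v in zip(a_mu, y))


# Placeholder inputs: T = 3 seasons, d = 10 cycles, identity covariance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(equal_means_wald([1.0, 0.0, 0.0], identity, d=10))
```

When the supplied matrix is the identity, the statistic reduces to d times the sum of squared deviations of the seasonal means from their average, which gives a simple check on the algebra.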
In spectral settings, it may be desirable to limit the frequencies considered in the S statistic,
say akin to that in equation (10.2.13) in Brockwell and Davis (1991) after scaling for the
autocorrelation. We will not explore this issue here.

APPENDIX

We first show Theorem 1 via several lemmas. All work assumes a common theoretical mean
of µ = µ1 = · · · = µT ; that is, the null hypothesis holds. To begin, we establish relationships
between $\bar{\tilde{R}}_\nu$ and $\bar{R}_\nu$ and between $\bar{\bar{\tilde{R}}}$ and
$\bar{\bar{R}}$. Define

\[
\hat{\eta}_d = \Bigl( 1 - \sum_{k=1}^{p} \hat{\phi}_k \Bigr) (\mu - \bar{\bar{X}}).
\]

As $\hat{\phi}_k$ is the Yule-Walker estimator, it is $\sqrt{N}$-consistent (and hence also
$\sqrt{d}$-consistent): $\phi_k - \hat{\phi}_k = O_P(d^{-1/2})$ for each lag $k \in \{1, \ldots, p\}$.
It follows that $1 - \sum_{k=1}^{p} \hat{\phi}_k = O_P(1)$. By the central limit theorem for
stationary time series, $\bar{\bar{X}} - \mu = O_P(d^{-1/2})$. Combining these results shows
that $\hat{\eta}_d = O_P(d^{-1/2})$.


We can now express the prediction residuals $\{\tilde{R}_t\}_{t=p+1}^{N}$ as

\begin{align*}
\tilde{R}_{iT+\nu} &= X_{iT+\nu} - \Bigl[ \bar{\bar{X}} + \sum_{k=1}^{p} \hat{\phi}_k (X_{iT+\nu-k} - \bar{\bar{X}}) \Bigr] \\
&= R_{iT+\nu} + \hat{\eta}_d + \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \mu). \tag{14}
\end{align*}

Hence, the seasonal means $\{\bar{\tilde{R}}_\nu\}_{\nu=1}^{T}$ are

\begin{align*}
\bar{\tilde{R}}_\nu &= \frac{1}{d - i_\nu} \sum_{i=i_\nu}^{d-1} R_{iT+\nu} + \hat{\eta}_d
+ \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k) \frac{1}{d - i_\nu} \sum_{i=i_\nu}^{d-1} (X_{iT+\nu-k} - \mu) \\
&= \bar{R}_\nu + \hat{\eta}_d + \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(\bar{X}_{\nu-k} - \mu). \tag{15}
\end{align*}

Since $\bar{X}_{\nu-k} - \mu = O_P(d^{-1/2})$ by the central limit theorem for stationary time series,

\[
\bar{\tilde{R}}_\nu = \bar{R}_\nu + \hat{\eta}_d + O_P(d^{-1}). \tag{16}
\]

A similar argument gives

\[
\bar{\bar{\tilde{R}}} = \bar{\bar{R}} + \hat{\eta}_d + O_P(d^{-1}). \tag{17}
\]

Lemma 1 Define

\[
\tilde{U}_d = \frac{d \sum_{\nu=1}^{T} (\bar{\tilde{R}}_\nu - \bar{\bar{\tilde{R}}})^2}{T-1}, \qquad
U_d = \frac{d \sum_{\nu=1}^{T} (\bar{R}_\nu - \bar{\bar{R}})^2}{T-1}.
\]

Then under the assumptions of Theorem 1, $\tilde{U}_d - U_d = o_P(1)$.

Proof. From (16), (17), and the fact that $\bar{R}_\nu - \bar{\bar{R}} = O_P(d^{-1/2})$,

\[
\sum_{\nu=1}^{T} (\bar{\tilde{R}}_\nu - \bar{\bar{\tilde{R}}})^2
= \sum_{\nu=1}^{T} \bigl[ (\bar{R}_\nu - \bar{\bar{R}}) + O_P(d^{-1}) \bigr]^2
= \sum_{\nu=1}^{T} (\bar{R}_\nu - \bar{\bar{R}})^2 + O_P(d^{-3/2}).
\]

Here the common shift $\hat{\eta}_d$ cancels in each difference, and the cross terms are of
order $O_P(d^{-1/2}) \cdot O_P(d^{-1}) = O_P(d^{-3/2})$. Therefore,
$d \sum_{\nu=1}^{T} (\bar{\tilde{R}}_\nu - \bar{\bar{\tilde{R}}})^2 - d \sum_{\nu=1}^{T} (\bar{R}_\nu - \bar{\bar{R}})^2 = O_P(d^{-1/2}) = o_P(1)$
and the proof is complete.

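Lemma 2 Define

\[
\tilde{V}_d = \frac{\sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( \tilde{R}_{iT+\nu} - \bar{\tilde{R}}_\nu \bigr)^2}{N - p - T}, \qquad
V_d = \frac{\sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( R_{iT+\nu} - \bar{R}_\nu \bigr)^2}{N - p - T}.
\]

Then under the assumptions of Theorem 1, $V_d - \tilde{V}_d = o_P(1)$.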

Proof. First observe that

\[
\tilde{R}_{iT+\nu} = X_{iT+\nu} - \Bigl[ \bar{\bar{X}} + \sum_{k=1}^{p} \hat{\phi}_k (X_{iT+\nu-k} - \bar{\bar{X}}) \Bigr]
= -\sum_{k=0}^{p} \hat{\phi}_k (X_{iT+\nu-k} - \bar{\bar{X}}),
\]

where $\hat{\phi}_0 = -1$. Let $\bar{X}^*_{\nu-k} = (d - i_\nu)^{-1} \sum_{j=i_\nu}^{d-1} X_{jT+\nu-k}$,
with $i_\nu$ being the smallest integer i such that iT + ν > p. Note that $\bar{X}^*_{\nu-k}$ is
slightly different from the seasonal sample mean $\bar{X}_{\nu-k} = d^{-1} \sum_{j=0}^{d-1} X_{jT+\nu-k}$
due to “edge effects”. Plugging this into the definition of $\bar{\tilde{R}}_\nu$ gives
$\bar{\tilde{R}}_\nu = -\sum_{k=0}^{p} \hat{\phi}_k (\bar{X}^*_{\nu-k} - \bar{\bar{X}})$. Hence,

\[
R_{iT+\nu} - \bar{R}_\nu = -\sum_{k=0}^{p} \phi_k (X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \tag{18}
\]

and similar reasoning provides

\[
\tilde{R}_{iT+\nu} - \bar{\tilde{R}}_\nu = -\sum_{k=0}^{p} \hat{\phi}_k (X_{iT+\nu-k} - \bar{X}^*_{\nu-k}), \tag{19}
\]

where $\phi_0 = -1$. Combining (18) and (19) gives

\[
\tilde{R}_{iT+\nu} - \bar{\tilde{R}}_\nu = R_{iT+\nu} - \bar{R}_\nu + \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \bar{X}^*_{\nu-k}). \tag{20}
\]

Now plug (20) into the definitions of $V_d$ and $\tilde{V}_d$ to get

\begin{align*}
(N - p - T)(\tilde{V}_d - V_d)
&= \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( \tilde{R}_{iT+\nu} - \bar{\tilde{R}}_\nu \bigr)^2
 - \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( R_{iT+\nu} - \bar{R}_\nu \bigr)^2 \\
&= \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \Bigl\{ \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \Bigr\}^2 \\
&\quad + 2 \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl\{ R_{iT+\nu} - \bar{R}_\nu \bigr\}
 \Bigl\{ \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \Bigr\} \\
&= \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \Bigl\{ \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \Bigr\}^2 \\
&\quad - 2 \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \Bigl\{ \sum_{k=0}^{p} \phi_k (X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \Bigr\}
 \Bigl\{ \sum_{k=1}^{p} (\phi_k - \hat{\phi}_k)(X_{iT+\nu-k} - \bar{X}^*_{\nu-k}) \Bigr\}. \tag{21}
\end{align*}

Since $\phi_k - \hat{\phi}_k = O_P(d^{-1/2})$ for each $k \in \{1, 2, \ldots, p\}$ and
$\sum_{i=i_\nu}^{d-1} (X_{iT+\nu-k} - \bar{X}^*_{\nu-k})^2 = O_P(d)$
for each $\nu \in \{1, \ldots, T\}$ and each k, the first double sum in (21) is $O_P(1)$ by the
Cauchy-Schwarz inequality. Similar reasoning shows that the lowermost double sum in (21) is
also $O_P(1)$. Hence, upon division by N − p − T , $V_d - \tilde{V}_d = o_P(1)$, which completes our proof.
Proof of Theorem 1. Algebraic manipulations provide

\[
|F_R - \tilde{F}_R| \le \left| \frac{\tilde{U}_d}{\tilde{V}_d} - 1 \right| + \left| 1 - \frac{U_d}{V_d} \right|.
\]

By Lemmas 1 and 2, $V_d - \tilde{V}_d = o_P(1)$ and $U_d - \tilde{U}_d = o_P(1)$. Arguing as in Lund et
al. (2015), we can show that $U_d / V_d - 1 = o_P(1)$ and $\tilde{U}_d / \tilde{V}_d - 1 = o_P(1)$.
Thus, we obtain $\tilde{F}_R - F_R = o_P(1)$ as claimed.

Proof of Theorem 2. The proof of Theorem 1 shows that $F_R - \tilde{F}_R = o_P(1)$. Similar
arguments will show that $\widetilde{\mathrm{lrat}} - \mathrm{lrat} = o_P(1)$ and
$\widetilde{\mathrm{lrat}} - \widetilde{\mathrm{lrat}}^* = o_P(1)$, where
$\widetilde{\mathrm{lrat}}^*$ is defined as follows:

\[
\widetilde{\mathrm{lrat}}^* = \sum_{t=p+1}^{N} \bigl( X_t - \tilde{X}^0_t \bigr)^2
- \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( X_{iT+\nu} - \tilde{X}^A_{iT+\nu} \bigr)^2
\]

with

\begin{align*}
\tilde{X}^0_t &= \bar{\bar{X}} + \sum_{k=1}^{p} \phi_k \bigl( X_{t-k} - \bar{\bar{X}} \bigr), & t > p, \\
\tilde{X}^A_{iT+\nu} &= \bar{X}_\nu + \sum_{k=1}^{p} \phi_k (X_{iT+\nu-k} - \bar{X}_{\nu-k}), & iT + \nu > p.
\end{align*}

Note that $\widetilde{\mathrm{lrat}}^*$ is defined in the same manner as
$\widetilde{\mathrm{lrat}}$ except that $\hat{\phi}_k$ is replaced by $\phi_k$. Hence, it
is sufficient to prove that

\[
\frac{\widetilde{\mathrm{lrat}}^*}{T-1} - F_R = o_P(1)
\]

as d → ∞. This reduces work to cases where the autoregressive coefficients ϕ1 , . . . , ϕp are
known.
Since the denominator of $F_R$ converges to unity almost surely and $R_t = Z_t$ when the AR
coefficients and the mean are known, we need only show that

\[
\widetilde{\mathrm{lrat}}^* - d \sum_{\nu=1}^{T} \bigl( \bar{Z}_\nu - \bar{\bar{Z}} \bigr)^2 = o_P(1).
\]

To do this, expand with (1) and (2) to get

\begin{align*}
\widetilde{\mathrm{lrat}}^* &= \sum_{t=p+1}^{N} \Bigl[ X_t - \Bigl\{ \bar{\bar{X}} + \sum_{k=1}^{p} \phi_k \bigl( X_{t-k} - \bar{\bar{X}} \bigr) \Bigr\} \Bigr]^2 \\
&\quad - \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \Bigl[ X_{iT+\nu} - \Bigl\{ \bar{X}_\nu + \sum_{k=1}^{p} \phi_k \bigl( X_{iT+\nu-k} - \bar{X}_{\nu-k} \bigr) \Bigr\} \Bigr]^2
\end{align*}

and simplify with (2) to get

\[
\widetilde{\mathrm{lrat}}^* = \sum_{t=p+1}^{N} (Z_t - \eta_{1,d})^2 - \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} (Z_{iT+\nu} - \eta_{2,d})^2, \tag{22}
\]

where $\eta_{1,d} = \bigl( 1 - \sum_{k=1}^{p} \phi_k \bigr) \bigl( \bar{\bar{X}} - \mu \bigr)$ and
$\eta_{2,d} = \bar{X}_\nu - \mu_\nu - \sum_{k=1}^{p} \phi_k \bigl( \bar{X}_{\nu-k} - \mu_{\nu-k} \bigr)$.
Taking averages in (1) and (2) provides

\[
\bar{\bar{X}} = \mu + \sum_{k=1}^{p} \phi_k \bigl( \bar{\bar{X}} - \mu \bigr) + \bar{\bar{Z}} + O_P(d^{-1}),
\]

the $O_P(d^{-1})$ term arising due to edge effects (averages of Z follow the notational conventions
in (7)). This gives

\[
\eta_{1,d} = \bar{\bar{Z}} + O_P(d^{-1}).
\]

Similar reasoning provides

\[
\eta_{2,d} = \bar{Z}_\nu + O_P(d^{-1}).
\]

Plugging these two results into (22) produces

\[
\widetilde{\mathrm{lrat}}^* = \sum_{t=p+1}^{N} \bigl( Z_t - \bar{\bar{Z}} \bigr)^2
- \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( Z_{iT+\nu} - \bar{Z}_\nu \bigr)^2 + o_P(1).
\]

The proof is completed by applying the sum of squares identity

\begin{align*}
\sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} (Z_{iT+\nu} - \bar{Z}_\nu)^2
&= \sum_{\nu=1}^{T} \sum_{i=i_\nu}^{d-1} \bigl( Z_{iT+\nu} - \bar{\bar{Z}} + \bar{\bar{Z}} - \bar{Z}_\nu \bigr)^2 \\
&= \sum_{t=p+1}^{N} \bigl( Z_t - \bar{\bar{Z}} \bigr)^2 - \sum_{\nu=1}^{T} (d - i_\nu) \bigl( \bar{Z}_\nu - \bar{\bar{Z}} \bigr)^2 \\
&= \sum_{t=p+1}^{N} \bigl( Z_t - \bar{\bar{Z}} \bigr)^2 - d \sum_{\nu=1}^{T} \bigl( \bar{Z}_\nu - \bar{\bar{Z}} \bigr)^2 + o_P(1),
\end{align*}

where the last line follows from $\bar{Z}_\nu = o_P(1)$ and $\bar{\bar{Z}} = o_P(1)$ by the law of large numbers.
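The sum of squares identity can be checked numerically. The sketch below does so for simulated Gaussian noise in the balanced case (every season observed d times, which corresponds to $i_\nu = 0$ for all seasons); the function name and simulation settings are illustrative assumptions.

```python
import random


def sum_of_squares_parts(z, period):
    """Return (total, within, between) sums of squares for a balanced layout."""
    n = len(z)
    if n % period != 0:
        raise ValueError("series length must be a whole number of cycles")
    d = n // period
    grand = sum(z) / n
    season_means = [sum(z[i * period + nu] for i in range(d)) / d
                    for nu in range(period)]
    total = sum((v - grand) ** 2 for v in z)
    within = sum((z[i * period + nu] - season_means[nu]) ** 2
                 for nu in range(period) for i in range(d))
    between = d * sum((m - grand) ** 2 for m in season_means)
    return total, within, between


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(12 * 30)]  # T = 12, d = 30
total, within, between = sum_of_squares_parts(z, period=12)
# The within-season sum of squares equals total minus between, up to rounding.
print(abs(within - (total - between)) < 1e-8)
```

In the unbalanced setting of the proof, the between-season term carries the weights $d - i_\nu$ rather than d, and the difference between the two weightings is what the final $o_P(1)$ term absorbs.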

ACKNOWLEDGEMENTS

Robert Lund acknowledges partial support from NSF Grant DMS 1407480. The climate
application was posed at SAMSI’s 2014 climate homogeneity summit in Boulder, Colorado.

REFERENCES
Artis, M., Hoffman, M., Toro, J., and Nachane, D. (2004) The detection of hidden periodicities: A comparison
of alternative methods, Technical Report No EC02004/10, European University Institute, San Domenico,
Italy.
Barnett, A. G. and Dobson, A. J. (2010) Analysing Seasonal Health Data, Springer-Verlag, Berlin.
Bloomfield, P. (1976) Fourier Analysis of Time Series: An Introduction, John Wiley, New York City.
Brockwell, P. and Davis, R. A. (1991) Time Series: Theory and Methods, Second Edition, Springer, New York
City.
Chiu, S. (1989) Detecting periodic components in a white Gaussian time series, Journal of the Royal
Statistical Society, Series B, 52, 249-259.
Dahlhaus, R. (1997) Fitting time series models to nonstationary processes, Annals of Statistics, 25, 1-37.
Fisher, R. A. (1929) Test of significance in harmonic analysis, Proceedings of the Royal Society of London,
Series A, 125, 54-59.
Fuller, W. A. (1996) Introduction to Statistical Time Series, Second Edition, Wiley, New York City.
Glass, G. V., Peckham, P. D. and Sanders, J. R. (1972) Consequences of failure to meet assumptions
underlying the fixed effects analyses of variance and covariance, Review of Educational Research, 42,
237-288.
Grenander, U. and Rosenblatt, M. (1957) Statistical Analysis of Stationary Time Series, John Wiley, New
York City.
Hipel, K. W. and McLeod, A. I. (1994) Time Series Modelling of Water Resources and Environmental
Systems, Elsevier.
Hurd, H. (1991) Correlation theory of almost periodically correlated processes, Journal of Multivariate
Analysis, 37, 24-45.

Lin, Z. and Liu, W. (2009) On maxima of periodograms of stationary processes, Annals of Statistics, 37,
2676-2695.
Lii, K. S. and Rosenblatt, M. (2006) Estimation for almost periodic processes, The Annals of Statistics, 34,
1115-1139.
Lund, R., Hurd, H., Bloomfield, P., and Smith, R. L. (1995) Climatological time series with periodic
correlation, Journal of Climate, 11, 2787-2809.
Lund, R., Seymour, P. L., and Kafadar, K. (2001) Temperature trends in the United States, Environmetrics,
12, 673-690.
Lund, R., Liu, G. and Shao, Q. (2015) A new approach to ANOVA methods with autocorrelated data,
revision in review, The American Statistician.
Menne, M. J. and Williams Jr., C. N. (2005) Detection of undocumented changepoints using multiple test
statistics and composite reference series, Journal of Climate, 18, 4271-4286.
Menne, M. J. and Williams Jr., C. N. (2009) Homogenization of temperature series via pairwise comparisons,
Journal of Climate, 22, 1700-1717.
Mitchell, J. M. (1953) On the causes of instrumentally observed secular temperature trends, Journal of
Meteorology, 10, 244-261.
Novick, S. J. (1994) Analysis of Fisher’s Test for Hidden Periodicities, Dissertation, Lehigh University.
Schmider, E., Ziegler, M., Danay, E., Beyer, L. and Bühner, M. (2010) Is it really robust? Reinvestigating
the robustness of ANOVA against violations of the normal distribution assumption, Methodology: European
Journal of Research Methods for the Behavioral and Social Sciences, 6, 147-151.
Shao, Q. and Ni, P. P. (2004) Least-squares estimation and ANOVA for periodic autoregressive time series,
Statistics & Probability Letters, 69, 287–297.
Shumway, R. H. and Stoffer, D. S. (2010) Time Series Analysis and Its Applications with R Examples, Third
Edition, Springer, New York City.
Sutradhar B. C. and Bartlett, R. F. (1993) A small and large sample comparison of Wald’s likelihood ratio
and Rao’s tests for testing linear regression with autocorrelated errors, The Indian Journal of Statistics,
55, 186-198.
Sutradhar B. C., MacNeil, I. B. and Dagum, E. B. (1995) A simple test for stable seasonality, Journal of
Statistical Planning and Inference, 43, 157-167.
Tjøstheim, D. and Paulsen, J. (1983) Bias of some commonly-used time series estimates, Biometrika, 70,
389-399.
R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical
Computing, Vienna, Austria. URL


[Figure 1: four stacked panels. Panel 1: "Monthly Average High Temperatures: Aberdeen, Mississippi", plotted by Year (1930 to 1960). Panel 2: "Monthly Average High Temperatures: Tuscaloosa, Alabama", Degrees F by Year (1930 to 1960). Panel 3: "Monthly Average High Temperatures: Aberdeen Minus Tuscaloosa", Degrees F by Month of Year (Jan − Dec). Panel 4: "Monthly Average High Temperatures: Aberdeen Minus Tuscaloosa", Degrees F by Year (1930 to 1960).]
Figure 1. Monthly average temperatures at Aberdeen, Mississippi (top panel), monthly average temperatures at Tuscaloosa, Alabama
(second panel), box plots of the average monthly differences by month (third panel), and the monthly differences themselves (bottom
panel). It is not visually evident whether a mean cycle remains in the differenced series in the bottom panel.


Table 1. AR(1) Empirical Significance Levels

                      d = 10                                  d = 20
ϕ1       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
 0.1     0.033  0.059  0.069  0.060  0.028    0.038  0.056  0.057  0.063  0.034
 0.4     0.012  0.055  0.067  0.238  0.030    0.012  0.053  0.058  0.342  0.037
 0.8     0.000  0.048  0.061  0.806  0.029    0.000  0.047  0.053  0.963  0.039
−0.1     0.065  0.064  0.077  0.058  0.031    0.067  0.061  0.071  0.058  0.029
−0.4     0.102  0.052  0.068  0.239  0.037    0.101  0.046  0.054  0.319  0.034
−0.8     0.380  0.042  0.086  0.813  0.042    0.373  0.041  0.062  0.962  0.041

                      d = 50                                  d = 200
ϕ1       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
 0.1     0.040  0.051  0.053  0.072  0.040    0.040  0.052  0.052  0.078  0.042
 0.4     0.008  0.051  0.052  0.472  0.040    0.007  0.054  0.053  0.721  0.049
 0.8     0.000  0.047  0.049  0.999  0.044    0.000  0.050  0.050  1.000  0.050
−0.1     0.063  0.056  0.057  0.067  0.038    0.062  0.050  0.052  0.088  0.057
−0.4     0.113  0.051  0.054  0.472  0.036    0.108  0.052  0.054  0.719  0.046
−0.8     0.372  0.044  0.054  0.999  0.047    0.362  0.046  0.049  1.000  0.048


Table 2. AR(1) Empirical Powers

                      d = 10                                  d = 20
ϕ1       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃
 0.1     0.376  0.468  0.478  0.147  0.147    0.726  0.769  0.774  0.353  0.364
 0.4     0.264  0.503  0.504  0.232  0.175    0.622  0.815  0.816  0.410  0.423
 0.8     0.048  0.593  0.628  0.752  0.224    0.124  0.928  0.932  0.939  0.625
−0.1     0.392  0.461  0.468  0.134  0.135    0.742  0.768  0.775  0.337  0.355
−0.4     0.415  0.520  0.526  0.228  0.177    0.729  0.827  0.827  0.389  0.429
−0.8     0.481  0.658  0.666  0.756  0.209    0.575  0.939  0.946  0.940  0.605

                      d = 50                                  d = 200
ϕ1       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃
 0.1     0.991  0.993  0.993  0.868  0.874    1.000  1.000  1.000  1.000  1.000
 0.4     0.990  0.999  0.999  0.887  0.935    1.000  1.000  1.000  1.000  1.000
 0.8     0.689  1.000  1.000  0.998  0.991    1.000  1.000  1.000  1.000  1.000
−0.1     0.993  0.993  0.993  0.879  0.887    1.000  1.000  1.000  1.000  1.000
−0.4     0.992  0.998  0.998  0.876  0.937    1.000  1.000  1.000  0.999  1.000
−0.8     0.896  1.000  1.000  0.998  0.991    1.000  1.000  1.000  1.000  1.000


Table 3. AR(2) Empirical Significance Levels

              ACVF                   d = 10                                  d = 20
ϕ1 , ϕ2       Decay Rate   α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
−0.6, 0.16    0.8          0.361  0.038  0.084  0.712  0.020    0.355  0.045  0.075  0.923  0.029
−0.5, 0.14    0.7          0.234  0.043  0.077  0.545  0.019    0.241  0.046  0.068  0.763  0.027
 0.3, 0.18    0.6          0.007  0.040  0.067  0.280  0.012    0.008  0.043  0.056  0.414  0.024
−0.3, 0.1     0.5          0.117  0.044  0.075  0.208  0.017    0.117  0.050  0.067  0.294  0.022
−0.2, 0.08    0.4          0.074  0.039  0.063  0.113  0.020    0.072  0.038  0.046  0.141  0.025
−0.1, 0.06    0.3          0.064  0.042  0.071  0.066  0.015    0.061  0.043  0.058  0.077  0.020

              ACVF                   d = 50                                  d = 200
ϕ1 , ϕ2       Decay Rate   α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
−0.6, 0.16    0.8          0.340  0.045  0.058  0.996  0.034    0.357  0.048  0.050  1.000  0.040
−0.5, 0.14    0.7          0.246  0.044  0.054  0.935  0.039    0.231  0.051  0.054  0.999  0.043
 0.3, 0.18    0.6          0.004  0.045  0.052  0.622  0.037    0.006  0.049  0.049  0.901  0.046
−0.3, 0.1     0.5          0.116  0.052  0.057  0.447  0.035    0.113  0.047  0.051  0.712  0.053
−0.2, 0.08    0.4          0.080  0.044  0.051  0.197  0.026    0.080  0.052  0.054  0.336  0.040
−0.1, 0.06    0.3          0.057  0.046  0.050  0.090  0.033    0.055  0.050  0.049  0.129  0.048


Table 4. AR(2) Empirical Powers

              ACVF                   d = 10                                  d = 20
ϕ1 , ϕ2       Decay Rate   β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃
−0.6, 0.16    0.8          0.567  0.781  0.803  0.638  0.278    0.763  0.982  0.985  0.889  0.702
−0.5, 0.14    0.7          0.563  0.743  0.750  0.458  0.238    0.856  0.970  0.970  0.741  0.641
 0.3, 0.18    0.6          0.542  0.820  0.812  0.269  0.172    0.936  0.991  0.992  0.546  0.486
−0.3, 0.1     0.5          0.627  0.705  0.688  0.234  0.221    0.931  0.957  0.960  0.496  0.554
−0.2, 0.08    0.4          0.647  0.693  0.671  0.190  0.202    0.937  0.954  0.951  0.476  0.530
−0.1, 0.06    0.3          0.659  0.704  0.677  0.192  0.213    0.952  0.954  0.952  0.477  0.498

              ACVF                   d = 50                                  d = 200
ϕ1 , ϕ2       Decay Rate   β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃       β̂raw   β̂pe    β̂lrt   β̂S     β̂S̃
−0.6, 0.16    0.8          0.995  1.000  1.000  0.999  0.997    1.000  1.000  1.000  1.000  1.000
−0.5, 0.14    0.7          0.999  1.000  1.000  0.992  0.997    1.000  1.000  1.000  1.000  1.000
 0.3, 0.18    0.6          1.000  1.000  1.000  0.978  0.987    1.000  1.000  1.000  1.000  1.000
−0.3, 0.1     0.5          1.000  1.000  1.000  0.975  0.989    1.000  1.000  1.000  1.000  1.000
−0.2, 0.08    0.4          1.000  1.000  1.000  0.969  0.980    1.000  1.000  1.000  1.000  1.000
−0.1, 0.06    0.3          1.000  1.000  1.000  0.972  0.977    1.000  1.000  1.000  1.000  1.000


Table 5. AR(4) Empirical Significance Levels

                               ACVF                   d = 10                                  d = 20
ϕ1 , ϕ2 , ϕ3 , ϕ4              Decay Rate   α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
 0.8, 0.07, −0.05, −0.0048     0.8          0.000  0.039  0.094  0.854  0.008    0.000  0.048  0.074  0.977  0.008
 0.7, 0.07, −0.043, −0.0042    0.7          0.000  0.040  0.097  0.731  0.007    0.000  0.051  0.076  0.914  0.007
 0.2, 0.41, −0.042, −0.036     0.6          0.024  0.035  0.113  0.338  0.003    0.024  0.040  0.075  0.505  0.010
 0.1, 0.19, 0.011, −0.003      0.5          0.028  0.034  0.115  0.113  0.004    0.028  0.045  0.087  0.144  0.006
−0.2, 0.17, 0.018, −0.0072     0.4          0.081  0.027  0.119  0.156  0.003    0.082  0.035  0.067  0.209  0.009
 0.2, 0.07, −0.008, −0.0012    0.3          0.025  0.032  0.111  0.100  0.002    0.020  0.040  0.078  0.149  0.006

                               ACVF                   d = 50                                  d = 200
ϕ1 , ϕ2 , ϕ3 , ϕ4              Decay Rate   α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃       α̂raw   α̂pe    α̂lrt   α̂S     α̂S̃
 0.8, 0.07, −0.05, −0.0048     0.8          0.000  0.046  0.055  0.999  0.022    0.000  0.048  0.052  1.000  0.037
 0.7, 0.07, −0.043, −0.0042    0.7          0.000  0.050  0.060  0.994  0.022    0.000  0.054  0.055  1.000  0.039
 0.2, 0.41, −0.042, −0.036     0.6          0.019  0.042  0.056  0.724  0.025    0.021  0.046  0.051  0.956  0.038
 0.1, 0.19, 0.011, −0.003      0.5          0.031  0.048  0.065  0.206  0.026    0.026  0.049  0.051  0.338  0.045
−0.2, 0.17, 0.018, −0.0072     0.4          0.088  0.041  0.057  0.299  0.019    0.088  0.057  0.060  0.515  0.035
 0.2, 0.07, −0.008, −0.0012    0.3          0.017  0.046  0.059  0.187  0.020    0.016  0.046  0.048  0.306  0.043


Table 6. Seasonal Means of {∆t }

Month of Year    1         2         3         7         8         9         10
Sample mean      −0.7751   −0.8075   −0.6786   0.7720    0.9372    0.1841    0.2651

Entries for months 4, 5, 6, 11, and 12 (their month assignment is not recoverable from the extracted layout): 0.050, −0.2874, −0.5139, −0.3261, 0.2998