# Finite sample criteria for autoregressive order selection

This document details autoregressive model selection criteria following Broersen [1]. Emphasis is placed on converting the formulas into forms efficiently computable when evaluating a single model. When evaluating a hierarchy of models, computing using intermediate results may be more efficient.
## Setting
An AR(K) process and its AR(p) model are given by

$$
\begin{aligned}
x_n + a_1 x_{n-1} + \cdots + a_K x_{n-K} &= \epsilon_n \\
x_n + \hat{a}_1 x_{n-1} + \cdots + \hat{a}_p x_{n-p} &= \hat{\epsilon}_n
\end{aligned}
\tag{1}
$$

in which $\epsilon_n \sim N(0, \sigma_\epsilon^2)$ and $\hat{\epsilon}_n \sim N(0, \hat{\sigma}_\epsilon^2)$. Model selection criteria for evaluating which of several candidates most parsimoniously fits an AR(K) process generally have the form

$$
\operatorname{criterion}(v_{\text{method}}, N, p, \alpha)
= \ln \operatorname{residual}(v_{\text{method}}, p)
+ \operatorname{overfit}(\text{criterion}, v_{\text{method}}, N, p, \alpha) .
\tag{2}
$$
Among all candidates and using a given criterion, the "best" model minimizes the criterion. Here, $N$ represents the number of samples used to estimate model parameters, $p$ denotes the order of the estimated model, $v_{\text{method}} = v_{\text{method}}(N, i)$ is the method-specific estimation variance for model order $i$, and $\alpha$ is an optional factor with a criterion-dependent meaning.
When estimating $\hat{a}_1, \ldots, \hat{a}_p$ given sample data $x_n$, the residual variance is

$$
\operatorname{residual}(v_{\text{method}}, p) = \operatorname{residual}(p) = \hat{\sigma}_\epsilon^2 .
$$

Therefore the left term in (2) penalizes misfitting the data independently of the estimation method used. One may therefore distinguish among criteria using only the overfitting penalty term, namely $\operatorname{overfit}(\text{criterion}, v_{\text{method}}, N, p, \alpha)$.
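To make the shape of (2) concrete, here is a minimal sketch of the selection recipe in Python. The helper names `residual_variance` and `overfit_penalty` are illustrative placeholders for the method- and criterion-specific pieces detailed below, not names from Broersen.

```python
import math

def select_order(residual_variance, overfit_penalty, max_p):
    """Return the order p in 0..max_p minimizing (2):
    ln residual(v_method, p) + overfit(criterion, v_method, N, p, alpha)."""
    best_p, best_value = None, math.inf
    for p in range(max_p + 1):
        value = math.log(residual_variance(p)) + overfit_penalty(p)
        if value < best_value:
            best_p, best_value = p, value
    return best_p
```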
In Broersen's work, the penalty term depends upon the model estimation method used through the estimation variance $v$:

$$
\begin{aligned}
v_{\text{Yule–Walker}}(N, i) &= \frac{N - i}{N (N + 2)} & i &\neq 0 \\
v_{\text{Burg}}(N, i) &= \frac{1}{N + 1 - i} & i &\neq 0 \\
v_{\text{LSFB}}(N, i) &= \frac{1}{N + 1.5 - 1.5 i} & i &\neq 0 \\
v_{\text{LSF}}(N, i) &= \frac{1}{N + 2 - 2 i} & i &\neq 0
\end{aligned}
\tag{3}
$$
Here "LSFB" and "LSF" are shorthand for least squares estimation minimizing both the forward and backward prediction or only the forward prediction, respectively. The estimation variance for $i = 0$ depends only on whether or not the sample mean has been subtracted:

$$
v(N, 0) =
\begin{cases}
\dfrac{1}{N} & \text{sample mean subtracted} \\
0 & \text{sample mean retained}
\end{cases}
\tag{4}
$$
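A minimal sketch of (3) and (4) in Python follows; the function names are illustrative rather than Broersen's, and each `v_*` routine assumes $i \neq 0$, with the $i = 0$ case handled separately per (4).

```python
# Method-specific estimation variances from (3); valid only for i != 0.
def v_yule_walker(N, i):
    return (N - i) / (N * (N + 2.0))

def v_burg(N, i):
    return 1.0 / (N + 1.0 - i)

def v_lsfb(N, i):
    return 1.0 / (N + 1.5 - 1.5 * i)

def v_lsf(N, i):
    return 1.0 / (N + 2.0 - 2.0 * i)

# Estimation variance at i == 0 from (4).
def v0(N, subtract_mean=True):
    return 1.0 / N if subtract_mean else 0.0
```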
## Infinite sample overfit penalty terms
The method-independent generalized information criterion (GIC) has overfitting penalty

$$
\operatorname{overfit}(\mathrm{GIC}, N, p, \alpha) = \alpha \, \frac{p}{N}
$$

independent of $v_{\text{method}}$. The Akaike information criterion (AIC) has

$$
\operatorname{overfit}(\mathrm{AIC}, N, p) = \operatorname{overfit}(\mathrm{GIC}, N, p, 2)
\tag{5}
$$

while the consistent criterion BIC and minimally consistent criterion (MCC) have

$$
\operatorname{overfit}(\mathrm{BIC}, N, p) = \operatorname{overfit}(\mathrm{GIC}, N, p, \ln N)
\tag{6}
$$

$$
\operatorname{overfit}(\mathrm{MCC}, N, p) = \operatorname{overfit}(\mathrm{GIC}, N, p, 2 \ln \ln N) .
\tag{7}
$$
Additionally, Broersen uses $\alpha = 3$ with GIC, referring to the result as GIC(p,3). The asymptotically-corrected Akaike information criterion (AIC$_{\mathrm{C}}$) of Hurvich and Tsai [2] is

$$
\operatorname{overfit}(\mathrm{AIC_C}, N, p) = \frac{2p}{N - p - 1} .
$$
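These infinite sample penalties reduce to one-liners; a sketch, again with illustrative names:

```python
import math

def overfit_gic(N, p, alpha):
    return alpha * p / N

def overfit_aic(N, p):
    return overfit_gic(N, p, 2.0)                           # (5)

def overfit_bic(N, p):
    return overfit_gic(N, p, math.log(N))                   # (6)

def overfit_mcc(N, p):
    return overfit_gic(N, p, 2.0 * math.log(math.log(N)))   # (7)

def overfit_aicc(N, p):
    return 2.0 * p / (N - p - 1.0)                          # Hurvich and Tsai
```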
## Finite sample overfit penalty terms

### Finite information criterion
The finite information criterion (FIC) [3] is an extension of GIC meant to account for finite sample size and the estimation method employed. The FIC overfit penalty term is

$$
\operatorname{overfit}(\mathrm{FIC}, v_{\text{method}}, N, p, \alpha)
= \alpha \sum_{i=0}^{p} v_{\text{method}}(N, i)
= \alpha \left[ v(N, 0) + \sum_{i=1}^{p} v_{\text{method}}(N, i) \right]
$$
where $v(N, 0)$ is evaluated using (4) and $v_{\text{method}}(N, i)$ from (3). The factor $\alpha$ may be chosen as in (5), (6), or (7). Again, Broersen uses $\alpha = 3$, calling the result FIC(p,3). By direct computation one finds the following:
$$
\begin{aligned}
\operatorname{overfit}(\mathrm{FIC}, v_{\text{Yule–Walker}}, N, p, \alpha)
&= \alpha \left[ v(N, 0) - \frac{p \, (1 - 2N + p)}{2 N (N + 2)} \right] \\
\operatorname{overfit}(\mathrm{FIC}, v_{\text{Burg}}, N, p, \alpha)
&= \alpha \left[ v(N, 0) + \psi(N + 1) - \psi(N + 1 - p) \right] \\
\operatorname{overfit}(\mathrm{FIC}, v_{\text{LSFB}}, N, p, \alpha)
&= \alpha \left[ v(N, 0) + \frac{2}{3} \left( \psi\!\left(\frac{3 + 2N}{3}\right) - \psi\!\left(\frac{3 + 2N}{3} - p\right) \right) \right] \\
\operatorname{overfit}(\mathrm{FIC}, v_{\text{LSF}}, N, p, \alpha)
&= \alpha \left[ v(N, 0) + \frac{1}{2} \left( \psi\!\left(\frac{2 + N}{2}\right) - \psi\!\left(\frac{2 + N}{2} - p\right) \right) \right]
\end{aligned}
$$
The simplifications underneath the Burg, LSFB, and LSF results use that

$$
\sum_{i=1}^{p} \frac{1}{N + a - a i}
= \sum_{i=0}^{p-1} \frac{1}{N - a i}
= \frac{1}{a} \sum_{i=0}^{p-1} \frac{1}{\frac{N}{a} - i}
= \frac{1}{a} \left[ \psi\!\left(\frac{N}{a} + 1\right) - \psi\!\left(\frac{N}{a} - p + 1\right) \right]
$$

holds for all nonzero $a \in \mathbb{R}$ because the digamma function $\psi$ telescopes according to

$$
\psi(x + 1) = \frac{1}{x} + \psi(x)
\implies
\psi(x + k) - \psi(x) = \sum_{i=0}^{k-1} \frac{1}{x + i} .
$$
For strictly positive abscissae, $\psi$ may be numerically evaluated following Bernardo [4].
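In practice, SciPy's `scipy.special.digamma` can play the role of Bernardo's psi routine. Below is a minimal sketch of two of the closed forms above; the function names are illustrative, not Broersen's.

```python
from scipy.special import digamma  # psi; arguments here are strictly positive

def overfit_fic_burg(N, p, alpha=3.0, v0=None):
    """Closed-form FIC overfit penalty under Burg estimation."""
    if v0 is None:
        v0 = 1.0 / N  # sample mean subtracted, per (4)
    return alpha * (v0 + digamma(N + 1.0) - digamma(N + 1.0 - p))

def overfit_fic_lsf(N, p, alpha=3.0, v0=None):
    """Closed-form FIC overfit penalty under LSF estimation."""
    if v0 is None:
        v0 = 1.0 / N
    x = (2.0 + N) / 2.0
    return alpha * (v0 + 0.5 * (digamma(x) - digamma(x - p)))
```

For example, `overfit_fic_burg(N=100, p=4)` evaluates $3\,(1/N + \psi(101) - \psi(97))$.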
### Finite sample information criterion
The finite sample information criterion (FSIC) is a finite sample approximation to the Kullback–Leibler discrepancy [5]. FSIC has the overfit penalty term

$$
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{method}}, N, p)
= \prod_{i=0}^{p} \frac{1 + v_{\text{method}}(N, i)}{1 - v_{\text{method}}(N, i)} - 1
= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \prod_{i=1}^{p} \frac{1 + v_{\text{method}}(N, i)}{1 - v_{\text{method}}(N, i)} - 1 .
\tag{9}
$$
The product in the context of the Yule–Walker estimation may be reexpressed as

$$
\prod_{i=1}^{p} \frac{1 + v_{\text{Yule–Walker}}(N, i)}{1 - v_{\text{Yule–Walker}}(N, i)}
= \prod_{i=1}^{p} \frac{N^2 + 3N - i}{N^2 + N + i}
= (-1)^p \, \frac{\left(1 - 3N - N^2\right)_p}{\left(1 + N + N^2\right)_p}
= \frac{\left(N^2 + 3N - p\right)_p}{\left(1 + N + N^2\right)_p}
\tag{10}
$$

where the "rising factorial" is denoted by the Pochhammer symbol

$$
(x)_k = \frac{\Gamma(x + k)}{\Gamma(x)} .
$$

When $x$ is a negative integer and $\Gamma$ is therefore undefined, the limiting value of the ratio is implied. The product in the context of the Burg, LSFB, or LSF estimation methods becomes
$$
\prod_{i=1}^{p} \frac{1 + v_{\text{Burg}|\text{LSFB}|\text{LSF}}(N, i)}{1 - v_{\text{Burg}|\text{LSFB}|\text{LSF}}(N, i)}
= \prod_{i=1}^{p} \frac{N + a (1 - i) + 1}{N + a (1 - i) - 1}
= \frac{\left(-\frac{1 + N}{a}\right)_p}{\left(\frac{1 - N}{a}\right)_p}
\tag{11}
$$
where $a \in \mathbb{R}$ is a placeholder for a method-specific constant: $a = 1$ for Burg, $a = 3/2$ for LSFB, and $a = 2$ for LSF, per (3). Routines for computing the Pochhammer symbol may be found in, for example, SLATEC [6] or the GNU Scientific Library [7]. In particular, both suggested sources handle negative integer input correctly.
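For illustration, the limiting value at negative integer $x$ is simply the finite rising product, so a direct evaluation sidesteps the $\Gamma$ poles entirely, which is the behavior the document attributes to SLATEC and GSL. A sketch assuming integer $k \geq 0$:

```python
def poch(x, k):
    """Rising factorial (x)_k = x (x+1) ... (x+k-1), i.e. the limiting
    value of Gamma(x+k)/Gamma(x), well-defined at negative integer x."""
    result = 1.0
    for j in range(k):
        result *= x + j
    return result

# poch(-3, 2) == 6.0 since (-3)(-2) = 6, while poch(-3, 4) == 0.0
# because the factor (x + 3) vanishes.
```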
By direct substitution of (10) or (11) into (9) one obtains:

$$
\begin{aligned}
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{Yule–Walker}}, N, p)
&= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{\left(N^2 + 3N - p\right)_p}{\left(1 + N + N^2\right)_p} - 1 \\
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{Burg}}, N, p)
&= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{(-1 - N)_p}{(1 - N)_p} - 1 \\
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{LSFB}}, N, p)
&= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{\left(\frac{-2 - 2N}{3}\right)_p}{\left(\frac{2 - 2N}{3}\right)_p} - 1 \\
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{LSF}}, N, p)
&= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{\left(\frac{-1 - N}{2}\right)_p}{\left(\frac{1 - N}{2}\right)_p} - 1
\end{aligned}
$$
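When the Pochhammer closed forms are inconvenient, (9) can equally be evaluated as a direct product over the estimation variances. A sketch, assuming a `v_method` callable like those defined earlier:

```python
def overfit_fsic(v_method, N, p, v0):
    """FSIC overfit penalty (9) as a running product over orders 1..p."""
    product = (1.0 + v0) / (1.0 - v0)
    for i in range(1, p + 1):
        vi = v_method(N, i)
        product *= (1.0 + vi) / (1.0 - vi)
    return product - 1.0

# For example, under Burg estimation with the sample mean subtracted:
# overfit_fsic(lambda N, i: 1.0 / (N + 1.0 - i), N=100, p=4, v0=0.01)
```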
### Combined information criterion
The combined information criterion (CIC) takes the behavior of FIC(p,3) at low orders and FSIC at high orders. For any estimation method CIC has the overfit penalty term

$$
\operatorname{overfit}(\mathrm{CIC}, v_{\text{method}}, N, p)
= \max \left\{
\operatorname{overfit}(\mathrm{FSIC}, v_{\text{method}}, N, p) ,\;
\operatorname{overfit}(\mathrm{FIC}, v_{\text{method}}, N, p, 3)
\right\} .
$$
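A self-contained sketch of the CIC penalty, computing the FIC sum and FSIC product directly from a `v_method` callable (names again illustrative):

```python
def overfit_cic(v_method, N, p, v0):
    """CIC overfit penalty: max of the FSIC and FIC(p,3) penalties."""
    variances = [v_method(N, i) for i in range(1, p + 1)]
    fic = 3.0 * (v0 + sum(variances))        # FIC with alpha = 3
    fsic = (1.0 + v0) / (1.0 - v0)           # FSIC per (9)
    for vi in variances:
        fsic *= (1.0 + vi) / (1.0 - vi)
    fsic -= 1.0
    return max(fsic, fic)
```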
## Notes

[1] Broersen, P. M. T. "Finite sample criteria for autoregressive order selection." IEEE Transactions on Signal Processing 48 (December 2000): 3550–3558. http://dx.doi.org/10.1109/78.887047

[2] Hurvich, Clifford M. and Chih-Ling Tsai. "Regression and time series model selection in small samples." Biometrika 76 (June 1989): 297–307. http://dx.doi.org/10.1093/biomet/76.2.297

[3] FIC is mistakenly called the "finite sample information criterion" on page 3551 of Broersen 2000 but referred to correctly as the "finite information criterion" on page 187 of Broersen's 2006 book.

[4] Bernardo, J. M. "Algorithm AS 103: Psi (digamma) function." Journal of the Royal Statistical Society, Series C (Applied Statistics) 25 (1976). http://www.jstor.org/stable/2347257

[5] Presumably FSIC could be related, through the Kullback symmetric divergence, to the KICc and AKICc criteria proposed by Seghouane, A. K. and M. Bekara. "A Small Sample Model Selection Criterion Based on Kullback's Symmetric Divergence." IEEE Transactions on Signal Processing 52 (December 2004): 3314–3323. http://dx.doi.org/10.1109/TSP.2004.837416

[6] Vandevender, W. H. and K. H. Haskell. "The SLATEC mathematical subroutine library." ACM SIGNUM Newsletter 17 (September 1982): 16–21. http://dx.doi.org/10.1145/1057594.1057595

[7] Galassi, M. et al. GNU Scientific Library Reference Manual (3rd Ed.). ISBN 0954612078. http://www.gnu.org/software/gsl/