Finite sample criteria for autoregressive order selection

This document details autoregressive model selection criteria following Broersen.[1] Emphasis
is placed on converting the formulas into forms efficiently computable when evaluating a
single model. When evaluating a hierarchy of models, computing using intermediate results
may be more efficient.
Setting
An AR(K) process and its AR(p) model are given by
\[
  x_n + a_1 x_{n-1} + \cdots + a_K x_{n-K} = \epsilon_n
  \qquad\qquad
  x_n + \hat{a}_1 x_{n-1} + \cdots + \hat{a}_p x_{n-p} = \hat{\epsilon}_n
  \tag{1}
\]
in which $\epsilon_n \sim N(0, \sigma_\epsilon^2)$ and $\hat{\epsilon}_n \sim N(0, \hat{\sigma}_\epsilon^2)$. Model selection criteria for evaluating which of
several candidates most parsimoniously fits an AR(K) process generally have the form
\[
  \operatorname{criterion}(v_{\text{method}}, N, p, \alpha)
  = \ln \operatorname{residual}(p, v_{\text{method}})
  + \operatorname{overfit}(\text{criterion}, v_{\text{method}}, N, p, \alpha) .
  \tag{2}
\]
Among all candidates and using a given criterion, the “best” model minimizes the criterion.
Here, $N$ represents the number of samples used to estimate model parameters, $p$ denotes
the order of the estimated model, $v_{\text{method}} = v_{\text{method}}(N, i)$ is the method-specific estimation
variance for model order $i$, and $\alpha$ is an optional factor with a criterion-dependent meaning.
When estimating $\hat{a}_1, \dots, \hat{a}_p$ given sample data $x_n$, the residual variance is
\[
  \operatorname{residual}(p, v_{\text{method}}) = \operatorname{residual}(p) = \hat{\sigma}_\epsilon^2 .
\]
Therefore the left term in (2) penalizes misfitting the data independently of the estimation
method used. One may therefore distinguish among criteria using only the overfitting
penalty term, namely $\operatorname{overfit}(\text{criterion}, v_{\text{method}}, N, p, \alpha)$.
In Broersen’s work, the penalty term depends upon the model estimation method used
through the estimation variance $v$:
\[
  \begin{aligned}
    v_{\text{Yule--Walker}}(N, i) &= \frac{N - i}{N (N + 2)}    & i &\neq 0 \\
    v_{\text{Burg}}(N, i)         &= \frac{1}{N + 1 - i}        & i &\neq 0 \\
    v_{\text{LSFB}}(N, i)         &= \frac{1}{N + 1.5 - 1.5\,i} & i &\neq 0 \\
    v_{\text{LSF}}(N, i)          &= \frac{1}{N + 2 - 2\,i}     & i &\neq 0
  \end{aligned}
  \tag{3}
\]
Here “LSFB” and “LSF” are shorthand for least squares estimation minimizing both the
forward and backward prediction or only the forward prediction, respectively. The estimation
variance for i = 0 depends only on whether or not the sample mean has been subtracted:
\[
  v(N, 0) =
  \begin{cases}
    1/N & \text{sample mean subtracted} \\
    0   & \text{sample mean retained}
  \end{cases}
  \tag{4}
\]
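Because every criterion below consumes these quantities, a minimal Python sketch of (3)
and (4) is worth recording; the function names are illustrative rather than taken from any
established library.

def v_yule_walker(N, i):
    # Yule-Walker estimation variance from (3), valid for i != 0
    return (N - i) / (N * (N + 2))

def v_burg(N, i):
    # Burg estimation variance from (3), valid for i != 0
    return 1.0 / (N + 1 - i)

def v_lsfb(N, i):
    # Forward-backward least squares (LSFB) variance from (3), valid for i != 0
    return 1.0 / (N + 1.5 - 1.5 * i)

def v_lsf(N, i):
    # Forward-only least squares (LSF) variance from (3), valid for i != 0
    return 1.0 / (N + 2.0 - 2.0 * i)

def v0(N, mean_subtracted=True):
    # v(N, 0) from (4): depends only on how the sample mean was handled
    return 1.0 / N if mean_subtracted else 0.0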
Infinite sample overfit penalty terms
The method-independent generalized information criterion (GIC) has overfitting penalty
\[
  \operatorname{overfit}(\text{GIC}, N, p, \alpha) = \alpha \, \frac{p}{N}
\]
independent of $v_{\text{method}}$. The Akaike information criterion (AIC) has
\[
  \operatorname{overfit}(\text{AIC}, N, p) = \operatorname{overfit}(\text{GIC}, N, p, 2)
  \tag{5}
\]
while the consistent criterion BIC and minimally consistent criterion (MCC) have
\[
  \operatorname{overfit}(\text{BIC}, N, p) = \operatorname{overfit}(\text{GIC}, N, p, \ln N)
  \tag{6}
\]
\[
  \operatorname{overfit}(\text{MCC}, N, p) = \operatorname{overfit}(\text{GIC}, N, p, 2 \ln \ln N) .
  \tag{7}
\]
Additionally, Broersen uses $\alpha = 3$ with GIC, referring to the result as GIC(p,3). The
asymptotically-corrected Akaike information criterion (AIC$_{\text{C}}$) of Hurvich and Tsai[2] is
\[
  \operatorname{overfit}(\text{AIC}_{\text{C}}, N, p) = \frac{2p}{N - p - 1} .
\]
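All of these infinite sample penalties translate directly into code. The following sketch
continues the illustrative Python style used above:

import math

def overfit_gic(N, p, alpha):
    # Generalized information criterion penalty: alpha * p / N
    return alpha * p / N

def overfit_aic(N, p):
    # AIC is GIC with alpha = 2, per (5)
    return overfit_gic(N, p, 2.0)

def overfit_bic(N, p):
    # BIC is GIC with alpha = ln N, per (6)
    return overfit_gic(N, p, math.log(N))

def overfit_mcc(N, p):
    # MCC is GIC with alpha = 2 ln ln N, per (7)
    return overfit_gic(N, p, 2.0 * math.log(math.log(N)))

def overfit_aicc(N, p):
    # Corrected AIC of Hurvich and Tsai; requires p < N - 1
    return 2.0 * p / (N - p - 1)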
Finite sample overfit penalty terms
Finite information criterion[3]
The finite information criterion (FIC) is an extension of GIC meant to account for finite
sample size and the estimation method employed. The FIC overfit penalty term is
\[
  \operatorname{overfit}(\text{FIC}, v_{\text{method}}, N, p, \alpha)
  = \alpha \sum_{i=0}^{p} v_{\text{method}}(N, i)
  = \alpha \left( v(N, 0) + \sum_{i=1}^{p} v_{\text{method}}(N, i) \right)
\]
where $v(N, 0)$ is evaluated using (4) and $v_{\text{method}}(N, i)$ from (3). The factor $\alpha$ may be chosen
as in (5), (6), or (7). Again, Broersen uses $\alpha = 3$, calling the result FIC(p,3).
By direct computation one finds the following:
\[
\begin{aligned}
  \operatorname{overfit}(\text{FIC}, v_{\text{Yule--Walker}}, N, p, \alpha)
    &= \alpha \left( v(N, 0) - \frac{p \, (1 - 2N + p)}{2N (N + 2)} \right) \\
  \operatorname{overfit}(\text{FIC}, v_{\text{Burg}}, N, p, \alpha)
    &= \alpha \left( v(N, 0) + \psi(N + 1) - \psi(N + 1 - p) \right) \\
  \operatorname{overfit}(\text{FIC}, v_{\text{LSFB}}, N, p, \alpha)
    &= \alpha \left( v(N, 0) + \frac{2}{3} \left( \psi\!\left( \frac{3 + 2N}{3} \right)
       - \psi\!\left( \frac{3 + 2N}{3} - p \right) \right) \right) \\
  \operatorname{overfit}(\text{FIC}, v_{\text{LSF}}, N, p, \alpha)
    &= \alpha \left( v(N, 0) + \frac{1}{2} \left( \psi\!\left( \frac{2 + N}{2} \right)
       - \psi\!\left( \frac{2 + N}{2} - p \right) \right) \right)
\end{aligned}
\]
The simplifications underneath the Burg, LSFB, and LSF results use the fact that
\[
  \sum_{i=1}^{p} \frac{1}{N + a - ai}
  = \sum_{i=0}^{p-1} \frac{1}{N - ai}
  = \frac{1}{a} \sum_{i=0}^{p-1} \frac{1}{\frac{N}{a} - i}
  = \frac{1}{a} \left( \psi\!\left( \frac{N}{a} + 1 \right)
    - \psi\!\left( \frac{N}{a} - p + 1 \right) \right)
\]
holds for all nonzero $a \in \mathbb{R}$ because the digamma function $\psi$ telescopes according to
\[
  \psi(x + 1) = \frac{1}{x} + \psi(x)
  \implies
  \psi(x + k) - \psi(x) = \sum_{i=0}^{k-1} \frac{1}{x + i} .
\]
For strictly positive abscissae, $\psi$ may be numerically evaluated following Bernardo.[4]
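These closed forms are straightforward to evaluate with a library digamma routine. A
minimal sketch using SciPy follows (scipy.special.digamma computes $\psi$; the function
names and the convention of passing $v_0 = v(N, 0)$ are illustrative):

from scipy.special import digamma  # the psi function

def overfit_fic_yule_walker(N, p, alpha, v0):
    # Closed-form FIC penalty for Yule-Walker estimation
    return alpha * (v0 - p * (1 - 2 * N + p) / (2 * N * (N + 2)))

def overfit_fic_burg(N, p, alpha, v0):
    # Closed-form FIC penalty for Burg estimation (a = 1)
    return alpha * (v0 + digamma(N + 1) - digamma(N + 1 - p))

def overfit_fic_lsfb(N, p, alpha, v0):
    # Closed-form FIC penalty for LSFB estimation (a = 3/2)
    x = (3 + 2 * N) / 3
    return alpha * (v0 + (2 / 3) * (digamma(x) - digamma(x - p)))

def overfit_fic_lsf(N, p, alpha, v0):
    # Closed-form FIC penalty for LSF estimation (a = 2)
    x = (2 + N) / 2
    return alpha * (v0 + 0.5 * (digamma(x) - digamma(x - p)))

Each can be cross-checked against the direct sum $\alpha \sum_{i=0}^{p} v_{\text{method}}(N, i)$ computed with the
variance functions sketched earlier.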
Finite sample information criterion
The finite sample information criterion (FSIC) is a finite sample approximation to the
Kullback–Leibler discrepancy.[5] FSIC has the overfit penalty term
\[
  \operatorname{overfit}(\text{FSIC}, v_{\text{method}}, N, p)
  = \prod_{i=0}^{p} \frac{1 + v_{\text{method}}(N, i)}{1 - v_{\text{method}}(N, i)} - 1
  = \frac{1 + v(N, 0)}{1 - v(N, 0)}
    \cdot \prod_{i=1}^{p} \frac{1 + v_{\text{method}}(N, i)}{1 - v_{\text{method}}(N, i)} - 1 .
  \tag{9}
\]
The product in the context of the Yule–Walker estimation may be reexpressed as
\[
  \prod_{i=1}^{p} \frac{1 + v_{\text{Yule--Walker}}(N, i)}{1 - v_{\text{Yule--Walker}}(N, i)}
  = \prod_{i=1}^{p} \frac{N^2 + 3N - i}{N^2 + N + i}
  = (-1)^p \, \frac{(1 - 3N - N^2)_p}{(1 + N + N^2)_p}
  = \frac{(N^2 + 3N - p)_p}{(1 + N + N^2)_p}
  \tag{10}
\]
where the “rising factorial” is denoted by the Pochhammer symbol
\[
  (x)_k = \frac{\Gamma(x + k)}{\Gamma(x)} .
\]
When $x$ is a negative integer and $\Gamma$ is therefore undefined, the limiting value of the ratio is
implied. The product in the context of the Burg, LSFB, or LSF estimation methods becomes
\[
  \prod_{i=1}^{p} \frac{1 + v_{\text{Burg}|\text{LSFB}|\text{LSF}}(N, i)}{1 - v_{\text{Burg}|\text{LSFB}|\text{LSF}}(N, i)}
  = \prod_{i=1}^{p} \frac{N + a \, (1 - i) + 1}{N + a \, (1 - i) - 1}
  = \frac{\left( \frac{-1 - N}{a} \right)_p}{\left( \frac{1 - N}{a} \right)_p}
  \tag{11}
\]
where $a \in \mathbb{R}$ is a placeholder for a method-specific constant. Routines for computing the
Pochhammer symbol may be found in, for example, SLATEC[6] or the GNU Scientific
Library[7]. In particular, both suggested sources handle negative integer input correctly.
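In Python, scipy.special.poch offers a convenient sketch-level alternative, although
whether a given SciPy version takes the limiting value at negative integer arguments is
worth verifying; a plain fallback for integer $k$ is easy to write:

from scipy.special import poch

print(poch(3, 4))   # 360.0, i.e. 3*4*5*6 (rising factorial)
print(poch(-3, 2))  # should print 6.0, i.e. (-3)*(-2); verify on your SciPy version

def poch_int(x, k):
    # Fallback rising factorial for integer k >= 0, well-defined at negative integer x
    result = 1.0
    for i in range(k):
        result *= x + i
    return result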
By direct substitution of (10) or (11) into (9) one obtains:
\[
\begin{aligned}
  \operatorname{overfit}(\text{FSIC}, v_{\text{Yule--Walker}}, N, p)
    &= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{(N^2 + 3N - p)_p}{(1 + N + N^2)_p} - 1 \\
  \operatorname{overfit}(\text{FSIC}, v_{\text{Burg}}, N, p)
    &= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot \frac{(-1 - N)_p}{(1 - N)_p} - 1 \\
  \operatorname{overfit}(\text{FSIC}, v_{\text{LSFB}}, N, p)
    &= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot
       \frac{\left( \frac{-2 - 2N}{3} \right)_p}{\left( \frac{2 - 2N}{3} \right)_p} - 1 \\
  \operatorname{overfit}(\text{FSIC}, v_{\text{LSF}}, N, p)
    &= \frac{1 + v(N, 0)}{1 - v(N, 0)} \cdot
       \frac{\left( \frac{-1 - N}{2} \right)_p}{\left( \frac{1 - N}{2} \right)_p} - 1
\end{aligned}
\]
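These closed forms admit the same kind of sketch as before (illustrative names; poch as
above, with $v_0 = v(N, 0)$ passed in). Because the Burg arguments are negative integers,
the poch_int fallback may be the safer choice there:

from scipy.special import poch

def fsic_prefactor(v0):
    # Common i = 0 factor from (9)
    return (1 + v0) / (1 - v0)

def overfit_fsic_yule_walker(N, p, v0):
    return fsic_prefactor(v0) * poch(N**2 + 3*N - p, p) / poch(1 + N + N**2, p) - 1

def overfit_fsic_burg(N, p, v0):
    return fsic_prefactor(v0) * poch(-1 - N, p) / poch(1 - N, p) - 1

def overfit_fsic_lsfb(N, p, v0):
    return fsic_prefactor(v0) * poch((-2 - 2*N) / 3, p) / poch((2 - 2*N) / 3, p) - 1

def overfit_fsic_lsf(N, p, v0):
    return fsic_prefactor(v0) * poch((-1 - N) / 2, p) / poch((1 - N) / 2, p) - 1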
Combined information criterion
The combined information criterion (CIC) takes the behavior of FIC(p,3) at low orders and
FSIC at high orders. For any estimation method CIC has the overfit penalty term
\[
  \operatorname{overfit}(\text{CIC}, v_{\text{method}}, N, p)
  = \max \left\{
      \operatorname{overfit}(\text{FSIC}, v_{\text{method}}, N, p) ,\;
      \operatorname{overfit}(\text{FIC}, v_{\text{method}}, N, p, 3)
    \right\} .
\]
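Putting the pieces together, the following sketch evaluates CIC and then selects an order by
minimizing criterion (2); all names are illustrative, and sigma2 is assumed to hold residual
variances for orders $0, \dots, P$:

import math

def overfit_cic(N, p, v0, overfit_fsic, overfit_fic):
    # CIC penalty: the larger of the FSIC and FIC(p,3) penalties
    return max(overfit_fsic(N, p, v0), overfit_fic(N, p, 3.0, v0))

def select_order(sigma2, N, v0, overfit):
    # Evaluate ln(residual) + overfit per (2) for p = 0..len(sigma2)-1;
    # the "best" order minimizes the criterion
    best_p, best_value = 0, float("inf")
    for p, s2 in enumerate(sigma2):
        value = math.log(s2) + overfit(N, p, v0)
        if value < best_value:
            best_p, best_value = p, value
    return best_p

# Example wiring for Burg estimation with the sample mean subtracted:
#   select_order(sigma2, N, v0=1.0/N,
#                overfit=lambda n, p, v: overfit_cic(n, p, v,
#                                                    overfit_fsic_burg,
#                                                    overfit_fic_burg))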
Notes

[1] Broersen, P. M. T. “Finite sample criteria for autoregressive order selection.” IEEE Transactions on Signal Processing 48 (December 2000): 3550–3558. http://dx.doi.org/10.1109/78.887047
[2] Hurvich, Clifford M. and Chih-Ling Tsai. “Regression and time series model selection in small samples.” Biometrika 76 (June 1989): 297–307. http://dx.doi.org/10.1093/biomet/76.2.297
[3] FIC is mistakenly called the “finite sample information criterion” on page 3551 of Broersen 2000 but referred to correctly as the “finite information criterion” on page 187 of Broersen’s 2006 book.
[4] Bernardo, J. M. “Algorithm AS 103: Psi (digamma) function.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 25 (1976). http://www.jstor.org/stable/2347257
[5] Presumably FSIC could be related, through the Kullback symmetric divergence, to the KICc and AKICc criteria proposed by Seghouane, A. K. and M. Bekara. “A Small Sample Model Selection Criterion Based on Kullback’s Symmetric Divergence.” IEEE Transactions on Signal Processing 52 (December 2004): 3314–3323. http://dx.doi.org/10.1109/TSP.2004.837416
[6] Vandevender, W. H. and K. H. Haskell. “The SLATEC mathematical subroutine library.” ACM SIGNUM Newsletter 17 (September 1982): 16–21. http://dx.doi.org/10.1145/1057594.1057595
[7] Galassi, M. et al. GNU Scientific Library Reference Manual (3rd Ed.). ISBN 0954612078. http://www.gnu.org/software/gsl/
4

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->