Akaike Information Criterion
Shuhua Hu
Center for Research in Scientific Computation
North Carolina State University
Raleigh, NC
Background
Model
- Estimation error:

\[
\|\hat\theta_k - \theta_0\|^2
= \underbrace{\|\hat\theta_k - \theta_k\|^2}_{\text{variance}}
+ \underbrace{\|\theta_k - \theta_0\|^2}_{\text{bias}}
\]

  - θ₀: parameter vector of the full reality model.
  - θ_k: the projection of θ₀ onto the parameter space of the approximating model g_k.
  - θ̂_k: the maximum likelihood estimate of θ_k in g_k.

- Variance: for a sufficiently large sample size n, n‖θ̂_k − θ_k‖² follows a χ²_k distribution asymptotically, where E(χ²_k) = k.
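The variance statement can be checked with a quick simulation. A minimal sketch, assuming a truth of N(θ, 1) with known unit variance and the sample mean as the MLE (so k = 1); the specific numbers are illustrative assumptions, not from the slides:

```python
import numpy as np

# Monte Carlo sketch of the variance result: for data from N(theta, 1)
# and the MLE theta_hat = sample mean, n * (theta_hat - theta)^2 follows
# a chi-square distribution with k = 1 degree of freedom, whose mean is k.
rng = np.random.default_rng(0)
n, reps, theta = 50, 20000, 0.3

samples = rng.normal(theta, 1.0, size=(reps, n))
theta_hat = samples.mean(axis=1)           # MLE of the mean
scaled_err = n * (theta_hat - theta) ** 2  # ~ chi^2_1

print(scaled_err.mean())  # should be close to E(chi^2_1) = 1
```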
Kullback-Leibler Information
Information lost when an approximating model g is used to approximate the full reality f.
Continuous Case

\[
I(f, g(\cdot|\theta))
= \int f(x)\,\log\!\left(\frac{f(x)}{g(x|\theta)}\right)dx
= \int f(x)\log(f(x))\,dx
\;-\; \underbrace{\int f(x)\log(g(x|\theta))\,dx}_{\text{relative K-L information}}
\]
- I(f, g) ≠ I(g, f), which implies that K-L information is not a true distance (it is not symmetric).
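The asymmetry is easy to see numerically. A sketch assuming f = N(0, 1) and g = N(1, 2²), using a simple Riemann sum for the integrals; the closed-form normal-to-normal K-L value is included as a check:

```python
import numpy as np

# Numeric sketch of K-L information and its asymmetry.
# Assumptions: f = N(0, 1) is the truth, g = N(1, 2^2) the approximation.
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]
f = normal_pdf(x, 0.0, 1.0)
g = normal_pdf(x, 1.0, 2.0)

def kl(p, q):
    # I(p, q) = integral of p log(p/q); skip points where p underflows
    mask = p > 1e-300
    return np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx

# Closed form for two normals: log(s2/s1) + (s1^2 + (m1 - m2)^2)/(2 s2^2) - 1/2
analytic = np.log(2.0) + (1.0 + 1.0) / (2 * 4.0) - 0.5
print(kl(f, g), analytic)  # numeric value matches the closed form
print(kl(g, f))            # differs: I(f, g) != I(g, f)
```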
AIC
- θ̂(y): an estimator of θ₀; it is a random variable because it depends on the data y.
- Consequently, I(f, g(·|θ̂(y))) is also a random variable.
- Remark: since the K-L information evaluated at θ̂(y) is random, model selection targets its expected value over y.
Selection Target

\[
E_y[I(f, g(\cdot|\hat\theta(y)))]
= \int f(x)\log(f(x))\,dx
- \underbrace{\int f(y)\left[\int f(x)\log(g(x|\hat\theta(y)))\,dx\right]dy}_{E_y E_x[\log(g(x|\hat\theta(y)))]}
\]

Since the first term is a constant that does not depend on g, minimizing the expected K-L information is equivalent to maximizing E_y E_x[log(g(x|θ̂(y)))] over g ∈ G.
Key Result

For large n, an approximately unbiased estimator of E_y E_x[log(g(x|θ̂(y)))] is

\[
\log(L(\hat\theta|y)) - k,
\]

where
- L: the likelihood function.
- θ̂: the maximum likelihood estimate of θ.
- k: the number of estimated parameters (including the variance).

Remark
- A good model is one that is close to f in the sense of having a small K-L value.
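The key result can be probed by simulation. A sketch under the assumption that the truth is N(0, 1) and the approximating model is a normal with unknown mean and variance (k = 2), a case where E_x[log g(x|θ̂)] is available in closed form:

```python
import numpy as np

# Simulation sketch: for each replicate data set y, compute both
# log L(theta_hat | y) and the target n * E_x[log g(x | theta_hat)].
# The key result predicts their average gap is approximately k = 2.
rng = np.random.default_rng(1)
n, reps = 100, 4000
gaps = np.empty(reps)
for r in range(reps):
    y = rng.normal(0.0, 1.0, n)
    mu, var = y.mean(), y.var()  # MLEs (note: ddof=0 is the MLE of variance)
    loglik = -0.5 * n * (np.log(2 * np.pi * var) + 1.0)
    # E_x[log g(x | theta_hat)] for x ~ N(0, 1), in closed form:
    target = -0.5 * (np.log(2 * np.pi * var) + (1.0 + mu**2) / var)
    gaps[r] = loglik - n * target

print(gaps.mean())  # approximately k = 2
```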
Multiplying the estimator by −2 gives

\[
\mathrm{AIC} = -2\log(L(\hat\theta|y)) + 2k,
\]

which balances bias (lack of fit, through −2 log L) against variance (model complexity, through 2k).

- Calculate the AIC value for each model with the same data set; the best model is the one with the minimum AIC value.
- The value of AIC depends on the data y, which leads to model selection uncertainty.

Least-Squares Case

\[
\mathrm{AIC} = n\log\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k
\]

- RSS is the residual sum of squares of the fitted model.
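A minimal helper for the least-squares formula; the two fits and their RSS values below are hypothetical numbers for illustration, not from the slides:

```python
import math

# Least-squares case: AIC = n * log(RSS / n) + 2k, where k counts all
# estimated parameters, including the error variance.
def aic_ls(rss: float, n: int, k: int) -> float:
    return n * math.log(rss / n) + 2 * k

# Hypothetical fits of two nested regressions to the same n = 50 points:
# a line (2 coefficients + variance -> k = 3) with RSS = 12.0, and a
# cubic (4 coefficients + variance -> k = 5) with RSS = 11.4.
aic_line = aic_ls(12.0, 50, 3)
aic_cubic = aic_ls(11.4, 50, 5)
print(aic_line, aic_cubic)  # smaller AIC wins; here the line is preferred
```

The cubic fits slightly better (smaller RSS) but not enough to pay its 2k penalty.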
TIC
The target is still maximizing E_y E_x[log(g(x|θ̂(y)))] over g ∈ G, now without assuming that the approximating model is close to the truth.

Key Result

\[
E_y E_x[\log(g(x|\hat\theta(y)))]
\approx \log(L(\hat\theta|y)) - \mathrm{tr}\!\left(J(\theta_0)\,[I(\theta_0)]^{-1}\right)
\]

Remark
- If the truth is contained in the approximating model, then J(θ₀) = I(θ₀), the trace equals k, and this reduces to the AIC result.
TIC

\[
\mathrm{TIC} = -2\log(L(\hat\theta|y)) + 2\,\mathrm{tr}\!\left(J(\hat\theta)\,[I(\hat\theta)]^{-1}\right),
\]

where I(θ̂) and J(θ̂) are both k × k matrices:

- I(θ̂), an estimate of I(θ₀):
\[
I(\hat\theta) = -\sum_{i=1}^{n}\left.\frac{\partial^2 \log(g(x_i|\theta))}{\partial\theta\,\partial\theta^T}\right|_{\theta=\hat\theta}
\]
- J(θ̂), an estimate of J(θ₀):
\[
J(\hat\theta) = \sum_{i=1}^{n}\left.\left[\frac{\partial \log(g(x_i|\theta))}{\partial\theta}\right]\left[\frac{\partial \log(g(x_i|\theta))}{\partial\theta}\right]^T\right|_{\theta=\hat\theta}
\]

Remark
- Attractive in theory.
- Rarely used in practice, because a very large sample size is needed to obtain good estimates of both I(θ₀) and J(θ₀).
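The trace term can be computed directly for a simple case. A sketch for a univariate normal model with analytic score and Hessian; the data and sample size are illustrative assumptions:

```python
import numpy as np

# Numeric sketch of TIC's trace term for a normal model with parameters
# (mu, v) fitted by maximum likelihood. When the data really are normal,
# J and I estimate the same matrix and tr(J I^{-1}) should be close to
# k = 2, recovering AIC's penalty.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 5000)
mu, v = x.mean(), x.var()  # MLEs
r = x - mu
n = x.size

# J(theta_hat): sum of outer products of per-observation scores
s_mu = r / v
s_v = -1.0 / (2 * v) + r**2 / (2 * v**2)
S = np.stack([s_mu, s_v])  # shape (2, n)
J = S @ S.T

# I(theta_hat): minus the summed Hessian of the log-density
I = -np.array([
    [-n / v,           -r.sum() / v**2],
    [-r.sum() / v**2,   n / (2 * v**2) - (r**2).sum() / v**3],
])

tic_penalty = np.trace(J @ np.linalg.inv(I))
print(tic_penalty)  # close to k = 2 for correctly specified data
```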
AICc
Assumption: i.i.d. normal error distribution, with the truth contained in the model set.

\[
\mathrm{AICc} = \mathrm{AIC} + \underbrace{\frac{2k(k+1)}{n-k-1}}_{\text{bias-correction}}
\]

Remark
- The bias-correction term varies by type of model (e.g., normal, exponential, Poisson).
- In practice, AICc is generally suitable unless the underlying probability distribution is extremely nonnormal, especially in terms of being strongly skewed.
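As a small helper (the AIC values are hypothetical; the n/k < 40 rule of thumb is Burnham and Anderson's, not stated on this slide):

```python
import math

# Small-sample correction sketch: AICc adds 2k(k+1)/(n - k - 1) to AIC.
# A common rule of thumb is to use AICc whenever n/k is small (below ~40).
def aicc(aic: float, k: int, n: int) -> float:
    return aic + 2 * k * (k + 1) / (n - k - 1)

# The correction matters for small n and fades as n grows:
print(aicc(100.0, 5, 20))    # 100 + 60/14, about 104.29
print(aicc(100.0, 5, 2000))  # about 100.03
```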
Multivariate Case
AIC Difference

\[
\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}
\]

- Useful for making inferences about the relative strength of evidence for each model in the set:
\[
L(g_i|y) \propto \exp\!\left(-\tfrac{1}{2}\Delta_i\right),
\]
where ∝ means "is proportional to".

Akaike Weights

- Weight of evidence in favor of model i being the best approximating model in the set:
\[
w_i = \frac{\exp(-\tfrac{1}{2}\Delta_i)}{\sum_{r=1}^{R}\exp(-\tfrac{1}{2}\Delta_r)}
\]
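The differences and weights are straightforward to compute from a list of AIC values (the values below are hypothetical):

```python
import math

# Sketch: AIC differences and Akaike weights for a set of fitted models,
# given only their AIC values.
def akaike_weights(aics):
    best = min(aics)
    deltas = [a - best for a in aics]           # Delta_i = AIC_i - AIC_min
    raw = [math.exp(-0.5 * d) for d in deltas]  # relative likelihoods
    total = sum(raw)
    return deltas, [r / total for r in raw]

deltas, weights = akaike_weights([310.2, 312.4, 318.9])
print(deltas)   # differences relative to the best model
print(weights)  # best model gets the largest weight; weights sum to 1
```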
Confidence Set
Multimodel Inference
Unconditional Variance Estimator

\[
\widehat{\mathrm{var}}(\hat{\bar\theta})
= \left[\sum_{i=1}^{R} w_i \sqrt{\widehat{\mathrm{var}}(\hat\theta_i|g_i) + (\hat\theta_i - \hat{\bar\theta})^2}\right]^2
\]

- The model-averaged estimate:
\[
\hat{\bar\theta} = \sum_{i=1}^{R} w_i\,\hat\theta_i
\]

Remark
- "Unconditional" means not conditional on any particular model, but still conditional on the full set of models considered.
- If θ is a parameter common to only a subset of the R models, then the wᵢ must be recalculated based on just these models (the new weights must satisfy ∑ wᵢ = 1).
- Use the unconditional variance unless the selected model is strongly supported (for example, w_min > 0.9).
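The estimator can be sketched in a few lines; the per-model estimates, conditional variances, and weights below are hypothetical inputs chosen so the arithmetic is easy to verify by hand:

```python
import math

# Sketch of multimodel inference: model-averaged estimate and the
# unconditional variance estimator, for hypothetical per-model estimates
# theta_i, conditional variances var(theta_i | g_i), and Akaike weights w_i.
def model_average(thetas, cond_vars, weights):
    theta_bar = sum(w * t for w, t in zip(weights, thetas))
    se_sum = sum(
        w * math.sqrt(v + (t - theta_bar) ** 2)
        for w, t, v in zip(weights, thetas, cond_vars)
    )
    return theta_bar, se_sum ** 2  # (estimate, unconditional variance)

theta_bar, uncond_var = model_average(
    thetas=[1.0, 3.0], cond_vars=[0.0, 0.0], weights=[0.5, 0.5]
)
print(theta_bar, uncond_var)  # 2.0 1.0
```

With zero conditional variances, all of the unconditional variance comes from the between-model spread (θ̂ᵢ − θ̄̂)², which is the point of the estimator.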
March 15, 2007
Summary
- The same response variable should be used for all candidate models.
  For example, if there were interest in the normal and log-normal model forms, the models would have to be expressed, respectively, as
\[
g_1(x|\theta,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),\qquad
g_2(x|\theta,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\!\left(-\frac{(\log(x)-\theta)^2}{2\sigma^2}\right),
\]
instead of
\[
g_1(x|\theta,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),\qquad
g_2(\log(x)|\theta,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\log(x)-\theta)^2}{2\sigma^2}\right).
\]
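The point about response scales can be made concrete: the log-normal log-likelihood on the x scale equals the normal log-likelihood of log(x) minus a Jacobian term ∑ log(x), so the two likelihoods (and hence their AICs) are not interchangeable. A sketch with hypothetical data and parameters:

```python
import math

# The log-normal density for x is the normal density evaluated at log(x)
# times the Jacobian 1/x, i.e. its log-likelihood picks up a -log(x) term.
def norm_logpdf(z, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)

x = [0.8, 1.5, 2.3, 4.1]  # hypothetical positive responses
mu, sigma = 0.4, 0.7      # hypothetical fitted parameters

# Correct for comparison: log-normal log-likelihood on the x scale
ll_lognormal_x = sum(norm_logpdf(math.log(v), mu, sigma) - math.log(v) for v in x)
# Not comparable to a normal model of x: normal log-likelihood of log(x)
ll_normal_logx = sum(norm_logpdf(math.log(v), mu, sigma) for v in x)

print(ll_lognormal_x - ll_normal_logx)  # = -sum(log x): the two are NOT equal
```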