8 February 2024
Binomial regression (cont’d)
■ If we want to keep the approach of relating p to
\[
\sum_{i=1}^{d} \beta_i x_i \in (0, 1),
\]
Binomial regression: estimation (cont'd)
In compact form the log-likelihood equals
\[
\log\big(f(\beta_1, \ldots, \beta_d)\big)
= \sum_{i=1}^{n} y_i \log \Phi\Big(\sum_{j=1}^{d} \beta_j x_{ij}\Big)
+ \sum_{i=1}^{n} (1 - y_i) \log\Big(1 - \Phi\Big(\sum_{j=1}^{d} \beta_j x_{ij}\Big)\Big).
\]
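As a minimal sketch, this probit log-likelihood can be maximized numerically. The simulated data, sample size, and choice of optimizer below are illustrative assumptions, not part of the lecture:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative simulated data (an assumption of this sketch): n observations, d covariates.
n, d = 500, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -0.5])
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

def neg_log_lik(beta):
    """Negative of the probit log-likelihood displayed above."""
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_lik, x0=np.zeros(d), method="BFGS")
beta_hat = res.x  # the MLE of (beta_1, ..., beta_d)
```

Minimizing the negative log-likelihood is equivalent to maximizing the log-likelihood; the clipping of `p` only protects the logarithm near the boundary.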
Generalized linear models (cont’d)
The three elements of a generalized linear model are:
1. The distribution of the response variable Y has a probability density function (pdf) or
probability mass function (pmf) f_θ^Y, θ ∈ Θ, of the type
\[
f_\theta^Y(y) = \exp\Big( \frac{y\theta - b(\theta)}{\psi} - c(\psi, y) \Big), \qquad y \in D,
\]
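For concreteness, here is the standard computation (not spelled out on this slide) that puts the Bernoulli(p) pmf into exactly this form, with $\psi = 1$ and $c(\psi, y) = 0$:

```latex
f_\theta^Y(y) = p^y (1-p)^{1-y}
  = \exp\!\big( y \log\tfrac{p}{1-p} + \log(1-p) \big)
  = \exp\!\big( y\theta - \log(1 + e^{\theta}) \big),
  \qquad y \in D = \{0, 1\},
```

so the natural parameter is the log-odds $\theta = \log\frac{p}{1-p}$ and $b(\theta) = \log(1 + e^{\theta})$. This is why the expectation $E[Y]$ in the Bernoulli example below comes out as $\exp(\theta)/(1+\exp(\theta))$.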
Generalized linear models (cont’d)
Example (Bernoulli as GLM (cont’d)): Finally, let us look at the third element of
GLMs for the Bernoulli distribution.
3. The expectation of the random variable Y, if its probability mass function is given
in the form (3), equals
\[
E[Y] = \frac{\exp(\theta)}{1 + \exp(\theta)}.
\]
Linking this expectation to the linear function $\sum_{j=1}^{d} \beta_j x_{ij}$ by using h, this reads as
\[
h\Big(\sum_{j=1}^{d} \beta_j x_{ij}\Big) = \frac{\exp(\theta)}{1 + \exp(\theta)}.
\]
Distribution theory for GLMs
Above we derived the maximum likelihood estimators for binomial and Poisson
regression. Here we look at their asymptotic properties by means of the theory for
GLMs.
■ Let $Y_1, \ldots, Y_n$ be independent, each with covariate vector $X_i = (X_{i1}, \ldots, X_{id})$
and each with pmf or pdf given by
\[
\exp\Big( \frac{y_i \theta_i - b(\theta_i)}{\psi} - c(\psi, y_i) \Big), \qquad y_i \in D,
\]
where $\theta_i$ is related to $\sum_{j=1}^{d} \beta_j x_{ij}$ by $\theta_i = (b')^{-1}\big(h\big(\sum_{j=1}^{d} \beta_j x_{ij}\big)\big)$.
■ The likelihood then equals
\[
\prod_{i=1}^{n} \exp\Bigg(
\frac{y_i \, (b')^{-1}\big(h\big(\sum_{j=1}^{d} \beta_j x_{ij}\big)\big)
- b\Big((b')^{-1}\big(h\big(\sum_{j=1}^{d} \beta_j x_{ij}\big)\big)\Big)}{\psi}
- c(\psi, y_i) \Bigg).
\]
■ The MLE $\hat\beta^{MLE} = (\hat\beta_1^{MLE}, \ldots, \hat\beta_d^{MLE})$ is obtained by differentiating the log of
this expression w.r.t. $\beta_1, \ldots, \beta_d$.
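The steps above can be sketched for Poisson regression with the canonical link, where $\psi = 1$, $b(\theta) = e^{\theta}$, and $\theta_i$ equals the linear predictor itself. The simulated data below are an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Illustrative simulated data (an assumption of this sketch).
n, d = 400, 2
X = rng.normal(size=(n, d))
beta_true = np.array([0.8, -0.3])
y = rng.poisson(np.exp(X @ beta_true))

def neg_log_lik(beta):
    """Negative Poisson log-likelihood: psi = 1, b(theta) = exp(theta), and the
    canonical link makes theta_i the linear predictor sum_j beta_j * x_ij."""
    theta = X @ beta
    # log f(y_i) = y_i*theta_i - exp(theta_i) - log(y_i!); the last term does
    # not depend on beta and can be dropped for optimization.
    return -np.sum(y * theta - np.exp(theta))

res = minimize(neg_log_lik, x0=np.zeros(d), method="BFGS")
beta_hat = res.x  # the MLE obtained by (numerically) setting the score to zero
```

Differentiating the log-likelihood and setting it to zero is done here implicitly by the gradient-based optimizer.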
Distribution theory for GLMs
We have the following result.
■ Theorem. Under some regularity conditions¹ we have:
◆ $\hat\beta^{MLE}$ is consistent;
◆ $S_n(\hat\beta^{MLE} - \beta)$ has approximately a $N(0, I_{d\times d})$ distribution. Here $S_n$ is the
square root of
\[
X_n^T \hat W_n X_n,
\]
where $\hat W_n$ is an $n \times n$ diagonal matrix with
\[
w_{ii} = \frac{\Big( h'\big(\hat\beta_1^{MLE} x_{i1} + \ldots + \hat\beta_d^{MLE} x_{id}\big) \Big)^2}{\hat\sigma_i^2}.
\]
¹ See Fahrmeir and Kaufmann (1985).
Distribution theory for GLMs
It is very instructive to compare the result on the previous slide with the results we
discussed in Lecture 1.
■ For the normal distribution, as noted above, we have $h(x) = x$, so that $h'(x) = 1$.
The matrix $\hat W_n$ then becomes the identity matrix multiplied by $1/\hat\sigma^2$, where $\hat\sigma^2$ is
an estimator of $\sigma^2$. This is exactly what we had in Lecture 1.
■ The only difference between the result on the previous slide and the results of
Lecture 1 is the appearance of $\hat W_n$, which has its roots in the use of a link function.
Here, as in Lecture 1, the distribution on the previous slide provides a way to test, for
instance,
\[
H_0 : \beta_\ell = 0 \quad \text{against} \quad H_1 : \beta_\ell \neq 0,
\]
based on the t-statistic
\[
T_\ell = \frac{\hat\beta_\ell - 0}{\sqrt{\big((X_n^T \hat W_n X_n)^{-1}\big)_{\ell\ell}}},
\]
where $\big((X_n^T \hat W_n X_n)^{-1}\big)_{\ell\ell}$ is the $(\ell, \ell)$ element of the matrix $(X_n^T \hat W_n X_n)^{-1}$.
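As a sketch of this Wald-type test, consider logistic regression with the canonical link, where $h(\eta) = e^{\eta}/(1+e^{\eta})$, $h'(\eta) = \mu(1-\mu)$, and $\hat\sigma_i^2 = \mu_i(1-\mu_i)$, so that $w_{ii}$ reduces to $\mu_i(1-\mu_i)$. The simulated data are an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Illustrative simulated logistic-regression data (an assumption of this sketch).
n, d = 600, 2
X = rng.normal(size=(n, d))
beta_true = np.array([1.5, 0.0])  # second coefficient is truly zero
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)

def neg_log_lik(beta):
    eta = X @ beta
    # Bernoulli log-likelihood with theta_i = eta_i (canonical link):
    # sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ], computed stably via logaddexp.
    return -np.sum(y * eta - np.logaddexp(0.0, eta))

beta_hat = minimize(neg_log_lik, np.zeros(d), method="BFGS").x

# Wald t-statistics from the slide: W_n is diagonal with
# w_ii = (h'(eta_i))^2 / sigma_i^2, which here reduces to mu_i * (1 - mu_i).
mu = 1 / (1 + np.exp(-(X @ beta_hat)))
W = np.diag(mu * (1 - mu))
cov = np.linalg.inv(X.T @ W @ X)
se = np.sqrt(np.diag(cov))
T = (beta_hat - 0) / se  # compare each T_l to N(0, 1) quantiles
```

Under $H_0 : \beta_\ell = 0$, each $T_\ell$ is approximately standard normal, so a large $|T_\ell|$ (here for the first coefficient) leads to rejection.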