
Big Data Statistics, meeting 2: Other regression models or generalized linear models

8 February 2024
Binomial regression (cont’d)
■ If we want to keep the approach of relating $p$ to $\sum_{i=1}^{d} \beta_i x_i$ while respecting $p \in (0, 1)$, we have basically two equivalent possibilities:
  a) We use a function $h$ that maps $\sum_{i=1}^{d} \beta_i x_i$ to the interval $(0, 1)$;
  b) Or we apply a function $g$ to $p$ and model $g(p)$ by $\sum_{i=1}^{d} \beta_i x_i$.
■ As said, they are equivalent. Which one do you find more convenient/natural?
■ I find possibility a) more natural, but most of the literature presents the model with possibility b).
■ Note that $h\bigl(\sum_{i=1}^{d} \beta_i x_i\bigr) = p$ and $g(p) = \sum_{i=1}^{d} \beta_i x_i$ imply $h = g^{-1}$.
■ Functions with the property in a) are called response functions and functions with the property in b) are called link functions. The names given to these functions in a particular application are based on possibility b).
■ Let us look at some examples for response and link functions (the names always stem from the link function); one concrete pair, the logit link and its inverse, is sketched numerically below.
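A hedged illustration of my own (not taken from the slides): the logit link $g(p) = \log(p/(1-p))$ and its inverse, the logistic response function $h(x) = 1/(1 + e^{-x})$. The short Python check below (function names are mine) verifies numerically that $h = g^{-1}$:

import numpy as np

def g(p):
    # logit link: maps p in (0,1) to the real line
    return np.log(p / (1.0 - p))

def h(x):
    # logistic response function: maps the real line back to (0,1)
    return 1.0 / (1.0 + np.exp(-x))

p = np.linspace(0.01, 0.99, 99)
print(np.allclose(h(g(p)), p))   # True: h is the inverse of g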
Binomial regress.: estimation (cont'd)

$$f(\beta_1, \ldots, \beta_d) = \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{1j}\Bigr)^{y_1} \Bigl(1 - \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{1j}\Bigr)\Bigr)^{1-y_1} \times \ldots \times \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{nj}\Bigr)^{y_n} \Bigl(1 - \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{nj}\Bigr)\Bigr)^{1-y_n}$$
$$= \prod_{i=1}^{n} \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr)^{y_i} \Bigl(1 - \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr)\Bigr)^{1-y_i},$$

where $y_i \in \{0, 1\}$, $i = 1, \ldots, n$.


■ This might look scary at first glance, but it is just the 'conventional' likelihood for independent $Z_i \sim \mathrm{Bernoulli}(p_i)$, $1 \le i \le n$, which equals $\prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1-y_i}$, where we replaced $p_i$ by $\Phi\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)$.
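To make the model concrete, here is a minimal simulation sketch of my own (regressors and coefficients are made up for illustration): it draws $Z_i \sim \mathrm{Bernoulli}\bigl(\Phi(\sum_{j} \beta_j x_{ij})\bigr)$ exactly as described above.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))          # regressors x_{ij} (illustrative)
beta = np.array([0.5, -1.0, 0.25])   # illustrative coefficients beta_j
p = norm.cdf(X @ beta)               # p_i = Phi(sum_j beta_j x_{ij})
y = rng.binomial(1, p)               # Z_i ~ Bernoulli(p_i)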

Binomial regress.: estimation (cont’d)
In compact form the log likelihood equals

$$\log\bigl(f(\beta_1, \ldots, \beta_d)\bigr) = \sum_{i=1}^{n} y_i \log \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr) + \sum_{i=1}^{n} (1 - y_i) \log\Bigl(1 - \Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr)\Bigr).$$

Differentiating w.r.t. $\beta_k$, $k = 1, \ldots, d$, we have (with $\phi = \Phi'$)

$$\frac{\partial \log\bigl(f(\beta_1, \ldots, \beta_d)\bigr)}{\partial \beta_k} = \sum_{i=1}^{n} \frac{y_i}{\Phi\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)}\, \phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr) x_{ik} + \sum_{i=1}^{n} \frac{1 - y_i}{1 - \Phi\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)} \Bigl(-\phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr) x_{ik}\Bigr).$$
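A hedged Python sketch of these two formulas (function and variable names are my own): it takes any design matrix X with rows $x_i$ and a 0/1 response vector y, for instance the simulated data from the sketch above.

import numpy as np
from scipy.stats import norm

def probit_loglik(beta, X, y):
    # log f(beta) = sum_i [ y_i log Phi(x_i'beta) + (1 - y_i) log(1 - Phi(x_i'beta)) ]
    eta = X @ beta
    # note Phi(-x) = 1 - Phi(x), so logcdf(-eta) = log(1 - Phi(eta))
    return np.sum(y * norm.logcdf(eta) + (1 - y) * norm.logcdf(-eta))

def probit_score(beta, X, y):
    # the derivative above, stacked over k = 1, ..., d
    eta = X @ beta
    Phi, phi = norm.cdf(eta), norm.pdf(eta)
    w = y / Phi * phi - (1 - y) / (1 - Phi) * phi
    return X.T @ w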

Generalized linear models (cont’d)
The three elements of a generalized linear model are:
1. The distribution of the response variable $Y$ has a probability density (pdf) or probability mass function (pmf) $f_\theta^Y$, $\theta \in \Theta$, of the type
$$f_\theta^Y(y) = \exp\Bigl(\frac{y\theta - b(\theta)}{\psi} - c(\psi, y)\Bigr), \quad y \in D,$$
where $\theta \in \Theta$ and $\psi$ are real-valued parameters, $D$ is the support of the distribution of $Y$, and $b$ and $c$ are real-valued functions.
2. A linear function of the explanatory variable $x$, i.e. $\sum_{j=1}^{d} \beta_j x_j$;
3. A function $h$ that links $\sum_{j=1}^{d} \beta_j x_j$ (the linear predictor) and the expected value $E[Y]$ of the response.
Two often-used properties that follow from the above form of the pmf or pdf are
(i) $E[Y] = b'(\theta)$ ($b'$ denotes the first derivative of $b$); and
(ii) $\mathrm{Var}[Y] = \psi\, b''(\theta)$ ($b''$ denotes the second derivative of $b$).
For the rest we assume that $\psi$, which is called the dispersion parameter, is known. A short sketch of where property (i) comes from is added below.
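As a hedged aside not on the original slides (and glossing over the regularity needed to differentiate under the integral sign), property (i) can be obtained by differentiating the normalization identity with respect to $\theta$:
$$1 = \int_D \exp\Bigl(\frac{y\theta - b(\theta)}{\psi} - c(\psi, y)\Bigr)\,dy \quad\Longrightarrow\quad 0 = \int_D \frac{y - b'(\theta)}{\psi}\, f_\theta^Y(y)\,dy = \frac{E[Y] - b'(\theta)}{\psi},$$
so $E[Y] = b'(\theta)$; in the pmf case the integral is replaced by a sum. Property (ii) follows in the same spirit from a second differentiation.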
Generalized linear models (cont’d)
Example (Bernoulli as GLM): Let us check that the binomial regression model
belongs to the generalized linear models class.
1. The pmf of a Bernoulli random variable Y is

$$p^y (1 - p)^{1-y}, \quad y \in \{0, 1\}.$$

We can rewrite this as

$$\exp\bigl(\log(p) \cdot y + \log(1 - p) \cdot (1 - y)\bigr) = \exp\bigl(\log(p/(1 - p)) \cdot y + \log(1 - p)\bigr). \qquad (3)$$

This is exactly of the form on the previous slide with

$$\theta = \log(p/(1 - p)), \quad \psi = 1, \quad c(\psi, y) = 0, \quad \text{and} \quad b(\theta) = -\log(1/(1 + \exp(\theta))).$$

Note that $b$ is a function of $\theta$: to find it here we rewrote $p$ in terms of $\theta$ and plugged this into $\log(1 - p) = -b(\theta)$. (A quick numerical check of this rewrite follows after item 2 below.)
2. This element is always the same (we only need regressors which are used as input
for a linear function).
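A quick numerical sanity check of the rewrite in (3), a sketch of my own with an arbitrarily chosen $p$: it verifies that $\exp(y\theta - b(\theta))$ reproduces $p^y(1-p)^{1-y}$, and that $b'(\theta) = p = E[Y]$ and $b''(\theta) = p(1-p) = \mathrm{Var}[Y]$, in line with properties (i) and (ii) (here $\psi = 1$).

import numpy as np

p = 0.3                                   # arbitrary illustrative value
theta = np.log(p / (1 - p))               # canonical parameter theta

def b(t):
    # b(theta) = -log(1/(1 + exp(theta))) = log(1 + exp(theta))
    return np.log(1 + np.exp(t))

for y in (0, 1):                          # exponential-family form vs. p^y (1-p)^(1-y)
    print(np.isclose(np.exp(y * theta - b(theta)), p**y * (1 - p)**(1 - y)))

eps = 1e-4                                # numerical first and second derivatives of b
b1 = (b(theta + eps) - b(theta - eps)) / (2 * eps)
b2 = (b(theta + eps) - 2 * b(theta) + b(theta - eps)) / eps**2
print(np.isclose(b1, p), np.isclose(b2, p * (1 - p)))   # E[Y] = p, Var[Y] = p(1-p)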

Generalized linear models (cont’d)
Example (Bernoulli as GLM (cont’d)): Finally, let us look at the third element of
GLMs for the Bernoulli distribution.
3. The expectation of the random variable $Y$ if its probability mass function is given in the form (3) equals
$$E[Y] = \frac{\exp(\theta)}{1 + \exp(\theta)}.$$
Linking this expectation to the linear function $\sum_{j=1}^{d} \beta_j x_{ij}$ by using $h$, it reads as
$$h\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr) = \frac{\exp(\theta)}{1 + \exp(\theta)}.$$

Taking $h$ to be $\Phi$, i.e. working with the probit link, this becomes

$$\Phi\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr) = \frac{\exp(\theta)}{1 + \exp(\theta)}.$$
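A small added step of my own (not on the original slides, but consistent with the relation used in the distribution theory below): since $E[Y] = b'(\theta) = \exp(\theta)/(1 + \exp(\theta))$, solving the last display for $\theta$ gives
$$\theta = \log\frac{\Phi\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)}{1 - \Phi\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)} = (b')^{-1}\Bigl(h\Bigl(\sum_{j=1}^{d} \beta_j x_{ij}\Bigr)\Bigr),$$
which is exactly the form $\theta_i = (b')^{-1}\bigl(h(\sum_{j=1}^{d} \beta_j x_{ij})\bigr)$ that appears two slides below.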

Distribution theory GLMs
Above we derived the maximum likelihood estimators for a binomial and a Poisson regression. Here we look at their asymptotic properties by means of the theory for GLMs.
■ Let $Y_1, \ldots, Y_n$ be independent, each with covariate vector $X_i = (X_{i1}, \ldots, X_{id})$
and each with pmf or pdf given by
$$\exp\Bigl(\frac{y_i \theta_i - b(\theta_i)}{\psi} - c(\psi, y_i)\Bigr), \quad y_i \in D,$$
where $\theta_i$ is related to $\sum_{j=1}^{d} \beta_j x_{ij}$ by $\theta_i = (b')^{-1}\bigl(h\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)\bigr)$.
■ The likelihood then equals
$$\prod_{i=1}^{n} \exp\Biggl(\frac{y_i\,(b')^{-1}\bigl(h\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)\bigr) - b\Bigl((b')^{-1}\bigl(h\bigl(\sum_{j=1}^{d} \beta_j x_{ij}\bigr)\bigr)\Bigr)}{\psi} - c(\psi, y_i)\Biggr).$$
■ The MLE $\hat{\beta}^{MLE} = (\hat{\beta}_1^{MLE}, \ldots, \hat{\beta}_d^{MLE})$ is obtained by differentiating the log of this expression w.r.t. $\beta_1, \ldots, \beta_d$; a numerical sketch for the probit case follows below.
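In practice this maximization is carried out numerically. A minimal, hedged sketch for the probit case (my own code; names and data are illustrative, the log-likelihood is the one written out earlier, and scipy's minimize minimizes, so the negative log-likelihood is passed):

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))                       # illustrative regressors
beta_true = np.array([0.5, -1.0, 0.25])           # illustrative true coefficients
y = rng.binomial(1, norm.cdf(X @ beta_true))      # Y_i ~ Bernoulli(Phi(x_i' beta))

def neg_loglik(b):
    eta = X @ b
    # Phi(-x) = 1 - Phi(x), so logcdf(-eta) = log(1 - Phi(eta))
    return -np.sum(y * norm.logcdf(eta) + (1 - y) * norm.logcdf(-eta))

beta_mle = minimize(neg_loglik, x0=np.zeros(d), method="BFGS").x
print(beta_mle)                                   # should be close to beta_true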

Distribution theory GLMs
We have the following result.
■ Theorem. Under some regularity conditions¹ we have
  ◆ $\hat{\beta}^{MLE}$ is consistent;
  ◆ $S_n(\hat{\beta}^{MLE} - \beta)$ has approximately a $N(0, I_{d \times d})$ distribution. Here $S_n$ is the square root of $X_n^T \hat{W}_n X_n$, where $\hat{W}_n$ is an $n \times n$ diagonal matrix with
$$w_{ii} = \frac{\Bigl(h'\bigl(\hat{\beta}_1^{MLE} x_{i1} + \ldots + \hat{\beta}_d^{MLE} x_{id}\bigr)\Bigr)^2}{\hat{\sigma}_i^2},$$
with $\hat{\sigma}_i^2$ an estimator of the variance of $Y_i$; see Exercise sheet 2.

¹ See Fahrmeir and Kaufmann (1985).

Distribution theory GLMs
It is very instructive to compare the result on the previous slide with the results we
discussed in Lecture 1.
■ For the normal distribution, as said above, we have $h(x) = x$ so that $h'(x) = 1$. Then the matrix $\hat{W}_n$ becomes the identity matrix multiplied by one over an estimator of $\sigma^2$. This is exactly what we had in Lecture 1.
■ The only difference between the result on the previous slide and the results of Lecture 1 is the appearance of $\hat{W}_n$, which has its roots in the use of a link function. Here, as in Lecture 1, the distribution on the previous slide provides a way to test, for instance,
$$H_0: \beta_\ell = 0 \quad \text{against} \quad H_1: \beta_\ell \neq 0,$$
based on the t-statistic
$$T_\ell = \frac{\hat{\beta}_\ell - 0}{\sqrt{\bigl((X_n^T \hat{W}_n X_n)^{-1}\bigr)_{\ell\ell}}},$$
where $\bigl((X_n^T \hat{W}_n X_n)^{-1}\bigr)_{\ell\ell}$ is the $(\ell, \ell)$ element of the matrix $(X_n^T \hat{W}_n X_n)^{-1}$. A small numerical sketch of this test for the probit model follows below.
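A hedged illustration of my own (it repeats the simulation and probit fit from the previous sketch so the block runs on its own). For the probit model $h = \Phi$, so $h' = \phi$; as $\hat{\sigma}_i^2$ I plug in the Bernoulli variance $\hat{p}_i(1 - \hat{p}_i)$ with $\hat{p}_i = \Phi(x_i'\hat{\beta}^{MLE})$, which is one natural choice but not necessarily the estimator intended on Exercise sheet 2.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = rng.binomial(1, norm.cdf(X @ np.array([0.5, -1.0, 0.25])))

neg_loglik = lambda b: -np.sum(y * norm.logcdf(X @ b) + (1 - y) * norm.logcdf(-(X @ b)))
beta_mle = minimize(neg_loglik, np.zeros(d), method="BFGS").x

p_hat = norm.cdf(X @ beta_mle)                           # fitted probabilities Phi(x_i' beta_hat)
w = norm.pdf(X @ beta_mle) ** 2 / (p_hat * (1 - p_hat))  # w_ii = h'(.)^2 / sigma_i^2
cov_hat = np.linalg.inv(X.T @ (w[:, None] * X))          # (X_n' W_n_hat X_n)^{-1}
T = beta_mle / np.sqrt(np.diag(cov_hat))                 # T_l for H_0: beta_l = 0
print(np.round(T, 2))                                    # compare with N(0,1) quantiles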

