18.650 - Likelihood function

This note aims to give a more general definition of the likelihood function.
In class, we saw how to define the likelihood associated with a statistical
model in two cases: when the observations are continuous and when they are
discrete. In general, the data need not fall into either of these cases. For example,
in logistic regression, the data are i.i.d. pairs $(X_i, Y_i)$, $i = 1, \ldots, n$, where
the $X_i$'s may be continuous random variables, whereas the $Y_i$'s are Bernoulli
random variables. In addition, we have always assumed that the data are
i.i.d. However, sometimes they may not be identically distributed. This is
the case, for example, in linear regression with deterministic design:
$$Y_i = X_i^\top \beta + \varepsilon_i, \quad i = 1, \ldots, n,$$

where the $X_i$'s are fixed (deterministic) and known covariates and the noise
terms $\varepsilon_i$ are i.i.d. In this case, the data are $Y_1, \ldots, Y_n$, which are independent
but not identically distributed (for instance, they do not have the same expectation). More generally, the data may not even be independent! This is
the case in heteroscedastic linear regression: the noise terms $\varepsilon_i$ are no longer
assumed to be i.i.d., but instead, the vector $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$ is assumed to be
centered Gaussian with a covariance matrix that is not necessarily diagonal.
In all those cases, it is still possible to define the likelihood function.
Let us set some notation. The sample is denoted by $Z_1, \ldots, Z_n$ (in the
first chapters, we usually used the letter $X$ instead of $Z$, but it is more
convenient to use $Z$ in this note). For example, the $Z_i$'s could be Bernoulli
random variables, or exponential random variables, or each $Z_i$ could be a
pair $(X_i, Y_i)$ (e.g., in the regression setup). The space of the data is denoted
by $E$: $Z_i \in E$, $i = 1, \ldots, n$ (e.g., $E = \mathbb{R}^p \times \mathbb{R}$ for linear regression, $E = \mathbb{R}$
for a sample of Gaussian random variables, etc.), and $\Theta$ is the parameter
space. Recall that the parameter is what we aim to estimate (in linear
regression, the parameter space is $\mathbb{R}^p$ and the parameter is $\beta$).
Then, the likelihood is defined as the following function:
$$L_n : E^n \times \Theta \to \mathbb{R}, \qquad (z_1, \ldots, z_n, \theta) \mapsto p_\theta(z_1, \ldots, z_n),$$

where $p_\theta$ is the joint probability distribution function (pdf) of $Z_1, \ldots, Z_n$,
given that the parameter is $\theta$. If $Z_1, \ldots, Z_n$ are assumed to be independent,
then $p_\theta(z_1, \ldots, z_n)$ is nothing but the product of the pdfs of the individual $Z_i$, $i = 1, \ldots, n$.
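The product form under independence can be sketched in a few lines of Python. This is only an illustration and not part of the note: the function names (`likelihood`, `bernoulli_pdf`, `make_gaussian_regression_pdf`) are hypothetical, and the regression example assumes a single real covariate and noise variance 1. Each observation is given its own pdf, so independent but non-identically distributed data are covered as well.

```python
import math

def likelihood(zs, theta, pdf):
    """Likelihood of independent observations z_1, ..., z_n at parameter theta.

    pdf(z, theta, i) is the pdf of the i-th observation, so the observations
    may be non-identically distributed (the index i is passed explicitly).
    """
    value = 1.0
    for i, z in enumerate(zs):
        value *= pdf(z, theta, i)
    return value

def bernoulli_pdf(z, theta, i):
    # Discrete case: P[Z_i = z] for Z_i ~ Ber(theta); i is unused (i.i.d. sample).
    return theta if z == 1 else 1.0 - theta

def make_gaussian_regression_pdf(xs):
    # Assumed illustrative model: Y_i = x_i * beta + eps_i, eps_i i.i.d. N(0, 1),
    # with deterministic real covariates x_i, so Y_i has density N(x_i * beta, 1).
    def pdf(y, beta, i):
        return math.exp(-0.5 * (y - xs[i] * beta) ** 2) / math.sqrt(2.0 * math.pi)
    return pdf

# i.i.d. Bernoulli sample, likelihood evaluated at theta = 0.5
coin = [1, 0, 1, 1]
lik_bernoulli = likelihood(coin, 0.5, bernoulli_pdf)

# Independent, non-identically distributed Gaussian responses, evaluated at beta = 1
xs, ys = [1.0, 2.0], [1.0, 2.0]
lik_regression = likelihood(ys, 1.0, make_gaussian_regression_pdf(xs))
```

For large samples the product underflows quickly, so in practice one usually works with the sum of the log-pdfs instead.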
The probability distribution function
If $Z$ is a discrete random variable on a discrete set $E$, then its pdf is
the function
$$p_Z : E \to \mathbb{R}, \qquad z \mapsto P[Z = z].$$

If $Z$ is a continuous random variable, then its pdf is its density.

If $Z = (X, Y)$, then its pdf (denoted by $p_{X,Y}$) is also called the joint
pdf of $X$ and $Y$. If $X$ and $Y$ are independent, then their joint pdf is
the product of the pdfs of $X$ and $Y$ (denoted by $p_X$ and $p_Y$):
$$p_Z(x, y) = p_X(x)\, p_Y(y).$$

If $X$ and $Y$ are not independent, one can use Bayes' rule, which states
that
$$p_Z(x, y) = p_X(x)\, p_{Y \mid X = x}(y),$$
where $p_{Y \mid X = x}$ is the conditional pdf of $Y$ given $X = x$.
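The factorization of a joint pdf into a marginal pdf and a conditional pdf also applies to a mixed pair, with $X$ continuous and $Y$ discrete. The sketch below illustrates this under assumptions that are not in the note: $X$ is standard Gaussian and, given $X = x$, $Y$ is Bernoulli with parameter $e^x / (1 + e^x)$. All function names are hypothetical.

```python
import math

def normal_pdf(x):
    # Density of X ~ N(0, 1): the continuous component (illustrative choice).
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def logistic(u):
    # e^u / (1 + e^u), written in an equivalent form.
    return 1.0 / (1.0 + math.exp(-u))

def conditional_pdf_y_given_x(y, x):
    # Conditional pdf of Y given X = x: Bernoulli with parameter logistic(x).
    p = logistic(x)
    return p if y == 1 else 1.0 - p

def joint_pdf(x, y):
    # Joint pdf at (x, y): pdf of X at x times conditional pdf of Y given X = x at y.
    return normal_pdf(x) * conditional_pdf_y_given_x(y, x)

# Summing the joint pdf over the two values of y recovers the pdf of X.
x0 = 0.3
marginal_check = joint_pdf(x0, 0) + joint_pdf(x0, 1)
```

Here `marginal_check` agrees with `normal_pdf(x0)` up to rounding, which is a quick sanity check that the product really defines a joint pdf.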

Examples

Write the likelihood function in each of the following cases.

Linear regression $Y_i = X_i^\top \beta + \varepsilon_i$, $i = 1, \ldots, n$, with deterministic covariates
and i.i.d. Gaussian noise terms with unknown variance $\sigma^2$, when the
parameter of interest is $(\beta, \sigma^2)$. Additional question: Compute the
MLE $(\hat\beta, \hat\sigma^2)$.

Heteroscedastic linear regression with deterministic design:
$$Y_i = X_i^\top \beta + \varepsilon_i, \quad i = 1, \ldots, n,$$
where $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n) \sim \mathcal{N}_n(0, \Sigma)$, when the parameter is just $\beta$.

Heteroscedastic linear regression with random design:
$$Y_i = X_i^\top \beta + \varepsilon_i, \quad i = 1, \ldots, n,$$
where the $X_i$'s are i.i.d. continuous random vectors with given density
$f$ and, for all $x \in \mathbb{R}^p$, the conditional distribution of $\varepsilon$ given $X = x$ is
$\mathcal{N}_n(0, \Sigma(x))$, for some known matrix-valued function $\Sigma(x)$, when the
parameter is just $\beta$. Additional question: Compute the MLE $\hat\beta$.

Logistic regression with deterministic design: $Y_1, \ldots, Y_n$ are independent and
$$Y_i \sim \mathrm{Ber}\big(\psi(X_i^\top \beta)\big), \quad i = 1, \ldots, n,$$
where $\psi(u) = \dfrac{e^u}{1 + e^u}$, $u \in \mathbb{R}$.

Logistic regression with random design: $(X_1, Y_1), \ldots, (X_n, Y_n)$ are
i.i.d. random pairs, where $X_1$ is a continuous random variable in $\mathbb{R}^p$
with some density $g$ and, for $x \in \mathbb{R}^p$, the conditional distribution of $Y_1$
given $X_1 = x$ is Bernoulli with parameter $\psi(x^\top \beta)$, where $\psi$ is the logistic function defined above.
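As a hedged illustration of the logistic regression case with deterministic design, the log-likelihood can be evaluated numerically as sketched below. It relies on the identity $\log P[Y_i = y_i] = y_i u_i - \log(1 + e^{u_i})$ with $u_i = X_i^\top \beta$, which follows from the Bernoulli parameter $e^{u_i} / (1 + e^{u_i})$. The function names and the small data set are hypothetical, and the code is a sketch rather than a reference solution to the exercise.

```python
import math

def softplus(u):
    # Numerically stable evaluation of log(1 + e^u).
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def logistic_log_likelihood(beta, xs, ys):
    """Log-likelihood of beta for independent Y_i ~ Ber(e^{u_i} / (1 + e^{u_i})).

    xs: list of covariate vectors (deterministic design), ys: list of 0/1 labels.
    Uses log P[Y_i = y_i] = y_i * u_i - log(1 + e^{u_i}), with u_i = x_i . beta.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        u = sum(xj * bj for xj, bj in zip(x, beta))
        total += y * u - softplus(u)
    return total

# Made-up design with an intercept column and one covariate.
xs = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
ys = [1, 0, 1]

# At beta = 0 every Y_i is Ber(1/2), so the log-likelihood is -n * log(2).
ll_at_zero = logistic_log_likelihood([0.0, 0.0], xs, ys)
```

In the random design case, if the density $g$ of the covariates does not depend on $\beta$, the log-likelihood only gains the additive term $\sum_i \log g(X_i)$, which is constant in $\beta$, so the same function can be maximized.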