
Statistics 512 Notes 20:

Multinomial Distribution:
Consider a random trial which can result in one, and only one, of $k$ outcomes or categories $C_1, \ldots, C_k$. Let $p_1, \ldots, p_{k-1}$ denote the probabilities of outcomes $C_1, \ldots, C_{k-1}$; the probability of $C_k$ is $p_k = 1 - p_1 - \cdots - p_{k-1}$. Let $X_i$ denote the outcome of the $i$th trial. Let $Z_{i1}, \ldots, Z_{ik}$ be indicator variables for whether the $i$th trial resulted in the $1, \ldots, k$th outcome respectively, e.g., $Z_{i1} = 1$ if $X_i = C_1$, $Z_{i1} = 0$ otherwise. Let $Y_j = \sum_{i=1}^n Z_{ij}$, $j = 1, \ldots, k$, denote the number of trials whose outcome is $C_j$.
We have
, 1
1 2
1, 1 , 1
11 1 1
1 1
1 1 2 1 1 1
1
1 1 1 1
1 1 1 1
( , , ) (1 )
(1 )
(1 )
i k
i i ik
k n k
n k nk
k k
n
Z
Z Z Z
n k k
i
Z Z
Z Z Z Z
k k
Y Y Y
k k
P X X p p p p p
p p p p
p p p p

+
+ + + +




L
L L
K L L
L L
L L
Note that
$$P(Y_1 = y_1, \ldots, Y_k = y_k) = \binom{n}{y_1 \cdots y_k} p_1^{y_1} \cdots p_{k-1}^{y_{k-1}} (1 - p_1 - \cdots - p_{k-1})^{y_k},$$
where
$$\binom{n}{y_1 \cdots y_k} = \frac{n!}{y_1! \cdots y_k!}.$$
The log likelihood is
$$l(p_1, \ldots, p_{k-1}) = Y_1 \log p_1 + \cdots + Y_{k-1} \log p_{k-1} + (n - Y_1 - \cdots - Y_{k-1}) \log(1 - p_1 - \cdots - p_{k-1}).$$
The partial derivatives are:
$$\frac{\partial l}{\partial p_1} = \frac{Y_1}{p_1} - \frac{n - Y_1 - \cdots - Y_{k-1}}{1 - p_1 - \cdots - p_{k-1}}, \quad \ldots, \quad \frac{\partial l}{\partial p_{k-1}} = \frac{Y_{k-1}}{p_{k-1}} - \frac{n - Y_1 - \cdots - Y_{k-1}}{1 - p_1 - \cdots - p_{k-1}}.$$
It is easily seen that
$$\hat{p}_{j,MLE} = \frac{Y_j}{n}$$
satisfies these equations.
See (6.4.19) and (6.4.20) in the book for the information matrix.
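As a quick numerical check (not part of the notes), the sketch below evaluates the multinomial log likelihood at $\hat{p}_j = Y_j/n$ and at another valid probability vector; the counts are made up for illustration:

```python
from math import log

# Hypothetical counts for a k = 3 multinomial experiment
Y = [5, 3, 2]
n = sum(Y)

def loglik(p):
    """Multinomial log likelihood (up to the constant multinomial coefficient)."""
    return sum(y * log(pj) for y, pj in zip(Y, p))

mle = [y / n for y in Y]      # \hat{p}_j = Y_j / n
other = [0.4, 0.4, 0.2]       # some other valid probability vector

assert loglik(mle) > loglik(other)
```

Repeating the comparison against any other probability vector gives the same ordering, as the score equations above guarantee.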
Goodness of fit tests for multinomial experiments:
For a multinomial experiment, we often want to consider a model with fewer parameters than $p_1, \ldots, p_{k-1}$, e.g.,
$$p_1 = f_1(\theta_1, \ldots, \theta_q), \; \ldots, \; p_{k-1} = f_{k-1}(\theta_1, \ldots, \theta_q) \quad (*)$$
where $1 \le q < k - 1$ and $(\theta_1, \ldots, \theta_q)$ are unknown parameters.
To test if the model is appropriate, we can do a goodness-of-fit test which tests
$$H_0: p_1 = f_1(\theta_1, \ldots, \theta_q), \ldots, p_{k-1} = f_{k-1}(\theta_1, \ldots, \theta_q) \text{ for some } (\theta_1, \ldots, \theta_q)$$
vs.
$$H_a: H_0 \text{ is not true for any } (\theta_1, \ldots, \theta_q).$$
We can do this test using a likelihood ratio test where the number of extra parameters in the full parameter space is $(k-1) - q$, so that $-2 \log \Lambda \xrightarrow{D} \chi^2((k-1) - q)$ under $H_0$.
Example 2: Linkage in genetics
Corn can be starchy (S) or sugary (s) and can have a green base leaf (G) or a white base leaf (g). The traits starchy and green base leaf are dominant. Suppose the alleles for these two factors occur on separate chromosomes and are hence independent. Then each parent with alleles SsGg produces with equal likelihood gametes of the form (S,G), (S,g), (s,G) and (s,g). If two such hybrid parents are crossed, the phenotypes of the offspring will occur in the proportions suggested by the table below. That is, the probability of an offspring of phenotype (S,G) is 9/16; of phenotype (S,g), 3/16; of phenotype (s,G), 3/16; of phenotype (s,g), 1/16.
                          Alleles of first parent
                      SG        Sg        sG        sg
Alleles of    SG    (S,G)     (S,G)     (S,G)     (S,G)
second        Sg    (S,G)     (S,g)     (S,G)     (S,g)
parent        sG    (S,G)     (S,G)     (s,G)     (s,G)
              sg    (S,G)     (S,g)     (s,G)     (s,g)
The table below shows the results of a set of 3839 SsGg x
SsGg crossings (Carver, 1927, Genetics, A Genetic Study
of Certain Chlorophyll Deficiencies in Maize.)
Phenotype Number in sample
Starchy green 1997
Starchy white 906
Sugary green 904
Sugary white 32
Does the genetic model with 9:3:3:1 ratios fit the data?
Let $X_i$ denote the phenotype of the $i$th crossing.
Model: $X_1, \ldots, X_n$ are iid multinomial with
$$P(X_i = SG) = p_{SG}, \; P(X_i = Sg) = p_{Sg}, \; P(X_i = sG) = p_{sG}, \; P(X_i = sg) = p_{sg}.$$
$$H_0: p_{SG} = 9/16, \; p_{Sg} = 3/16, \; p_{sG} = 3/16, \; p_{sg} = 1/16$$
$$H_1: \text{At least one of } p_{SG} = 9/16, \; p_{Sg} = 3/16, \; p_{sG} = 3/16, \; p_{sg} = 1/16 \text{ is not correct.}$$
Likelihood ratio test:
$$\Lambda = \frac{\max_{\omega} L}{\max_{\Omega} L} = \frac{(9/16)^{1997} (3/16)^{906} (3/16)^{904} (1/16)^{32}}{(1997/3839)^{1997} (906/3839)^{906} (904/3839)^{904} (32/3839)^{32}}$$
$$-2 \log \Lambda = 2\left(1997 \log \frac{1997/3839}{9/16} + 906 \log \frac{906/3839}{3/16} + 904 \log \frac{904/3839}{3/16} + 32 \log \frac{32/3839}{1/16}\right) = 387.51$$
+
Under $H_0: p_{SG} = 9/16, \; p_{Sg} = 3/16, \; p_{sG} = 3/16, \; p_{sg} = 1/16$, $-2 \log \Lambda \sim \chi^2(3)$ approximately [there are three extra free parameters in $H_1$]. Reject $H_0$ if $-2 \log \Lambda \ge \chi^2_{.05}(3) = 7.81$.
Thus we reject $H_0: p_{SG} = 9/16, \; p_{Sg} = 3/16, \; p_{sG} = 3/16, \; p_{sg} = 1/16$.
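The test statistic is easy to reproduce numerically (a sketch, not part of the notes):

```python
from math import log

# Carver (1927) corn data: (S,G), (S,g), (s,G), (s,g) counts
y = [1997, 906, 904, 32]
n = sum(y)                      # 3839 crossings
p0 = [9/16, 3/16, 3/16, 1/16]   # probabilities under H0

# -2 log Lambda = 2 * sum_j y_j * log( (y_j/n) / p0_j )
stat = 2 * sum(yj * log((yj / n) / pj) for yj, pj in zip(y, p0))

print(round(stat, 2))   # close to the 387.51 reported above
assert stat > 7.81      # exceeds chi^2_{.05}(3), so H0 is rejected
```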
Model for linkage:
$$p_{SG} = \frac{1}{4}(2 + \theta), \quad p_{Sg} = \frac{1}{4}(1 - \theta), \quad p_{sG} = \frac{1}{4}(1 - \theta), \quad p_{sg} = \frac{1}{4}\theta$$
$$L(\theta) = \left[\frac{1}{4}(2 + \theta)\right]^{Y_1} \left[\frac{1}{4}(1 - \theta)\right]^{Y_2} \left[\frac{1}{4}(1 - \theta)\right]^{Y_3} \left[\frac{1}{4}\theta\right]^{n - Y_1 - Y_2 - Y_3}$$
The maximum likelihood estimate of $\theta$ for the corn data is $\hat{\theta} = 0.0357$; see handout.
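The handout's estimate can be reproduced in closed form (a sketch, not from the notes): setting $dl/d\theta = 0$ for $l(\theta) = Y_1 \log(2+\theta) + (Y_2 + Y_3)\log(1-\theta) + Y_4 \log\theta + \text{const}$ and clearing denominators gives the quadratic $n\theta^2 + (2Y_2 + 2Y_3 + Y_4 - Y_1)\theta - 2Y_4 = 0$, whose positive root is $\hat\theta$:

```python
from math import sqrt

y1, y2, y3, y4 = 1997, 906, 904, 32
n = y1 + y2 + y3 + y4

# Quadratic from dl/dtheta = 0: n*theta^2 + (2*y2 + 2*y3 + y4 - y1)*theta - 2*y4 = 0
a = n
b = 2*y2 + 2*y3 + y4 - y1
c = -2*y4

theta_hat = (-b + sqrt(b*b - 4*a*c)) / (2*a)   # positive root
print(round(theta_hat, 4))   # 0.0357
```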
Test
$$H_0: p_{SG} = \frac{1}{4}(2 + \theta), \; p_{Sg} = \frac{1}{4}(1 - \theta), \; p_{sG} = \frac{1}{4}(1 - \theta), \; p_{sg} = \frac{1}{4}\theta \text{ for some } \theta, \; 0 < \theta < 1$$
vs.
$$H_1: p_{SG}, p_{Sg}, p_{sG}, p_{sg} \text{ do not satisfy } p_{SG} = \frac{1}{4}(2 + \theta), \; p_{Sg} = \frac{1}{4}(1 - \theta), \; p_{sG} = \frac{1}{4}(1 - \theta), \; p_{sg} = \frac{1}{4}\theta \text{ for any } \theta, \; 0 < \theta < 1.$$
$$\Lambda = \frac{\max_{\omega} L}{\max_{\Omega} L} = \frac{(.25(2 + .0357))^{1997} (.25(1 - .0357))^{906} (.25(1 - .0357))^{904} (.25 \times .0357)^{32}}{(1997/3839)^{1997} (906/3839)^{906} (904/3839)^{904} (32/3839)^{32}}$$
$$-2 \log \Lambda = 2.02$$
Under $H_0$, $-2 \log \Lambda \sim \chi^2(2)$ approximately [there are two extra free parameters in $H_1$]. Since $-2 \log \Lambda = 2.02 < \chi^2_{.05}(2) = 5.99$, the linkage model is not rejected.
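This statistic can be checked the same way as before (a sketch, not part of the notes), plugging $\hat\theta = 0.0357$ into the linkage probabilities:

```python
from math import log

y = [1997, 906, 904, 32]
n = sum(y)
theta = 0.0357   # MLE from the handout

# Fitted probabilities under the linkage model
p0 = [0.25*(2 + theta), 0.25*(1 - theta), 0.25*(1 - theta), 0.25*theta]

stat = 2 * sum(yj * log((yj / n) / pj) for yj, pj in zip(y, p0))
print(round(stat, 2))   # close to the 2.02 reported above
assert stat < 5.99      # below chi^2_{.05}(2): do not reject
```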
Sufficiency
Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a distribution that has pdf or pmf $f(x; \theta)$. The concept of sufficiency arises as an attempt to answer the following question: Is there a statistic, a function $Y = u(X_1, \ldots, X_n)$, which contains all the information in the sample about $\theta$? If so, a reduction of the original data to this statistic without loss of information is possible. For example, consider a sequence of independent Bernoulli trials with unknown probability of success $\theta$. We may have the intuitive feeling that the total number of successes contains all the information about $\theta$ that there is in the sample, and that the order in which the successes occurred, for example, does not give any additional information. The following definition formalizes this idea:

Definition: Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. A statistic $Y = u(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional distribution of $X_1, \ldots, X_n$ given $Y = y$ does not depend on $\theta$ for any value of $y$.
Example 1: Let $X_1, \ldots, X_n$ be a sequence of independent Bernoulli random variables with $P(X_i = 1) = \theta$. We will verify that $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$.

Consider $P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y)$.

For $y \ne \sum_{i=1}^n x_i$, the conditional probability is 0 and does not depend on $\theta$.

For $y = \sum_{i=1}^n x_i$,
$$P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \frac{P(X_1 = x_1, \ldots, X_n = x_n, Y = y)}{P(Y = y)} = \frac{\theta^y (1 - \theta)^{n - y}}{\binom{n}{y} \theta^y (1 - \theta)^{n - y}} = \frac{1}{\binom{n}{y}}.$$

The conditional distribution thus does not involve $\theta$ at all, and thus $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$.
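A quick numerical illustration of this calculation (not in the notes): for any particular 0-1 sequence, the conditional probability given its sum is $1/\binom{n}{y}$, the same for every $\theta$:

```python
from math import comb

def cond_prob(x, theta):
    """P(X_1=x_1,...,X_n=x_n | Y=y) for iid Bernoulli(theta), where y = sum(x)."""
    n, y = len(x), sum(x)
    p_seq = theta**y * (1 - theta)**(n - y)              # P(X_1=x_1,...,X_n=x_n)
    p_y = comb(n, y) * theta**y * (1 - theta)**(n - y)   # P(Y = y)
    return p_seq / p_y

x = (1, 0, 1, 1, 0)   # one particular sequence: n = 5, y = 3
# The conditional probability is 1/C(5,3) = 1/10 for every theta
assert abs(cond_prob(x, 0.2) - 1/10) < 1e-12
assert abs(cond_prob(x, 0.2) - cond_prob(x, 0.7)) < 1e-12
```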
Example 2: Let $X_1, \ldots, X_n$ be iid Uniform($0, \theta$). Consider the statistic $Y = \max_{1 \le i \le n} X_i$.

We have shown before (see Notes 1) that
$$f_Y(y) = \begin{cases} n y^{n-1} / \theta^n & 0 < y < \theta \\ 0 & \text{elsewhere.} \end{cases}$$
For $y < \theta$, we have
$$f(x_1, \ldots, x_n \mid Y = y) = \frac{f(x_1, \ldots, x_n)}{f_Y(y)} = \frac{(1/\theta^n) \, I(\max_{1 \le i \le n} x_i = y)}{n y^{n-1} / \theta^n} = \frac{1}{n y^{n-1}} \, I(\max_{1 \le i \le n} x_i = y),$$
which does not depend on $\theta$.

For $y \ge \theta$,
$$f(x_1, \ldots, x_n \mid Y = y) = 0.$$

Thus, the conditional distribution does not depend on $\theta$, and $Y = \max_{1 \le i \le n} X_i$ is a sufficient statistic.
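The density of the maximum quoted from Notes 1 can be checked by simulation (a sketch, not part of the notes); its CDF is $P(Y \le y) = (y/\theta)^n$:

```python
import random

random.seed(0)
theta, n, N = 2.0, 5, 200_000

# Simulate Y = max(X_1,...,X_n) for X_i iid Uniform(0, theta)
maxima = [max(random.uniform(0, theta) for _ in range(n)) for _ in range(N)]

y = 1.5
empirical = sum(m <= y for m in maxima) / N
exact = (y / theta) ** n   # CDF implied by f_Y(y) = n y^{n-1} / theta^n

assert abs(empirical - exact) < 0.01
```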
Factorization Theorem: Let $X_1, \ldots, X_n$ denote a random sample of size $n$ from a distribution that has pdf or pmf $f(x; \theta)$, $\theta \in \Omega$. A statistic $Y = u(X_1, \ldots, X_n)$ is sufficient for $\theta$ if and only if we can find two nonnegative functions, $k_1$ and $k_2$, such that
$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1[u(x_1, \ldots, x_n); \theta] \, k_2(x_1, \ldots, x_n)$$
where $k_2(x_1, \ldots, x_n)$ does not depend upon $\theta$.
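As a worked check (not in the notes), the factorization for the Bernoulli setting of Example 1 is immediate:
$$f(x_1; \theta) \cdots f(x_n; \theta) = \theta^{\sum_{i=1}^n x_i} (1 - \theta)^{n - \sum_{i=1}^n x_i} = k_1\!\left[\sum_{i=1}^n x_i; \theta\right] k_2(x_1, \ldots, x_n)$$
with $k_1[y; \theta] = \theta^y (1 - \theta)^{n - y}$ and $k_2(x_1, \ldots, x_n) = 1$, so $Y = \sum_{i=1}^n X_i$ is sufficient for $\theta$ by the theorem, in agreement with the direct calculation in Example 1.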