
Stat 512-513 Summary of Material

James McQueen
University of Washington
June 2013
Contents

1 Stat 512
1.1 The Event Identity and Conditional Randomness
1.2 Uniform Order Statistics
1.3 Convolution
1.4 Transformations of Random Variables
1.4.1 Linear Transformations in R
1.4.2 Monotonic Transformations
1.4.3 Linear Transformations in R^n
1.4.4 Orthogonal Transformations
1.4.5 General Transformations in R^n
1.5 Asymptotics
1.5.1 Convergence in Probability
1.5.2 Convergence in distribution
1.5.3 Markov's Inequality
1.5.4 Chebyshev's Inequality
1.5.5 Slutsky's Theorem
1.5.6 Mann-Wald Theorem
1.5.7 The Univariate Delta Method
1.5.8 The Central Limit Theorem
1.5.9 The General Delta Method
1.6 Moment Generating Functions
1.7 Independent rvs
1.8 Uniqueness Theorem
1.9 Cramer-Levy Continuity Theorem
1.10 Moment Generating
1.11 The Central Limit Theorem Proof

2 Stat 513
2.1 Properties of Statistics
2.2 Method of Moments
2.3 One Sample Normal Model
2.3.1 Confidence Intervals
2.4 Two Sample Normal Model
2.4.1 Equal Variance Model
2.4.2 Matched Pairs T-test
2.5 Sufficiency and UMVUE
2.5.1 Sufficiency
2.5.2 Finding UMVUE estimators
2.5.3 Lehmann-Scheffé Theorem
2.6 Completeness, ancillarity
2.6.1 Basu's Theorem
2.7 Exponential Families
2.7.1 Definition
2.7.2 Natural Parameters
2.7.3 Sufficiency and Completeness
2.8 Likelihood and the Cramer-Rao Lower Bound
2.8.1 Likelihood and score
2.8.2 Fisher Information
2.8.3 Cramer-Rao
2.8.4 Efficiency
2.9 Maximum Likelihood Estimators
2.9.1 Definition and Basic Properties
2.9.2 Asymptotic Properties in the iid Case
2.9.3 Proof of Asymptotic Normality in dim(θ) = 1
2.10 Location and Scale Models
2.10.1 The location and scale model equation
2.10.2 Natural Estimators
2.10.3 Invariance Conditions
2.11 Bayesian Statistics
2.12 Neyman-Pearson and UMP Tests
2.12.1 The Set up
2.12.2 The Neyman-Pearson Lemma
2.12.3 UMP tests
2.12.4 Sufficiency and MLR
2.13 Likelihood Ratio Test
2.13.1 The Test
2.13.2 Normal Theory Models
2.14 Chi-Squared Tests
2.14.1 Test of fit for Multinomial observations
2.14.2 Test of fit for Independence
2.14.3 Degrees of Freedom Rule

3 Distributions
3.1 Discrete
3.1.1 Bernoulli
3.1.2 Binomial
3.1.3 Geometric
3.1.4 Negative Binomial
3.1.5 Poisson
3.1.6 Hyper-Geometric
3.1.7 Discrete Uniform
3.2 Continuous
3.2.1 Normal
3.2.2 Multivariate Normal
3.2.3 Continuous Uniform
3.2.4 Exponential
3.2.5 Gamma
3.2.6 Beta
3.2.7 Cauchy(0, 1)
3.2.8 Chi-square
3.2.9 T_n-distribution
3.2.10 F_{m,n}-distribution
1 Stat 512
1.1 The Event Identity and Conditional Randomness
The Event Identity: Let $W_r$ be the waiting time for $r$ successes and let $T_n$ be the total number of successes up until time $n$. Then we have:
$[W_r > n] = [T_n < r]$
In the case of independent Bernoulli trials $W_r \sim \mathrm{NegBin}(r, p)$ and $T_n \sim \mathrm{Bin}(n, p)$, and in the case of the Poisson process with rate $\lambda$, $W_r \sim \mathrm{Gamma}(r, \mathrm{rate} = \lambda)$ and $T_n \sim \mathrm{Poi}(\lambda n)$.

Conditional Randomness: Suppose $N_t$ is a Poisson process, and suppose also that $N_t = m$ (i.e. $m$ observations occurred in the interval $[0, t]$). Then these $m$ observation times constitute the ordered values of $m$ repetitions of the $\mathrm{Unif}(0, t)$ experiment.
So, if $N_t = m$ then $(N_s \mid N_t = m) \sim \mathrm{Bin}(m, s/t)$ for $s < t$.
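Both facts are easy to check by simulation; the following is a minimal sketch (not from the original notes), assuming numpy and scipy are available, with illustrative parameter values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Event identity for Bernoulli trials: P(W_r > n) = P(T_n < r).
p, r, n = 0.3, 4, 10
T_n = (rng.random((100_000, n)) < p).sum(axis=1)     # successes in n trials
lhs = stats.nbinom.sf(n - r, r, p)                   # P(W_r > n); W_r = failures + r
print(lhs, (T_n < r).mean(), stats.binom.cdf(r - 1, n, p))

# Conditional randomness: given N_t = m, N_s | N_t = m ~ Bin(m, s/t).
lam, t, s, m = 2.0, 5.0, 2.0, 8
counts = []
for _ in range(50_000):
    times = np.cumsum(rng.exponential(1 / lam, size=40))   # one Poisson path
    if (times < t).sum() == m:                              # condition on N_t = m
        counts.append((times < s).sum())
print(np.mean(counts), m * s / t)    # empirical mean vs. Bin(m, s/t) mean
```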
1.2 Uniform Order Statistics
Let $U_{n:k}$ be the $k$th order statistic of $n$ Uniform(0, 1) experiments. Then let $B_n(t)$ be the number of $U_k$ less than $t$.
The event identity revisited:
$[U_{n:k} > t] = [B_n(t) < k]$
That is, the $k$th order statistic exceeds $t$ if and only if fewer than $k$ observations are at most $t$.
Using the event identity and that $B_n(t) \sim \mathrm{Bin}(n, t)$ we can derive that $U_{n:k} \sim \mathrm{Beta}(k, n - k + 1)$ and so:
$E[U_{n:k}] = \frac{k}{n+1}$ and $\mathrm{Var}[U_{n:k}] = \frac{1}{n+2}\,\frac{k}{n+1}\left(1 - \frac{k}{n+1}\right)$
Further, for $i \le k$:
$\mathrm{Cov}[U_{n:i}, U_{n:k}] = \frac{1}{n+2}\,\frac{i}{n+1}\left(1 - \frac{k}{n+1}\right)$
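These Beta moments are easy to confirm numerically; here is a minimal simulation sketch (numpy assumed, parameters illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, i = 10, 3, 2
U = np.sort(rng.random((100_000, n)), axis=1)    # rows: order statistics of n Unif(0,1)
U_nk = U[:, k - 1]                                # k-th order statistic

print(U_nk.mean(), k / (n + 1))                                        # E[U_{n:k}]
print(U_nk.var(), (1/(n+2)) * (k/(n+1)) * (1 - k/(n+1)))               # Var[U_{n:k}]
print(np.cov(U[:, i - 1], U_nk)[0, 1],
      (1/(n+2)) * (i/(n+1)) * (1 - k/(n+1)))                           # Cov[U_{n:i}, U_{n:k}]
```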
1.3 Convolution
Let $Z \equiv X + Y$. Then
$F_Z(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{z-u} f(u, v)\,dv\,du = \int_{-\infty}^{\infty}\int_{-\infty}^{z} f(u, w - u)\,dw\,du = \int_{-\infty}^{z}\int_{-\infty}^{\infty} f(u, w - u)\,du\,dw = \int_{-\infty}^{z}\int_{-\infty}^{\infty} f_X(u) f_Y(w - u)\,du\,dw$ when $X \perp Y$
Then differentiating:
$f_Z(z) = \int_{-\infty}^{\infty} f(u, z - u)\,du = \int_{-\infty}^{\infty} f_X(u) f_Y(z - u)\,du$ when $X \perp Y$, $= \int_{0}^{z} f_X(u) f_Y(z - u)\,du$ when $X, Y \ge 0$
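As a concrete check of the last line, the convolution of two independent Exp(λ) densities should give the Gamma(2, λ) density at any point; a minimal numerical sketch (numpy/scipy assumed, values illustrative):

```python
import numpy as np
from scipy import stats

lam, z = 1.5, 2.0
u = np.linspace(0, z, 10_001)
du = u[1] - u[0]
# f_Z(z) = integral_0^z f_X(u) f_Y(z - u) du for independent X, Y >= 0
f_z = np.sum(stats.expon.pdf(u, scale=1/lam) * stats.expon.pdf(z - u, scale=1/lam)) * du
print(f_z, stats.gamma.pdf(z, a=2, scale=1/lam))   # the two values should be close
```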
3
1.4 Transformations of Random Variables
1.4.1 Linear Transformations in R
If $Y \equiv aX + b$ then
$f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$
1.4.2 Monotonic Transformations
If $Y \equiv g(X)$ then:
$f_Y(y) = f_X(g^{-1}(y))\,\left|\frac{d}{dy} g^{-1}(y)\right| = f_X(x)\,\left|\frac{dx}{dy}\right| = f_X(x)\,\left|\frac{d\,\mathrm{Old}}{d\,\mathrm{New}}\right|$
1.4.3 Linear Transformations in R^n
Let $Y \equiv AX$. First it is useful to know that $\det(A^{-1}) = 1/\det(A)$ when $A^{-1}$ exists and $\det(A) \ne 0$. Then we have:
$f_Y(y) = f_X(A^{-1}y)\,|\det(A^{-1})|$
If $X$ has mean vector $\mu_X$ and covariance matrix $\Sigma_X$ then $Y \equiv AX + c$ has mean vector $A\mu_X + c$ and covariance matrix $A\Sigma_X A^T$.
1.4.4 Orthogonal Transformations
Let $\gamma_1, \ldots, \gamma_n$ denote $n$ orthonormal vectors. Then consider the orthogonal transformation $w = Gy$ where
$G \equiv \begin{pmatrix} \gamma_1^T \\ \vdots \\ \gamma_n^T \end{pmatrix} = \begin{pmatrix} \gamma_{11} & \cdots & \gamma_{1n} \\ \vdots & \ddots & \vdots \\ \gamma_{n1} & \cdots & \gamma_{nn} \end{pmatrix}$
Then $G^T G = I = G G^T$ follows immediately and is the definition of $G$ being an orthogonal matrix. Thus $G^{-1} = G^T$ and $\det(G) = \pm 1$, so $|\det(G^{-1})| = 1/|\det(G)| = 1$. The transformations $w = Gy$ and $y = G^T w$ are essentially rotations and preserve distance from the origin.
From the above section it follows that:
$f_Y(y) = f_W(Gy)$ and $f_W(w) = f_Y(G^T w)$
This is useful for when $Y \sim N(0, \sigma^2 I)$, that is, when we have $n$ i.i.d. $N(0, \sigma^2)$ rvs; then the orthogonal transformation $Gy$ also produces $n$ i.i.d. $N(0, \sigma^2)$ rvs.
1.4.5 General Transformations in R^n
Suppose $y = g(x)$ for some invertible function $g$. Let $J$ denote the Jacobian of the inverse transformation, i.e. the determinant of the matrix of partial derivatives $(\partial x / \partial y)$. Then:
$f_Y(y) = f_X(g^{-1}(y))\,|J|$
1.5 Asymptotics
1.5.1 Convergence in Probability
$U_n \to_p U \;:\Leftrightarrow\; P(|U_n - U| \ge \epsilon) \to 0$ as $n \to \infty$, for every $\epsilon > 0$
1.5.2 Convergence in distribution
If $U_n$ has df $F_n$ and $U$ has df $F$ then
$U_n \to_d U \;:\Leftrightarrow\; F_n(t) \to F(t)$ at every continuity point $t$ of $F$
1.5.3 Markov's Inequality
$P(|X| \ge t) \le E|X|/t$
1.5.4 Chebyshev's Inequality
$P(|X - \mu| \ge t) \le \sigma^2/t^2$
1.5.5 Slutsky's Theorem
Suppose $W_n \to_d W$, $U_n \to_p a$ and $V_n \to_p b$ for rvs $U_n, V_n, W_n$, and suppose $h(\cdot)$ is a function that is continuous at $a$. Then:
$U_n W_n + V_n \to_d aW + b$
$h(U_n) \to_p h(a)$
1.5.6 Mann-Wald Theorem
Suppose $W_n \to_d W$. Let $h$ be continuous a.e. $dW$; then the following is true:
$h(W_n) \to_d h(W)$
1.5.7 The Univariate Delta Method
Suppose that $Z_n \equiv \sqrt{n}(T_n - \theta) \to_d Z \sim N(0, \sigma^2)$ and suppose that $g$ is differentiable at $\theta$. Then:
$\sqrt{n}[g(T_n) - g(\theta)] =_a g'(\theta)\,Z_n \to_d g'(\theta)\,Z \sim N(0, [g'(\theta)]^2\sigma^2)$
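A minimal simulation sketch of this result (numpy assumed): take $T_n = \bar X_n$ for Exp(rate $\lambda$) data and $g(x) = 1/x$, so $[g'(\mu)]^2\sigma^2 = \lambda^2$; the parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 400, 20_000
X = rng.exponential(1 / lam, size=(reps, n))
T_n = X.mean(axis=1)                      # T_n = sample mean, estimates mu = 1/lam
Z = np.sqrt(n) * (1 / T_n - lam)          # sqrt(n)(g(T_n) - g(mu)) with g(x) = 1/x
print(Z.var(), lam**2)                    # should be close to [g'(mu)]^2 sigma^2 = lam^2
```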
1.5.8 The Central Limit Theorem
Let $X_1, \ldots, X_n$ be i.i.d. random vectors in $\mathbb{R}^k$, each with mean vector $\mu = E(X_i)$ and covariance matrix $\Sigma$. Then:
$\sqrt{n}(\bar{X}_n - \mu) \to_d N(0, \Sigma)$
1.5.9 The General Delta Method
Suppose that $\sqrt{n}(T_n - \theta) \to_d N(0, \Sigma)$ and that $h$ is a function that is differentiable at $\theta$. Then:
$\sqrt{n}(h(T_n) - h(\theta)) \to_d N(0, \nabla h(\theta)^T\,\Sigma\,\nabla h(\theta))$
1.6 Moment Generating Functions
The mgf of $X$ is:
$M(t) \equiv M_X(t) \equiv E[e^{tX}]$, $t \in \mathbb{R}$
The characteristic function (or chf) of $X$ is:
$\phi(t) \equiv \phi_X(t) \equiv E[e^{itX}]$, $t \in \mathbb{R}$
The cumulant generating function (or cgf) of $X$ is:
$K(t) \equiv K_X(t) \equiv \log \phi(t)$, $t \in \mathbb{R}$
It is useful to note that:
$M_{aX+b}(t) = e^{tb} M_X(at)$
Similarly,
$\phi_{aX+b}(t) = e^{itb} \phi_X(at)$
1.7 Independent rvs
If $X$ and $Y$ are independent random variables then:
$M_{X+Y}(t) = M_X(t) M_Y(t)$
This follows because the expectation of a product of independent rvs is the product of their expectations.
1.8 Uniqueness Theorem
$\phi_X(\cdot)$ completely characterizes $P_X(\cdot)$ and conversely. Likewise for $M_X(\cdot)$, provided it is finite in a neighbourhood of 0.
1.9 Cramer-Levy Continuity Theorem
$Z_n \to_d Z$ if and only if $\phi_{Z_n}(t) \to \phi_Z(t)$ for all $t$.
Furthermore, if $M_{Z_n}(t) \to M_Z(t)$ for all $t$ in a neighbourhood of 0, then $Z_n \to_d Z$.
1.10 Moment Generating
When term-by-term integration of $M_X(t)$ is justified, then by the Taylor series expansion of $e^{tX}$ the mgf of $X$ satisfies:
$M_X(t) = 1 + tE[X] + \cdots + \frac{t^k}{k!}E[X^k] + \cdots$
Giving:
$\left.\frac{d^k}{dt^k} M_X(t)\right|_{t=0} = E[X^k]$
1.11 The Central Limit Theorem Proof
If $X_i \sim (0, \sigma^2)$ iid, $i = 1, \ldots, n$, with common mgf $M$, then $M''(0) = \sigma^2$. Let
$Z_n \equiv \sqrt{n}\,\bar{X}_n/\sigma$
Then
$M_{Z_n}(t) = \prod_{i=1}^n E\!\left[e^{\frac{t}{\sigma\sqrt{n}} X_i}\right] = \left[M\!\left(\tfrac{t}{\sigma\sqrt{n}}\right)\right]^n$
$= \left[1 + \tfrac{t^2}{2n\sigma^2} M''(t^*)\right]^n$ for some $t^*$ between $0$ and $\tfrac{t}{\sigma\sqrt{n}}$ (Taylor's Theorem)
$= \left[1 + \tfrac{1}{n}\left(\tfrac{t^2}{2}\,\tfrac{1}{\sigma^2} M''(t^*)\right)\right]^n = \left[1 + \tfrac{a_n}{n}\right]^n$ for $a_n \equiv \tfrac{t^2}{2}\,\tfrac{1}{\sigma^2} M''(t^*) \to \tfrac{t^2}{2}\,\tfrac{\sigma^2}{\sigma^2} = \tfrac{t^2}{2}$
$\to e^{\frac{1}{2}t^2} = M_{N(0,1)}(t)$
so $Z_n \to_d N(0, 1)$ by the Cramer-Levy continuity theorem.
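The convergence in this proof is easy to see numerically; a minimal sketch (numpy/scipy assumed, not part of the original notes) comparing the standardized mean of skewed Exp(1) − 1 data with N(0, 1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 200, 50_000
X = rng.exponential(1.0, size=(reps, n)) - 1.0    # mean 0, variance 1
Z_n = np.sqrt(n) * X.mean(axis=1)                 # sqrt(n) * Xbar_n / sigma, sigma = 1
for q in (0.05, 0.5, 0.95):                       # compare quantiles with N(0, 1)
    print(q, np.quantile(Z_n, q), stats.norm.ppf(q))
```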
2 Stat 513
2.1 Properties of Statistics
A statistic $T(x)$ is an unbiased estimator of $\gamma(\theta)$ if $E_\theta[T(x)] = \gamma(\theta)$ for all $\theta \in \Theta$.
A statistic is a consistent estimator of $\gamma(\theta)$ if for every $\theta$:
$T_n(x) \to_p \gamma(\theta)$
The bias of a statistic is defined to be:
$\mathrm{Bias}(\theta) \equiv E_\theta T(x) - \gamma(\theta)$
The mean squared error of an estimator is defined as:
$\mathrm{MSE}(\theta) \equiv E_\theta[T(x) - \gamma(\theta)]^2 = \mathrm{Var}(T(x)) + \mathrm{Bias}^2(\theta)$
By Markov's inequality we get that $T_n$ is consistent whenever $\mathrm{MSE} \to 0$.
2.2 Method of Moments
Equate the population moments with the sample moments and solve for the parameters. In a model with $k$ parameters, we use the first $k$ moment equations.
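For example, here is a minimal method-of-moments sketch (numpy assumed; not from the original notes) for Gamma(shape $r$, rate $\lambda$) data, using $EX = r/\lambda$ and $\mathrm{Var}\,X = r/\lambda^2$; the true parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
r_true, lam_true = 3.0, 2.0
x = rng.gamma(shape=r_true, scale=1 / lam_true, size=100_000)

m1, m2c = x.mean(), x.var()      # first moment and second central moment
lam_hat = m1 / m2c               # solve r/lam = m1 and r/lam^2 = m2c
r_hat = m1 ** 2 / m2c
print(r_hat, lam_hat)            # should be near (3.0, 2.0)
```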
2.3 One Sample Normal Model
Let $X_1, \ldots, X_n$ be iid samples from the $N(\mu, \sigma^2)$ distribution. Let $G$ be an orthogonal matrix with the first row being $(\tfrac{1}{\sqrt{n}}, \ldots, \tfrac{1}{\sqrt{n}})$. Then $x \sim MVN(\mu 1, \sigma^2 I)$ and so $W \equiv Gx \sim N(G\mu 1, \sigma^2 I)$. Then
$W_1 = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i = \sqrt{n}\,\bar{X}_n \sim N(\mu\sqrt{n}, \sigma^2)$
And since all the other rows $\gamma_i$ are orthogonal to $1$, all the remaining $W_i \sim N(0, \sigma^2)$. Then recalling that $W^T W = (Gx)^T(Gx) = x^T x$ we see:
$SS_{XX} \equiv \sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n X_i^2 - n\bar{X}_n^2 = x^T x - n\bar{X}_n^2 = W^T W - W_1^2 = \sum_{i=2}^n W_i^2 \sim \sigma^2\chi^2_{n-1}$
Summarizing:
- $Z_n \equiv \sqrt{n}(\bar{X}_n - \mu)/\sigma \sim N(0, 1)$
- $\dfrac{S_n^2}{\sigma^2} = \dfrac{1}{\sigma^2}\,\dfrac{SS_{XX}}{n-1} \sim \dfrac{1}{n-1}\chi^2_{n-1}$
- $\bar{X}_n$ and $S_n^2$ are independent
- $T_n \equiv \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} = \dfrac{\sqrt{n}(\bar{X}_n - \mu)}{\sqrt{SS_{XX}/(n-1)}} \sim T_{n-1}$
2.3.1 Confidence Intervals
If $t_\alpha$ is the upper $\alpha/2$ quantile of the $T_{n-1}$ distribution then the following are $(1 - \alpha)$ confidence intervals:
$[-t_\alpha \le T_n \le t_\alpha] = \left[\bar{X}_n - t_\alpha\frac{S_n}{\sqrt{n}} \le \mu \le \bar{X}_n + t_\alpha\frac{S_n}{\sqrt{n}}\right]$
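A minimal sketch of computing this interval (scipy assumed; the data are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, alpha = 10.0, 2.0, 25, 0.05
x = rng.normal(mu, sigma, size=n)

xbar = x.mean()
s_n = x.std(ddof=1)                            # S_n, i.e. sqrt(SS_XX / (n - 1))
t_alpha = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_alpha * s_n / np.sqrt(n)
print((xbar - half, xbar + half))              # (1 - alpha) confidence interval for mu
```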
2.4 Two Sample Normal Model
2.4.1 Equal Variance Model
Assume $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ are independent samples from $N(\mu, \sigma^2)$ and $N(\nu, \sigma^2)$ respectively. Let $\Delta \equiv \mu - \nu$ be the parameter of interest. Then
$\hat\Delta \equiv \bar{X}_m - \bar{Y}_n \sim N\!\left(\Delta,\ \sigma^2\left(\tfrac{1}{m} + \tfrac{1}{n}\right)\right) = N\!\left(\Delta,\ \tfrac{m+n}{mn}\sigma^2\right)$
Then we use the following estimator of $\sigma^2$:
$S^2 \equiv \frac{m-1}{m+n-2}S_X^2 + \frac{n-1}{m+n-2}S_Y^2 = \frac{1}{m+n-2}(SS_{XX} + SS_{YY}) \sim \frac{\sigma^2}{m+n-2}\chi^2_{m+n-2}$
Then
$T \equiv \sqrt{\frac{mn}{m+n}}\,\frac{(\bar{X}_m - \bar{Y}_n) - (\mu - \nu)}{S} = \sqrt{\frac{mn}{m+n}}\,\frac{\hat\Delta - \Delta}{S} \sim T_{m+n-2}$
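A minimal sketch of the pooled two-sample statistic (numpy/scipy assumed; the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
m, n, sigma = 12, 15, 1.5
x = rng.normal(5.0, sigma, size=m)           # sample with mean mu
y = rng.normal(4.0, sigma, size=n)           # sample with mean nu

ss_xx = ((x - x.mean()) ** 2).sum()
ss_yy = ((y - y.mean()) ** 2).sum()
s2 = (ss_xx + ss_yy) / (m + n - 2)           # pooled variance estimate S^2
T = np.sqrt(m * n / (m + n)) * (x.mean() - y.mean()) / np.sqrt(s2)   # tests Delta = 0
p_value = 2 * stats.t.sf(abs(T), df=m + n - 2)
print(T, p_value)
print(stats.ttest_ind(x, y, equal_var=True))  # cross-check with scipy's pooled t-test
```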
2.4.2 Matched Pairs T-test
If $x$ and $y$ are matched pairs, then define $D = x - y$ and proceed as in the one sample model with rv $D$.
2.5 Sufficiency and UMVUE
2.5.1 Sufficiency
A sufficient statistic is a statistic $T(x)$ such that $P_{x \mid T(x)=t}$ does not depend on the parameter $\theta$.
Fisher-Neyman Factorization Theorem:
A statistic $T(x)$ is sufficient if and only if:
$f_\theta(x) = g_\theta(T(x)) \cdot h(x)$
A statistic is ancillary if its distribution does not depend on $\theta$. Consequently (by Basu's theorem, below) any ancillary statistic is independent of any complete sufficient statistic.
Minimally Sufficient: A statistic $T$ is minimally sufficient if, for any other sufficient statistic $\tilde{T}$, $T$ is necessarily of the form $T = \tilde{h}(\tilde{T})$ for some function $\tilde{h}$.
2.5.2 Finding UMVUE estimators
Theorem (Rao-Blackwell): If $U$ is an unbiased estimator of $\gamma(\theta)$ and $T$ is a sufficient statistic for $\theta$, then
$V \equiv E[U \mid T] = m(T)$
is a function of $T$, is an unbiased estimator of $\gamma(\theta)$, and has the property that:
$\mathrm{Var}(V) \le \mathrm{Var}(U)$
(Proof: $\mathrm{Var}(U) = \mathrm{Var}(E[U \mid T]) + E[\mathrm{Var}(U \mid T)]$.)
2.5.3 Lehmann-Scheffé Theorem
Let $T$ be complete and sufficient for $\theta$ and let $V(T)$ be any unbiased estimator of $\gamma(\theta)$ with finite variance. Then $V(T)$ is the unique UMVUE of $\gamma(\theta)$.
2.6 Completeness, ancillarity
A family of distributions $\mathcal{P}_T$ of a rv $T$ is complete if
$E_p h(T) = 0$ for all possible distributions $p$ in the family $\mathcal{P}_T$
implies
$h(t) = 0$ for all $t$.
2.6.1 Basu's Theorem
Suppose that $T$ is sufficient and complete for $\theta$. Then every ancillary statistic $V$ is independent of $T$.
2.7 Exponential Families
2.7.1 Definition
Suppose the distribution of a random variable can be written in the form:
$f_\theta(x) = c(\theta)\exp\!\left\{\sum_{i=1}^{k} a_i(\theta) T_i(x)\right\} h(x)\,1_D(x) = \exp\!\left\{b(\theta) + \sum_{i=1}^{k} a_i(\theta) T_i(x)\right\} h(x)\,1_D(x)$
where $c$ (or $b$) is the integration constant and $D$ is the common support. Then $f$ is of the exponential family. The order of the family is the smallest $k$ for which this format is possible.
2.7.2 Natural Parameters
If $T$ is minimal and has order $k$ we can write the distribution in its canonical form
$e^{d(\eta) + \sum_{i=1}^{k}\eta_i T_i(x)}\,h(x)\,1_D(x)$
Then the $\eta_i$ are called the natural parameters.
Define $A \equiv \{\eta : -d(\eta) < \infty\}$ to be the natural parameter space, and let $\mathcal{P}_N \equiv \{P_\eta : \eta \in A\}$ be the full exponential family.
Here are a few properties:
- the natural parameter space $A$ is a convex set, and $-d(\eta)$ (the log of the normalizing constant) is convex on $A$
- if $\phi(\eta) \equiv E_\eta g(x)$ has finite value for all $\eta \in A$, then $\phi$ has derivatives of all orders at each interior point of $A$ and can be differentiated under the expectation sign.
2.7.3 Sufficiency and Completeness
Let $X_1, \ldots, X_n$ be iid from an exponential family
$c(\theta)e^{a^T(\theta)T(x)}h(x)$
Let $T_i \equiv \sum_{j=1}^n T_i(X_j)$. Then $T \equiv (T_1, \ldots, T_k)^T$ is sufficient for $\mathcal{P}_N$.
If $A_\Theta \equiv \{(a_1(\theta), \ldots, a_k(\theta))^T : \theta \in \Theta\}$ contains a $k$-dimensional subset, then $T$ is minimally sufficient and complete for $\mathcal{P}_N$.
$T$ is also of the exponential family with the same $a_i(\theta)$ as each $X_i$.
Suppose $k$ is fixed (let $k = 2$ for example). Then $(T_1 \mid T_2 = t)$ is again an exponential family.
2.8 Likelihood and the Cramer-Rao Lower Bound
2.8.1 Likelihood and score
The likelihood of $\theta$ is defined to be
$L_n(\theta) \equiv f_\theta(x)$
where $f$ is the pdf (continuous or discrete) of the rv $x$; it is viewed as a function of $\theta$ instead of $x$.
The log-likelihood will be denoted:
$\ell_n(\theta) \equiv \log L_n(\theta)$
And so the score function is defined as:
$\dot\ell_n(\theta) \equiv \frac{\partial}{\partial\theta}\ell_n(\theta)$
We will call
$0 = \dot\ell_n(\theta)$
the likelihood equation, and solutions $\hat\theta$ of this equation are called maximum likelihood estimators.
2.8.2 Fisher Information
The Fisher information is defined as:
$J_n(\theta) \equiv E_\theta[\dot\ell_n(\theta)]^2$
Further, under regularity,
$J_n(\theta) = -E_\theta\,\ddot\ell_n(\theta)$
2.8.3 Cramer-Rao
If $1 = E_\theta 1$ can be differentiated under the integral sign once, then:
$E_\theta[\dot\ell_n(\theta)] = 0$
Proof.
$0 = \frac{d}{d\theta}1 = \frac{d}{d\theta}E_\theta 1 = \frac{d}{d\theta}\int f_\theta(x)\,dx = \int \dot f_\theta(x)\,dx = \int \frac{\dot f_\theta(x)}{f_\theta(x)} f_\theta(x)\,dx = \int\left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right] f_\theta(x)\,dx = E_\theta[\dot\ell_n(\theta)]$
If $1 = E_\theta 1$ can be differentiated twice under the integral sign, then:
$J_n(\theta) = -E_\theta\,\ddot\ell_n(\theta)$
If
- $\gamma(\theta) \equiv E_\theta[T(x)]$, where $\gamma'(\theta)$ exists in a neighbourhood of $\theta$,
- $\gamma(\theta) = E_\theta[T(x)]$ can be differentiated once under the integral sign,
- $1 = E_\theta 1$ can be differentiated once under the integral sign,
then:
$\mathrm{Var}_\theta[T(x)] \ge \frac{[\gamma'(\theta)]^2}{J_n(\theta)}$
Thus, there is a standard that no unbiased estimator can surpass.
Corollary: Equality is achieved for a parameter $\gamma(\theta)$ only if:
$\dot\ell_n(\theta) = A_n(\theta)[T(x) - \gamma(\theta)]$
in which case we can re-write the CR lower bound as:
$\mathrm{Var}_\theta[T(x)] = \frac{\gamma'(\theta)}{A_n(\theta)}$
Moreover, $J_n(\theta) = |\gamma'(\theta)A_n(\theta)|$ in this case.
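As a concrete check: for $X_1, \ldots, X_n$ iid Poisson($\theta$), $\bar{X}$ is unbiased for $\theta$ with variance $\theta/n$, which exactly attains the bound $[\gamma'(\theta)]^2/J_n(\theta) = \theta/n$. A minimal simulation sketch (numpy assumed, values illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 3.0, 50, 100_000
xbar = rng.poisson(theta, size=(reps, n)).mean(axis=1)   # unbiased estimator of theta
print(xbar.var())          # Monte Carlo variance of the estimator
print(theta / n)           # Cramer-Rao bound [gamma'(theta)]^2 / J_n(theta)
```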
2.8.4 Efficiency
If $T$ and $\tilde{T}$ are estimators of $\gamma(\theta)$ then the relative efficiency of $\tilde{T}$ with respect to $T$ is defined as:
$e(\tilde{T}, T) \equiv \frac{\mathrm{MSE}_\theta[T]}{\mathrm{MSE}_\theta[\tilde{T}]}$
If $\mathrm{CRB}_n(\theta)$ is the Cramer-Rao lower bound, then the absolute efficiency of $\tilde{T}$ is defined by:
$e(\tilde{T}, \mathrm{CRB}) \equiv \frac{\mathrm{CRB}_n(\theta)}{\mathrm{MSE}_\theta[\tilde{T}]}$
2.9 Maximum Likelihood Estimators
2.9.1 Definition and Basic Properties
The value of $\theta$ that maximizes the likelihood is called a maximum likelihood estimator.
Sufficiency: If $T$ is sufficient for $\theta$, then any MLE of $\theta$ is a function of $T$.
Invariance: If $\phi(\theta)$ is an injective function and $\hat\theta$ is the MLE for $\theta$, then $\phi(\hat\theta)$ is the MLE for $\phi(\theta)$.
2.9.2 Asymptotic Properties in the iid Case
Suppose $X_1, \ldots, X_n$ are iid from a $P_\theta$-regular distribution with common support. That is, the score function has mean vector 0 and covariance matrix $J(\theta) = -E_\theta\ddot\ell(\theta) > 0$.
First let $Y_k \equiv \frac{\partial}{\partial\theta}\log f_\theta(X_k) \sim (0, J(\theta))$ (since the score rv has 0 expectation under regularity). Then:
$Z_n(\theta) \equiv \sqrt{n}(\bar{Y}_n - \mu_Y) \to_d Z(\theta) \sim N(0, J(\theta))$
by the ordinary CLT (here $\mu_Y = 0$).
Define the information rv to be
$I_n(\theta) \equiv -\frac{1}{n}\sum_{i=1}^n \ddot\ell(\theta, X_i) \to_p -E_\theta\ddot\ell(\theta, X) = J(\theta)$
and the observed information to be
$\hat{I}_n(\theta) \equiv I_n(\hat\theta)$
Then we have:
$\sqrt{n}(\hat\theta - \theta) =_a J^{-1}(\theta)Z_n(\theta) \to_d J^{-1}(\theta)Z(\theta) \sim J^{-1}(\theta)N(0, J(\theta)) = N(0, J^{-1}(\theta))$
Further,
$\hat{I}_n \equiv I_n(\hat\theta) \to_p J(\theta)$
Finally,
$\hat{I}_n^{1/2}\,\sqrt{n}(\hat\theta - \theta) \to_d N(0, I)$
2.9.3 Proof of Asymptotic Normality in dim(θ) = 1
If the MLE $\hat\theta$ solves the likelihood equation for some regular model, then by the mean value theorem, for some $\theta^*$ between $\hat\theta$ and $\theta$:
$0 = \frac{1}{\sqrt{n}}\dot\ell_n(\hat\theta) = \frac{1}{\sqrt{n}}\dot\ell_n(\theta) + \left[\frac{1}{n}\ddot\ell_n(\theta^*)\right]\sqrt{n}(\hat\theta - \theta) = Z_n(\theta) - I_n(\theta^*)\sqrt{n}(\hat\theta - \theta)$
$\Rightarrow\quad \sqrt{n}(\hat\theta - \theta) = I_n^{-1}(\theta^*)\,Z_n(\theta)$
$Z_n(\theta) \to_d N(0, J(\theta))$ is already established, and since $\hat\theta \to_p \theta$ and $\theta^*$ is between $\hat\theta$ and $\theta$, we have $\theta^* \to_p \theta$, so $I_n(\theta^*) \to_p J(\theta)$. Then combining this with the above and applying Slutsky's Theorem:
$\sqrt{n}(\hat\theta - \theta) = I_n^{-1}(\theta^*)\,Z_n(\theta) =_a I_n^{-1}(\theta)\,Z_n(\theta) \to_d J^{-1}(\theta)Z(\theta)$
giving the result. Another application of Slutsky's Theorem gives the final statement.
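A minimal simulation sketch of the asymptotic normality result (numpy assumed, not part of the original notes): for Exp(rate $\theta$) data the MLE is $1/\bar{X}$ and $J(\theta) = 1/\theta^2$, so $\sqrt{n}(\hat\theta - \theta)$ should have variance close to $\theta^2$.

```python
import numpy as np

rng = np.random.default_rng(8)
theta, n, reps = 2.0, 500, 20_000
x = rng.exponential(1 / theta, size=(reps, n))
theta_hat = 1 / x.mean(axis=1)                  # MLE of the exponential rate
Z = np.sqrt(n) * (theta_hat - theta)
print(Z.mean(), Z.var(), theta**2)              # variance approaches J^{-1}(theta) = theta^2
```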
2.10 Location and Scale Models
2.10.1 The location and scale model equation
If $W_1, \ldots, W_n$ are independent with fixed known df $F_0(\cdot)$, then consider rvs of the form:
$Y_k \equiv \mu + \sigma W_k$
2.10.2 Natural Estimators
Natural Estimators of Location
- $\bar{Y} = \mu + \sigma\bar{W}$
- $\mathrm{median}\,Y_k = \mu + \sigma\,\mathrm{median}\,W_k$
- $\tfrac{1}{2}(Y_{n:1} + Y_{n:n}) = \mu + \tfrac{\sigma}{2}(W_{n:1} + W_{n:n})$
Natural Estimators of Scale
- $D_n \equiv \frac{1}{n}\sum_{k=1}^n |Y_k - \bar{Y}| = \sigma\,\frac{1}{n}\sum_{k=1}^n |W_k - \bar{W}|$
- $S_n(Y) = \sigma S_n(W)$
- $Y_{n:n} - Y_{n:1} = \sigma(W_{n:n} - W_{n:1})$
2.10.3 Invariance Conditions
Suppose $C(y)$ and $V(y)$ satisfy:
- $C(y) = \mu + \sigma C(w)$ (location invariant)
- $V(y) = \sigma V(w)$ (scale invariant)
Then (in analogy to T-statistics):
$T \equiv \frac{\sqrt{n}\,(C(y) - \mu)}{V(y)} = \frac{\sqrt{n}\,(\mu + \sigma C(w) - \mu)}{\sigma V(w)} = \frac{\sqrt{n}\,C(w)}{V(w)} = T(w)$
is a pivot for $\mu$ in that its distribution does not depend on $\mu$ or $\sigma$, and it can be used in creating confidence intervals for $\mu$. Likewise $V(y)/\sigma$ is a pivot for $\sigma$ and has distribution independent of $\mu$ and $\sigma$.
2.11 Bayesian Statistics
When we view $\theta$ itself as a random variable we may assume that it has some distribution $\pi(\theta)$, called a prior distribution. When we observe a rv $x$ with distribution $f(x \mid \theta)$, we are interested in the posterior distribution, defined as:
$\pi(\theta \mid x) = \frac{f(x \mid \theta)\pi(\theta)}{f_x(x)} \propto f(x \mid \theta)\pi(\theta)$
Sufficiency extends into this context in the sense that a statistic $T$ is Bayes sufficient if the posterior distribution depends on $x$ only through the value $t = T(x)$. If $T$ is sufficient for $f(\cdot \mid \theta)$ then it is Bayes sufficient.
Conjugate Prior: A conjugate prior is a prior such that, given the likelihood, the posterior is in the same family as the prior.
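The standard example is the Beta prior for a Binomial success probability, where the posterior is again Beta. A minimal sketch (scipy assumed; the prior parameters and data are illustrative):

```python
from scipy import stats

a, b = 2.0, 3.0          # Beta(a, b) prior on theta
n, k = 20, 12            # observe k successes in n Bernoulli(theta) trials

# Conjugacy: posterior is Beta(a + k, b + n - k), proportional to likelihood * prior
posterior = stats.beta(a + k, b + n - k)
print(posterior.mean())              # posterior mean of theta
print(posterior.interval(0.95))      # central 95% credible interval
```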
2.12 Neyman-Pearson and UMP Tests
2.12.1 The Set up
The Null Hypothesis is a statement that the true unknown $\theta$ is in some proper subset $\Theta_0 \subset \Theta$:
$H_0: \theta \in \Theta_0$
Then we seek a decision rule of the form:
Reject $H_0$ if $x \in C$
for some set $C$.
If $\Theta_0$ is a single value (labelled $\theta_0$) then the null hypothesis $H_0: \theta = \theta_0$ is called a simple hypothesis; otherwise it is a composite hypothesis.
The power function is defined by:
$\beta(\theta) \equiv P_\theta(x \in C)$
A randomized test is determined by a test function or critical function
$\phi: \mathcal{X}^n \to [0, 1]$
defined by:
$\beta(\theta) \equiv E_\theta P(H_0 \text{ is rejected} \mid x) = E_\theta\,\phi(x)$
A non-randomized test has $\phi(x) = 1_C(x)$.
2.12.2 The Neyman-Pearson Lemma
Suppose either $H_0: x \sim P_0$ or $H_a: x \sim P_a$, with densities $f_0$ and $f_a$ respectively. Then there exist a constant $k$ and a function $\gamma(x)$ for which the test
$\phi(x) = \begin{cases} 1 & \text{if } \Lambda(x) > k \\ \gamma(x) & \text{if } \Lambda(x) = k \\ 0 & \text{if } \Lambda(x) < k \end{cases}$
has $E_0\,\phi(x) = \alpha$, i.e. size $\alpha$, where $\Lambda(x) \equiv \frac{f_a(x)}{f_0(x)}$. Here $\gamma(x)$ represents randomizing on the boundary (for discrete random variables) to ensure size $\alpha$.
Any test of this type is a most powerful test of $H_0$ vs $H_a$ at level $\alpha$.
Corollary 1: if $\Lambda(x) = g(t)$ for some increasing function $g(\cdot)$ of $t \equiv T(x)$, then the test can be re-written as:
$\phi(x) = \begin{cases} 1 & \text{if } t > c \\ \gamma & \text{if } t = c \\ 0 & \text{if } t < c \end{cases}$
where $P_0(T > c) + \gamma P_0(T = c) = \alpha$.
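For continuous data the boundary has probability zero, so $\gamma$ plays no role. A minimal sketch (numpy/scipy assumed, not from the original notes) of the N-P test of $H_0: N(0,1)$ vs $H_a: N(1,1)$ from an iid sample, where the likelihood ratio is increasing in $\bar{x}$ and the size-$\alpha$ test rejects for large $\bar{x}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, alpha, mu0, mu1 = 25, 0.05, 0.0, 1.0

# Lambda(x) = f_a(x)/f_0(x) is increasing in xbar here, so "Lambda > k" is "xbar > c"
c = mu0 + stats.norm.ppf(1 - alpha) / np.sqrt(n)

xbar_null = rng.normal(mu0, 1, size=(100_000, n)).mean(axis=1)
xbar_alt = rng.normal(mu1, 1, size=(100_000, n)).mean(axis=1)
print((xbar_null > c).mean())   # empirical size, should be near alpha
print((xbar_alt > c).mean())    # power against mu = 1
```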
2.12.3 UMP tests
Fix a particular least favourable value $\theta_0 \in \Theta_0$ and define a N-P test $\phi_0(\cdot)$ (which gives a most powerful test of level $\alpha$). Suppose also that it is the most powerful test versus any alternative $\theta_a \in \Theta_0^c$, and similarly that it is a level $\alpha$ test for all $\theta \in \Theta_0$ (that is, $\sup_{\theta \in \Theta_0}\beta(\theta) \le \alpha$). Then:
$\phi_0(\cdot)$ is a Uniformly Most Powerful (UMP) test of $H_0: \theta \in \Theta_0$ vs $H_a: \theta \in \Theta_0^c$.
2.12.4 Sufficiency and MLR
Suppose $T \equiv T(X)$ is sufficient for $\theta$; then the N-P likelihood ratio can be written as a function of $t$, so the test may be based directly on any sufficient statistic.
Monotone Likelihood Ratio
Let $T(x)$ and $\theta$ be real-valued. Suppose that for every $\theta < \theta'$ the ratio $f_{\theta'}(x)/f_\theta(x)$ is an increasing function of $t = T(x)$. Then the family of distributions $\{f_\theta : \theta \in \Theta\}$ is said to have monotone likelihood ratio (MLR) in $T$.
Karlin-Rubin Theorem
Suppose a family of distributions has MLR in $T$. Fix a value $0 \le \alpha \le 1$ and fix $\theta_0$; then the test of the corollary to the N-P Lemma is UMP for testing $H_0: \theta \le \theta_0$ vs $H_a: \theta > \theta_0$.
Exponential Families and MLR
Suppose $x$ is from an exponential family with distribution:
$c(\theta)e^{a(\theta)T(x)}h(x)$
with parameter $a(\theta)$ that is increasing in $\theta$; then the family of distributions of $T$ has MLR in $T$.
2.13 Likelihood Ratio Test
2.13.1 The Test
Suppose $\dim(\theta) = r$ and we express the null hypothesis as:
$H_0: \theta_{q+1} = \cdots = \theta_r = 0$
and we wish to test against $H_a: \theta \in \Theta_0^c$.
Likelihood ratio test
Let
$\lambda_n \equiv \frac{\sup_{\theta \in \Theta} L(\theta, x)}{\sup_{\theta \in \Theta_0} L(\theta, x)} = \frac{L(\hat\theta)}{L(\hat\theta_0)}$
where $\hat\theta$ is the MLE of $\theta$ and $\hat\theta_0$ is the MLE over $\Theta_0$. Then we reject when this statistic is too big. We can do this either by finding the distribution of this statistic (or of a function of it) that we can calculate, or alternatively by appealing to the following asymptotic result:
$2\log\lambda_n(x) \to_d \chi^2_{r-q}$ when $H_0$ is true
Useful math result
$\log(1 + z) =_a z - z^2/2$
Proof.
$2\log\lambda_n = 2[\ell(\hat\theta) - \ell(\hat\theta_0)] = 2\left[\ell(\hat\theta) - \left(\ell(\hat\theta) + (\hat\theta_0 - \hat\theta)^T\dot\ell(\hat\theta) + \frac{n}{2}(\hat\theta_0 - \hat\theta)^T\frac{1}{n}\ddot\ell(\theta^*)(\hat\theta_0 - \hat\theta)\right)\right]$
(a Taylor expansion about $\hat\theta$, with $\theta^*$ between $\hat\theta_0$ and $\hat\theta$, and using $\dot\ell(\hat\theta) = 0$)
$= n(\hat\theta - \hat\theta_0)^T\left[-\frac{1}{n}\ddot\ell(\theta^*)\right](\hat\theta - \hat\theta_0) = (\hat\theta - \hat\theta_0)^T\,[n I_n(\theta^*)]\,(\hat\theta - \hat\theta_0) =_a \sqrt{n}(\hat\theta - \hat\theta_0)^T\,J(\theta_0)\,\sqrt{n}(\hat\theta - \hat\theta_0)$
$\to_d \sum_{i=1}^{m} N(0, 1)^2 = \chi^2_m$ with $m = r - q$.
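A minimal sketch of the asymptotic result (numpy/scipy assumed, not from the original notes) for $H_0: \theta = \theta_0$ with Exp(rate $\theta$) data, where one parameter is constrained so the limit is $\chi^2_1$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
theta0, n, reps = 1.5, 200, 50_000
x = rng.exponential(1 / theta0, size=(reps, n))      # data generated under H_0
xbar = x.mean(axis=1)
theta_hat = 1 / xbar                                 # unrestricted MLE

# 2 log lambda_n = 2[ell(theta_hat) - ell(theta_0)] for the Exp(rate theta) model
stat = 2 * n * (np.log(theta_hat / theta0) - 1 + theta0 * xbar)
for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(stat, q), stats.chi2.ppf(q, df=1))   # compare with chi^2_1
```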
2.13.2 Normal Theory Models
Let $r_0 \equiv r - 1$ and $q_0 \equiv q - 1$; then for any normal theory linear model the LR test reduces to:
$\frac{n - r_0}{r_0 - q_0}\left(\lambda_n^{2/n} - 1\right) = \frac{n - r_0}{r_0 - q_0}\left(\frac{\hat\sigma_0^2 - \hat\sigma^2}{\hat\sigma^2}\right) \sim F_{r_0 - q_0,\, n - r_0}$
2.14 Chi-Squared Tests
2.14.1 Test of fit for Multinomial observations
Let $N_1, \ldots, N_k$ be $\mathrm{Multinomial}(n; p_1, \ldots, p_k)$. Let $p_{01}, \ldots, p_{0k}$ be the hypothesized null probabilities. Then define the following statistic:
$\chi^2_n \equiv \sum_{i=1}^{k}\frac{(\mathrm{Observed}_i - \mathrm{Expected}_i)^2}{\mathrm{Expected}_i} = \sum_{i=1}^{k}\left(\frac{N_i - np_{0i}}{\sqrt{np_{0i}}}\right)^2 \equiv \sum_{i=1}^{k} D_i^2$
Then
$\chi^2_n \to_d \chi^2_{k-1}$ when $H_0: p_1 = p_{01}, \ldots, p_k = p_{0k}$ is true.
Proof. Let $A_i$ be the partition of the sample space such that $p_i$ is the probability of an observation being in $A_i$. Then define $N_{ji}$ to be 1 or 0 according to whether $X_j$ is in $A_i$ or not, so that $N_i = \sum_{j=1}^n N_{ji}$. Then let:
$U_j \equiv \left(\frac{N_{j1} - p_{01}}{\sqrt{p_{01}}}, \ldots, \frac{N_{jk} - p_{0k}}{\sqrt{p_{0k}}}\right)$
These are iid random vectors with mean 0 and covariance:
$\Sigma = I - \sqrt{p_0}\sqrt{p_0}^T$
That is, $\Sigma_{ii} = 1 - p_{0i}$ and $\Sigma_{ij} = -\sqrt{p_{0i}p_{0j}}$ if $i \ne j$. Then we apply the multivariate CLT to get:
$Y_n \equiv \frac{1}{\sqrt{n}}\sum_{j=1}^n U_j \to_d Y \sim N(0, \Sigma)$
Then
$\chi^2_n = Y_n^T Y_n = g(Y_n) \to_d g(Y) = Y^T Y \sim \chi^2_{k-1}$
by letting $\gamma_k \equiv \sqrt{p_0}$ and $\gamma_1, \ldots, \gamma_{k-1}$ be orthonormal vectors orthogonal to $\gamma_k$; then let $G^T = (\gamma_1, \ldots, \gamma_k)$ and consider $W \equiv GY$. Then $W_1, \ldots, W_{k-1}$ are iid $N(0, 1)$ and $W_k = 0$ almost everywhere, with $W^T W = Y^T Y$.
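A minimal sketch of computing the statistic and its $\chi^2_{k-1}$ reference distribution (numpy/scipy assumed; the counts and null probabilities are illustrative):

```python
import numpy as np
from scipy import stats

N = np.array([18, 30, 24, 28])            # observed multinomial cell counts
p0 = np.array([0.25, 0.25, 0.25, 0.25])   # hypothesized null probabilities
n = N.sum()

expected = n * p0
chi2 = ((N - expected) ** 2 / expected).sum()
p_value = stats.chi2.sf(chi2, df=len(N) - 1)
print(chi2, p_value)
print(stats.chisquare(N, f_exp=expected))  # same computation via scipy
```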
2.14.2 Test of fit for Independence
Suppose the sample space is divided into both an $A_i$ partition ($i = 1, \ldots, I$) and a $B_j$ partition ($j = 1, \ldots, J$). Suppose that $N_{ij}$ is the count of outcomes in $A_i \cap B_j$. Let $N_{i.}$ be the total number of $A_i$ outcomes and $N_{.j}$ the total number of $B_j$ outcomes. Say we want to test whether the two types of events are independent, in which case both
$\hat{p}_{ij} = \frac{N_{ij}}{n}$ and $\hat{p}_{i.}\hat{p}_{.j} = \frac{N_{i.}}{n}\cdot\frac{N_{.j}}{n}$
would both be natural estimators of $p_{ij}$, and under the null hypothesis they should agree. Then, similarly,
$\chi^2_n \equiv \sum_{i,j}\frac{(\mathrm{Observed}_{ij} - \mathrm{Expected}_{ij})^2}{\mathrm{Expected}_{ij}} = \sum_{i,j}\left(\frac{N_{ij} - n\hat{p}_{i.}\hat{p}_{.j}}{\sqrt{n\hat{p}_{i.}\hat{p}_{.j}}}\right)^2 \equiv \sum_{i,j} D_{ij}^2$
has
$\chi^2_n \to_d \chi^2_{(I-1)(J-1)}$
when the two partitions are independent.
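A minimal sketch for an $I \times J$ table (numpy/scipy assumed; the table itself is illustrative):

```python
import numpy as np
from scipy import stats

N = np.array([[20, 15, 25],
              [30, 20, 40]])              # N_ij counts for a 2 x 3 table
n = N.sum()

p_i = N.sum(axis=1) / n                   # \hat p_{i.}
p_j = N.sum(axis=0) / n                   # \hat p_{.j}
expected = n * np.outer(p_i, p_j)         # n * \hat p_{i.} \hat p_{.j}
chi2 = ((N - expected) ** 2 / expected).sum()
df = (N.shape[0] - 1) * (N.shape[1] - 1)
print(chi2, stats.chi2.sf(chi2, df=df))

res = stats.chi2_contingency(N, correction=False)   # cross-check, no continuity correction
print(res[0], res[1])
```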
2.14.3 Degrees of Freedom Rule
The basic degrees of freedom rule is this: when $H_0$ is true,
df = (number of cells) − (number of parameters estimated) − (number of cell count restrictions)
For example:
Chi-square test of fit: there are $k$ cells and no parameters are estimated under the null hypothesis. There is only one restriction: $N_1 + \cdots + N_k = n$. So
df $= k - 0 - 1 = k - 1$
Chi-square test for independence: there are $IJ$ cells. When $H_0$ is true we estimate $I - 1$ parameters for the first rv and $J - 1$ parameters for the second, for a total of $I - 1 + J - 1$ parameters, and again we have the same cell count restriction, so:
df $= IJ - (I - 1 + J - 1) - 1 = IJ - I - J + 1 = (I - 1)(J - 1)$
3 Distributions
3.1 Discrete
3.1.1 Bernoulli
If X is 1 with probability p (and 0 otherwise) then:
Property   Value
PDF        $p^x(1-p)^{1-x}$
CDF        $0$ if $k < 0$; $1-p$ if $0 \le k < 1$; $1$ if $k \ge 1$
Mean       $p$
Variance   $p(1-p)$
mgf        $(1-p) + pe^t$
3.1.2 Binomial
Assume you repeat a Bernoulli trial n times with probability of success p. Let X be the number of successes. Then X has the binomial distribution.
Property   Value
PDF        $\binom{n}{k}p^k(1-p)^{n-k}$
Mean       $np$
Variance   $np(1-p)$
mgf        $\left((1-p) + pe^t\right)^n$
3.1.3 Geometric
Consider repeating a Bernoulli trial with probability of success p until observing a success. Then we can either count the number of total turns it takes (including the final success) or the number of failures. Of course these only differ in that Turns = Failures + 1. The following summarizes the Geometric Turns distribution.
Property   Value
PDF        $(1-p)^{k-1}p$
CDF        $1 - (1-p)^k$
Mean       $\frac{1}{p}$
Variance   $\frac{1-p}{p^2}$
mgf        $\frac{pe^t}{1 - (1-p)e^t}$
3.1.4 Negative Binomial
Consider repeating a Bernoulli trial with probability of success p until r successes (similar to Geometric), and then counting the number of turns (including the last). (We could also count the number of failures instead of turns.) The following summarizes the Negative Binomial Turns distribution.
Property   Value
PDF        $\binom{k-1}{r-1}p^r(1-p)^{k-r}$
Mean       $\frac{r}{p}$
Variance   $\frac{r(1-p)}{p^2}$
mgf        $\left(\frac{pe^t}{1 - (1-p)e^t}\right)^r$
3.1.5 Poisson
Property   Value
PDF        $\frac{\lambda^k e^{-\lambda}}{k!}$
Mean       $\lambda$
Variance   $\lambda$
mgf        $e^{\lambda(e^t - 1)}$
3.1.6 Hyper-Geometric
Suppose there are two urns, one with K balls and one with N − K balls in it. Then let X denote the number of balls selected from the urn with K balls when n balls are drawn without replacement. Then the following summarizes its distribution.
Property   Value
PDF        $\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Mean       $n\frac{K}{N}$
Variance   $n\frac{K}{N}\,\frac{N-K}{N}\left(\frac{N-n}{N-1}\right)$
3.1.7 Discrete Uniform
Consider picking a number uniformly from $\{a, a+1, \ldots, b\}$, and let $n \equiv b - a + 1$ be the number of values.
Property   Value
PDF        $\frac{1}{n}$
Mean       $\frac{a+b}{2}$
Variance   $\frac{n^2 - 1}{12}$
mgf        $\frac{e^{at} - e^{(b+1)t}}{n(1 - e^t)}$
3.2 Continuous
3.2.1 Normal
Property   Value
PDF        $\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
Mean       $\mu$
Variance   $\sigma^2$
mgf        $\exp\!\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$
3.2.2 Multivariate Normal
Let $Z_1, \ldots, Z_n$ be independent $N(0, 1)$ random variables. Then define:
$Y = AZ + \mu$
for a matrix of constants $A$ and vector of constants $\mu$. Then $Y$ is multivariate normal with mean vector $\mu$ and covariance matrix $\Sigma_Y = AA^T$. Then the following defines its distribution (assuming $\Sigma_Y$ has full rank):
Property   Value
PDF        $\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(y-\mu)^T\Sigma^{-1}(y-\mu)\right)$
Mean       $\mu$
Variance   $\Sigma$
mgf        $\exp\!\left(\mu^T t + \frac{1}{2}t^T\Sigma t\right)$
Further, any linear combination of multivariate normals is multivariate normal. All marginals are multivariate normal with the respective subset of the mean vector and covariance matrix. Finally, conditionals are also multivariate normal. Consider splitting $Y$ into two parts $Y_1$ and $Y_2$, with:
$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \sim N\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right)$
Then
$Y_1 \mid Y_2 = y_2 \sim N\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(y_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$
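A minimal sketch of the conditional-distribution formula (numpy assumed; the mean vector, covariance matrix and observed value are illustrative):

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
idx1, idx2 = [0], [1, 2]                  # split Y into Y_1 = Y[0] and Y_2 = Y[1:3]
y2 = np.array([1.5, -0.5])                # observed value of Y_2

S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S21 = Sigma[np.ix_(idx2, idx1)]
S22 = Sigma[np.ix_(idx2, idx2)]

cond_mean = mu[idx1] + S12 @ np.linalg.solve(S22, y2 - mu[idx2])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)                # parameters of Y_1 | Y_2 = y_2
```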
3.2.3 Continuous Uniform
Pick a point randomly on the interval (a, b).
Property   Value
PDF        $\frac{1}{b-a}$
CDF        $\frac{x-a}{b-a}$
Mean       $\frac{1}{2}(a+b)$
Variance   $\frac{1}{12}(b-a)^2$
mgf        $\frac{e^{tb} - e^{ta}}{t(b-a)}$
3.2.4 Exponential
Property   Value
PDF        $\lambda e^{-\lambda x}$
CDF        $1 - e^{-\lambda x}$
Mean       $\frac{1}{\lambda}$
Variance   $\frac{1}{\lambda^2}$
Median     $\frac{\ln 2}{\lambda}$
mgf        $\left(1 - \frac{t}{\lambda}\right)^{-1}$
3.2.5 Gamma
The Gamma distribution can be thought of as a generalization of the sum of r independent exponential distributions with the same rate $\lambda$.
Property   Value
PDF        $\frac{\lambda^r}{\Gamma(r)}x^{r-1}e^{-\lambda x}$
Mean       $\frac{r}{\lambda}$
Variance   $\frac{r}{\lambda^2}$
mgf        $\left(1 - \frac{t}{\lambda}\right)^{-r}$
3.2.6 Beta
Property   Value
PDF        $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$
Mean       $\frac{\alpha}{\alpha+\beta}$
Variance   $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
Relationship to other distributions: Beta(1, 1) is the Unif(0, 1) distribution.
If $U \sim \chi^2_m$ and $V \sim \chi^2_n$ are independent, then $U/(U+V) \sim \mathrm{Beta}(m/2, n/2)$.
3.2.7 Cauchy(0, 1)
Property   Value
PDF        $\frac{1}{\pi(1+x^2)}$
CDF        $\frac{1}{\pi}\arctan(x) + \frac{1}{2}$
Mean       undefined
Variance   undefined
chf        $e^{-|t|}$
3.2.8 Chi-square
The $\chi^2_n$ distribution can be thought of as the sum of n squared independent N(0, 1) random variables. Alternatively it is the special case $\mathrm{Gamma}\!\left(\frac{n}{2}, \frac{1}{2}\right)$.
Property   Value
PDF        $\frac{1}{2^{n/2}\Gamma(n/2)}x^{n/2-1}e^{-x/2}$
Mean       $n$
Variance   $2n$
mgf        $(1 - 2t)^{-n/2}$
3.2.9 T_n-distribution
If $Z \sim N(0, 1)$ and $V \sim \chi^2_n$ are independent random variables, then
$T \equiv \frac{Z}{\sqrt{V/n}}$
has the following $T_n$ distribution.
Property   Value
PDF        $\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$
Mean       $0$ if $n > 1$, otherwise undefined
Variance   $\frac{n}{n-2}$ for $n > 2$, otherwise undefined
3.2.10 F_{m,n}-distribution
If $U \sim \chi^2_m$ and $V \sim \chi^2_n$ are independent, then
$F \equiv \frac{U/m}{V/n}$
has the $F_{m,n}$ distribution as follows.
Property   Value
PDF        $\frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}(m/n)^{m/2}\,\frac{f^{(m-2)/2}}{(1 + (m/n)f)^{(m+n)/2}}$
Mean       $\frac{n}{n-2}$ (for $n > 2$)
Variance   $\frac{2n^2(m+n-2)}{m(n-2)^2(n-4)}$ (for $n > 4$)