# School of Mathematical Sciences

**MTH736U Mathematical Statistics: Solutions to Exercises**

Exercise 1.1 For any events $A$, $B$, $C$ defined on a sample space $\Omega$ we have

Commutativity

$$A \cup B = B \cup A, \qquad A \cap B = B \cap A$$

Associativity

$$A \cup (B \cup C) = (A \cup B) \cup C, \qquad A \cap (B \cap C) = (A \cap B) \cap C$$

Distributive Laws

$$A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \qquad A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$$

De Morgan's Laws

$$(A \cup B)^c = A^c \cap B^c, \qquad (A \cap B)^c = A^c \cup B^c$$

**Exercise 1.2** Here we will use general versions of De Morgan's Laws: for any events $A_1, A_2, \ldots \in \mathcal{A}$, where $\mathcal{A}$ is a $\sigma$-algebra defined on a sample space $\Omega$, we have

$$\left( \bigcap_{i=1}^{\infty} A_i \right)^c = \bigcup_{i=1}^{\infty} A_i^c
\qquad \text{and} \qquad
\left( \bigcup_{i=1}^{\infty} A_i \right)^c = \bigcap_{i=1}^{\infty} A_i^c.$$

Now, by Definition 1.5, if $A_1, A_2, \ldots \in \mathcal{A}$, then also $A_1^c, A_2^c, \ldots \in \mathcal{A}$. Therefore,

$$\bigcup_{i=1}^{\infty} A_i^c \in \mathcal{A}.$$

Then, by the first general De Morgan Law, we have

$$\bigcup_{i=1}^{\infty} A_i^c = \left( \bigcap_{i=1}^{\infty} A_i \right)^c \in \mathcal{A}$$

and so $\bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$ as well. This means that the $\sigma$-algebra $\mathcal{A}$ is closed under countable intersections of its elements.

Exercise 1.3 Theorem 1.2: If $P$ is a probability function, then

(a) $P(A) = \sum_{i=1}^{\infty} P(A \cap C_i)$ for any partition $C_1, C_2, \ldots$;

(b) $P\left( \bigcup_{i=1}^{\infty} A_i \right) \le \sum_{i=1}^{\infty} P(A_i)$ for any events $A_1, A_2, \ldots$. [Boole's Inequality]

Proof

(a) Since $C_1, C_2, \ldots$ form a partition of $\Omega$ we have that $C_i \cap C_j = \emptyset$ for all $i \neq j$ and $\Omega = \bigcup_{i=1}^{\infty} C_i$. Hence, by the Distributive Law, we can write

$$A = A \cap \Omega = A \cap \left( \bigcup_{i=1}^{\infty} C_i \right) = \bigcup_{i=1}^{\infty} (A \cap C_i).$$

Then, since the events $A \cap C_i$ are disjoint,

$$P(A) = P\left( \bigcup_{i=1}^{\infty} (A \cap C_i) \right) = \sum_{i=1}^{\infty} P(A \cap C_i).$$

(b) First we construct a disjoint collection of events $A_1^*, A_2^*, \ldots$ such that

$$\bigcup_{i=1}^{\infty} A_i^* = \bigcup_{i=1}^{\infty} A_i.$$

Define

$$A_1^* = A_1, \qquad A_i^* = A_i \setminus \left( \bigcup_{j=1}^{i-1} A_j \right), \quad i = 2, 3, \ldots$$

Then

$$P\left( \bigcup_{i=1}^{\infty} A_i \right) = P\left( \bigcup_{i=1}^{\infty} A_i^* \right) = \sum_{i=1}^{\infty} P(A_i^*)$$

since the $A_i^*$ are disjoint. Now, by construction $A_i^* \subset A_i$ for all $i = 1, 2, \ldots$. Hence, $P(A_i^*) \le P(A_i)$ for all $i = 1, 2, \ldots$, and so

$$\sum_{i=1}^{\infty} P(A_i^*) \le \sum_{i=1}^{\infty} P(A_i).$$

**Exercise 1.4** Let $X \sim \mathrm{Bin}(8, 0.4)$, that is, $n = 8$ and the probability of success is $p = 0.4$. The pmf in mathematical, tabular and graphical form, and a graph of the c.d.f. of the variable $X$, follow.

Mathematical form:

$$\left\{ \left( x, \; P(X = x) = \binom{8}{x} (0.4)^x (0.6)^{8-x} \right), \quad x \in \mathcal{X} = \{0, 1, 2, \ldots, 8\} \right\}$$

Tabular form:

| $x$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| $P(X = x)$ | 0.0168 | 0.0896 | 0.2090 | 0.2787 | 0.2322 | 0.1239 | 0.0413 | 0.0079 | 0.0007 |
| $P(X \le x)$ | 0.0168 | 0.1064 | 0.3154 | 0.5941 | 0.8263 | 0.9502 | 0.9915 | 0.9993 | 1 |

[Figure 1: graphical representation of the mass function and the cumulative distribution function for $X \sim \mathrm{Bin}(8, 0.4)$; plots not reproduced here.]
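The table can be reproduced numerically. Below is a minimal sketch, assuming NumPy and SciPy are available (it is not part of the original solution).

```python
# Reproduce the pmf/cdf table for X ~ Bin(8, 0.4) (assumes NumPy and SciPy).
import numpy as np
from scipy.stats import binom

n, p = 8, 0.4
x = np.arange(n + 1)
pmf = binom.pmf(x, n, p)   # P(X = x)
cdf = binom.cdf(x, n, p)   # P(X <= x)

for xi, fi, Fi in zip(x, pmf, cdf):
    print(f"x = {xi}:  P(X = x) = {fi:.4f},  P(X <= x) = {Fi:.4f}")
```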

Exercise 1.5 1. To verify that $F(x)$ is a cdf we check the conditions of Theorem 1.4.

(i) $F'(x) = \dfrac{2}{x^3} > 0$ for $x \in \mathcal{X} = (1, \infty)$. Hence, $F(x)$ is increasing on $(1, \infty)$. The function is equal to zero otherwise.

(ii) Obviously $\lim_{x \to -\infty} F(x) = 0$.

(iii) $\lim_{x \to \infty} F(x) = 1$ as $\lim_{x \to \infty} \frac{1}{x^2} = 0$.

(iv) $F(x)$ is continuous.

2. The pdf is
$$f(x) = \begin{cases} 0, & \text{for } x \in (-\infty, 1); \\[4pt] \dfrac{2}{x^3}, & \text{for } x \in (1, \infty); \\[4pt] \text{not defined}, & \text{for } x = 1. \end{cases}$$

3. $P(3 \le X \le 4) = F(4) - F(3) = \left(1 - \dfrac{1}{16}\right) - \left(1 - \dfrac{1}{9}\right) = \dfrac{7}{144}$.
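As a quick numerical cross-check of part 3, assuming the cdf implied above, $F(x) = 1 - 1/x^2$ for $x > 1$:

```python
# Integrate the pdf f(x) = 2/x^3 over [3, 4]; the result should equal 7/144.
from scipy.integrate import quad

prob, _ = quad(lambda x: 2.0 / x**3, 3.0, 4.0)
print(prob, 7 / 144)   # both approximately 0.0486
```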

**Exercise 1.6** Discrete distributions:

1. Uniform $U(n)$ (equal mass at each outcome): The support set and the pmf are, respectively,
$$\mathcal{X} = \{x_1, x_2, \ldots, x_n\} \qquad \text{and} \qquad P(X = x_i) = \frac{1}{n}, \quad x_i \in \mathcal{X},$$
where $n$ is a positive integer. In the special case of $\mathcal{X} = \{1, 2, \ldots, n\}$ we have
$$E(X) = \frac{n+1}{2}, \qquad \mathrm{var}(X) = \frac{(n+1)(n-1)}{12}.$$
Examples:
(a) $X \equiv$ the first digit in a randomly selected sequence of five digits;
(b) $X \equiv$ a randomly selected student in a class of 15 students.

2. $\mathrm{Bern}(p)$ (only two possible outcomes, usually called "success" and "failure"): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1\}$ and
$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \mathcal{X},$$
where $p \in [0, 1]$ is the probability of success.
$$E(X) = p, \qquad \mathrm{var}(X) = p(1-p).$$
Examples:
(a) $X \equiv$ the outcome of tossing a coin;
(b) $X \equiv$ detection of a fault in a tested semiconductor chip;
(c) $X \equiv$ a guessed answer in a multiple choice question.

3. $\mathrm{Bin}(n, p)$ (number of successes in $n$ independent trials): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, 2, \ldots, n\}$ and
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x \in \mathcal{X},$$
where $p \in [0, 1]$ is the probability of success.
$$E(X) = np, \qquad \mathrm{var}(X) = np(1-p).$$
Examples:
(a) $X \equiv$ number of heads in several tosses of a coin;
(b) $X \equiv$ number of tested semiconductor chips in which a test finds a defect;
(c) $X \equiv$ number of correctly guessed answers in a multiple choice test of $n$ questions.

4. $\mathrm{Geom}(p)$ (the number of independent Bernoulli trials until the first "success"): The support set and the pmf are, respectively, $\mathcal{X} = \{1, 2, \ldots\}$ and
$$P(X = x) = p(1-p)^{x-1} = pq^{x-1}, \quad x \in \mathcal{X},$$
where $p \in [0, 1]$ is the probability of success and $q = 1 - p$.
$$E(X) = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2}.$$
Examples include:
(a) $X \equiv$ number of bits transmitted until the first error;
(b) $X \equiv$ number of analyzed samples of air before a rare molecule is detected.

5. $\mathrm{Hypergeometric}(n, M, N)$ (the number of outcomes of one kind in a random sample of size $n$ taken from a dichotomous population of $N$ elements, $M$ of which are of the kind of interest): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, \ldots, n\}$ and
$$P(X = x) = \frac{\binom{M}{x} \binom{N-M}{n-x}}{\binom{N}{n}}, \quad x \in \mathcal{X},$$
where $M \ge x$ and $N - M \ge n - x$.
$$E(X) = \frac{nM}{N}, \qquad \mathrm{var}(X) = \frac{nM}{N} \times \frac{(N-M)(N-n)}{N(N-1)}.$$

6. $\mathrm{Poisson}(\lambda)$ (the number of outcomes in a period of time or in a part of a space): The support set and the pmf are, respectively, $\mathcal{X} = \{0, 1, \ldots\}$ and
$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda},$$
where $\lambda > 0$.
$$E(X) = \lambda, \qquad \mathrm{var}(X) = \lambda.$$
Examples:
(a) the count of blood cells within a square of a haemocytometer slide;
(b) the number of caterpillars on a leaf;
(c) the number of plants of a rare variety in a square metre of a meadow;
(d) the number of tree seedlings in a square metre around a big tree;
(e) the number of phone calls to a computer service within a minute.

**Exercise 1.7** Continuous distributions:

1. Uniform, $U(a, b)$: The support set and the pdf are, respectively,
$$\mathcal{X} = [a, b] \qquad \text{and} \qquad f_X(x) = \frac{1}{b-a} I_{[a,b]}(x).$$
$$E(X) = \frac{a+b}{2}, \qquad \mathrm{var}(X) = \frac{(b-a)^2}{12}.$$

2. $\mathrm{Exp}(\lambda)$: The support set and the pdf are, respectively,
$$\mathcal{X} = [0, \infty) \qquad \text{and} \qquad f_X(x) = \lambda e^{-\lambda x} I_{[0,\infty)}(x),$$
where $\lambda > 0$.
$$E(X) = \frac{1}{\lambda}, \qquad \mathrm{var}(X) = \frac{1}{\lambda^2}.$$
Examples:
- $X \equiv$ the time between arrivals of e-mails on your computer;
- $X \equiv$ the distance between major cracks in a motorway;
- $X \equiv$ the life length of car voltage regulators.

3. $\mathrm{Gamma}(\alpha, \lambda)$: An exponential rv describes the length of time or space between counts. The length until $r$ counts occur is a generalization of such a process, and the corresponding rv is called the Erlang random variable. Its pdf is given by
$$f_X(x) = \frac{x^{r-1} \lambda^r e^{-\lambda x}}{(r-1)!} I_{[0,\infty)}(x), \quad r = 1, 2, \ldots$$
The Erlang distribution is a special case of the Gamma distribution, in which the parameter $r$ may be any positive number, usually denoted by $\alpha$. In the gamma distribution we use the generalization of the factorial represented by the gamma function, as follows:
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x} \, dx, \quad \text{for } \alpha > 0.$$
A recursive relationship that may easily be shown by integrating the above equation by parts is
$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1).$$
If $\alpha$ is a positive integer, then
$$\Gamma(\alpha) = (\alpha - 1)!$$
since $\Gamma(1) = 1$.
The pdf is
$$f_X(x) = \frac{x^{\alpha-1} \lambda^{\alpha} e^{-\lambda x}}{\Gamma(\alpha)} I_{[0,\infty)}(x),$$
where $\alpha > 0$, $\lambda > 0$ and $\Gamma(\cdot)$ is the gamma function. The mean and the variance of the gamma rv are
$$E(X) = \frac{\alpha}{\lambda}, \qquad \mathrm{var}(X) = \frac{\alpha}{\lambda^2}.$$

4. $\chi^2(\nu)$: A rv $X$ has a Chi-square distribution with $\nu$ degrees of freedom iff $X$ is gamma distributed with parameters $\alpha = \frac{\nu}{2}$ and $\lambda = \frac{1}{2}$. This distribution is used extensively in interval estimation and hypothesis testing. Its values are tabulated.

**Exercise 1.8** We can write
$$\begin{aligned}
E(X - b)^2 &= E\big( X - EX + EX - b \big)^2 \\
&= E\big[ (X - EX) + (EX - b) \big]^2 \\
&= E(X - EX)^2 + (EX - b)^2 + 2\,E\big[ (X - EX)(EX - b) \big] \\
&= E(X - EX)^2 + (EX - b)^2 + 2(EX - b)\,E(X - EX) \\
&= E(X - EX)^2 + (EX - b)^2,
\end{aligned}$$
as $E(X - EX) = 0$. This is minimized when $(EX - b)^2 = 0$, that is, when $b = EX$.

Exercise 1.9

$$\begin{aligned}
\mathrm{var}(aX + b) &= E(aX + b)^2 - \{E(aX + b)\}^2 \\
&= E(a^2 X^2 + 2abX + b^2) - \left( a^2 \{EX\}^2 + 2ab\,EX + b^2 \right) \\
&= a^2 EX^2 - a^2 \{EX\}^2 \\
&= a^2\,\mathrm{var}(X).
\end{aligned}$$

**Exercise 1.10** Let $X \sim \mathrm{Bin}(n, p)$. The pmf of $X$ is
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n.$$
To obtain the mgf of $X$ we will use the Binomial formula
$$\sum_{x=0}^{n} \binom{n}{x} u^x v^{n-x} = (u + v)^n.$$
Hence, we may write
$$M_X(t) = E\left( e^{tX} \right) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x}
= \sum_{x=0}^{n} \binom{n}{x} \left( e^t p \right)^x (1-p)^{n-x}
= \left( e^t p + (1-p) \right)^n.$$

Now, assume that $X \sim \mathrm{Poisson}(\lambda)$. Then
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Here we will use the Taylor series expansion of $e^u$ at zero, i.e.,
$$e^u = \sum_{x=0}^{\infty} \frac{u^x}{x!}.$$
Hence, we have
$$M_X(t) = E\left( e^{tX} \right) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!}
= e^{-\lambda} \sum_{x=0}^{\infty} \frac{\left( \lambda e^t \right)^x}{x!}
= e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$$
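A small symbolic check of the two mgfs, assuming SymPy is available: differentiating at $t = 0$ should recover the known means $np$ and $\lambda$.

```python
# Verify M'(0) for the binomial and Poisson mgfs derived above (assumes SymPy).
import sympy as sp

t, n, p, lam = sp.symbols('t n p lambda', positive=True)
M_bin = (p * sp.exp(t) + 1 - p) ** n         # binomial mgf
M_pois = sp.exp(lam * (sp.exp(t) - 1))       # Poisson mgf

print(sp.simplify(sp.diff(M_bin, t).subs(t, 0)))    # n*p
print(sp.simplify(sp.diff(M_pois, t).subs(t, 0)))   # lambda
```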

Exercise 1.11 The pdf of a standard normal rv $Z$ is
$$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} I_{\mathbb{R}}(z).$$
To find the distribution of the transformed rv $Y = g(Z) = \mu + \sigma Z$ we will apply Theorem 1.11. The inverse mapping at $y$ is $z = g^{-1}(y) = \frac{y - \mu}{\sigma}$. Assuming $\sigma > 0$, the function $g(\cdot)$ is increasing. Then, we can write
$$f_Y(y) = f_Z\left( g^{-1}(y) \right) \left| \frac{d}{dy} g^{-1}(y) \right|
= \frac{1}{\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}} \cdot \frac{1}{\sigma}
= \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}.$$
This is the pdf of a normal distribution with parameters $\mu \in \mathbb{R}$ and $\sigma^2 \in \mathbb{R}^+$, i.e., $Y \sim N(\mu, \sigma^2)$.

Exercise 1.12

(a) Let $M_X(t)$ be the mgf of a random variable $X$. Then, for $Y = a + bX$, where $a$ and $b$ are constants, we can write
$$M_Y(t) = E\left( e^{tY} \right) = E\left( e^{t(a+bX)} \right) = E\left( e^{ta} e^{tbX} \right) = e^{ta} E\left( e^{tbX} \right) = e^{ta} M_X(tb).$$

(b) First we derive the mgf of a standard normal rv $Z$:
$$\begin{aligned}
M_Z(t) = E\left( e^{tZ} \right)
&= \int_{-\infty}^{\infty} e^{tz} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(z^2 - 2tz)} \, dz \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(z^2 - 2tz + t^2) + \frac{1}{2}t^2} \, dz
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(z-t)^2}{2}} e^{\frac{t^2}{2}} \, dz \\
&= e^{\frac{t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(z-t)^2}{2}} \, dz
= e^{\frac{t^2}{2}},
\end{aligned}$$
as the integrand is the pdf of a rv with mean $t$ and variance 1. Now, we can calculate the mgf of $Y \sim N(\mu, \sigma^2)$ using the result of part (a), that is,
$$M_Y(t) = E\left( e^{t(\mu + \sigma Z)} \right) = e^{t\mu} M_Z(t\sigma) = e^{t\mu} e^{\frac{t^2\sigma^2}{2}} = e^{(2t\mu + t^2\sigma^2)/2}.$$

Exercise 1.13 Let us write the table of counts given for the smoking status and gender in terms of proportions, as follows:

|  | $S = 1$ | $S = 2$ | $S = 3$ | Total |
|---|---|---|---|---|
| $G = 0$ | 0.20 | 0.32 | 0.08 | 0.60 |
| $G = 1$ | 0.10 | 0.05 | 0.25 | 0.40 |
| Total | 0.30 | 0.37 | 0.33 | 1.00 |

where $S$ is a random variable denoting 'smoking status' such that $S(s) = 1$, $S(q) = 2$, $S(n) = 3$, and $G$ is a random variable denoting 'gender' such that $G(m) = 0$, $G(f) = 1$. The table gives the distribution of $X = (G, S)$ as well as the marginal distributions of $G$ and of $S$.

1. $p_G(0) = 0.6$.

2. $p_X(0, 1) = 0.2$.

3. $p_S(1) + p_S(2) = 0.30 + 0.37 = 0.67$.

4. $p_X(1, 1) + p_X(1, 2) = 0.10 + 0.05 = 0.15$.
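The probabilities above follow directly from the joint table; the following minimal sketch (assuming NumPy) computes the marginals and the four quantities.

```python
# Joint table: rows G = 0, 1; columns S = 1, 2, 3 (assumes NumPy).
import numpy as np

joint = np.array([[0.20, 0.32, 0.08],    # G = 0
                  [0.10, 0.05, 0.25]])   # G = 1

p_G = joint.sum(axis=1)   # marginal of G: [0.60, 0.40]
p_S = joint.sum(axis=0)   # marginal of S: [0.30, 0.37, 0.33]

print(p_G[0])                      # 1.  p_G(0) = 0.6
print(joint[0, 0])                 # 2.  p_X(0, 1) = 0.2
print(p_S[0] + p_S[1])             # 3.  0.67
print(joint[1, 0] + joint[1, 1])   # 4.  0.15
```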

**Exercise 1.14** Here we have conditional probabilities.

1. $P(S = 1 \mid G = 0) = \dfrac{0.20}{0.60} = \dfrac{1}{3}$.

2. $P(G = 1 \mid S = 1) = \dfrac{0.10}{0.30} = \dfrac{1}{3}$.

**Exercise 1.15** The first equality is a special case of Lemma 1.2 with $g(Y) = Y$.

To show the second equality we will work on the RHS of the equation. We have
$$\mathrm{var}(Y \mid X) = E(Y^2 \mid X) - [E(Y \mid X)]^2.$$
So
$$E[\mathrm{var}(Y \mid X)] = E[E(Y^2 \mid X)] - E\{[E(Y \mid X)]^2\} = E(Y^2) - E\{[E(Y \mid X)]^2\}.$$
Also,
$$\mathrm{var}(E[Y \mid X]) = E\{[E(Y \mid X)]^2\} - \{E[E(Y \mid X)]\}^2 = E\{[E(Y \mid X)]^2\} - [E(Y)]^2.$$
Hence, adding these two expressions we obtain
$$E[\mathrm{var}(Y \mid X)] + \mathrm{var}(E[Y \mid X]) = E(Y^2) - [E(Y)]^2 = \mathrm{var}(Y).$$

**Exercise 1.16** (See Example 1.25.)

Denote by a success a message which gets into the server, and by $X$ the number of successes in $Y$ trials. Then
$$X \mid Y \sim \mathrm{Bin}(Y, p), \quad p = \tfrac{1}{3}, \qquad Y \sim \mathrm{Poisson}(\lambda), \quad \lambda = 21.$$
As in Example 1.25,
$$X \sim \mathrm{Poisson}(\lambda p) = \mathrm{Poisson}(7).$$
Hence
$$E(X) = \lambda p = 7, \qquad \mathrm{var}(X) = \lambda p = 7.$$
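A quick simulation sketch of the thinning result, assuming NumPy is available (not part of the original solution):

```python
# If Y ~ Poisson(21) and X | Y ~ Bin(Y, 1/3), then X ~ Poisson(7):
# both the sample mean and the sample variance of X should be close to 7.
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(21, size=200_000)   # number of incoming messages
x = rng.binomial(y, 1 / 3)          # messages that get into the server

print(x.mean(), x.var())            # both approximately 7
```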

**Exercise 1.17** Here we have a two-dimensional rv $X = (X_1, X_2)$ which denotes the length of life of two components, with joint pdf equal to
$$f_X(x_1, x_2) = \frac{1}{8} x_1 e^{-\frac{1}{2}(x_1 + x_2)} I_{\{(x_1, x_2):\, x_1 > 0,\, x_2 > 0\}}.$$

1. The probability that the length of life of each of the two components will be greater than 100 hours is
$$\begin{aligned}
P_X(X_1 > 1, X_2 > 1)
&= \int_1^{\infty} \int_1^{\infty} \frac{1}{8} x_1 e^{-\frac{1}{2}(x_1 + x_2)} \, dx_1 \, dx_2 \\
&= \frac{1}{8} \int_1^{\infty} e^{-\frac{1}{2} x_2} \underbrace{\left( \int_1^{\infty} x_1 e^{-\frac{1}{2} x_1} \, dx_1 \right)}_{= \, 6e^{-1/2} \text{ (by parts)}} dx_2
= \frac{1}{8} \int_1^{\infty} e^{-\frac{1}{2} x_2} \left( 6 e^{-\frac{1}{2}} \right) dx_2
= \frac{3}{2} e^{-1}.
\end{aligned}$$
That is, the probability that the length of life of each of the two components is greater than 100 hours is $\frac{3}{2} e^{-1}$.

2. Now we are interested in component II only, so we need to calculate the marginal pdf of the length of its life:
$$f_{X_2}(x_2) = \int_0^{\infty} \frac{1}{8} x_1 e^{-\frac{1}{2}(x_1 + x_2)} \, dx_1
= \frac{1}{8} e^{-\frac{1}{2} x_2} \underbrace{\int_0^{\infty} x_1 e^{-\frac{1}{2} x_1} \, dx_1}_{= \, 4 \text{ (by parts)}}
= \frac{1}{2} e^{-\frac{1}{2} x_2}.$$
Now,
$$P_{X_2}(X_2 > 2) = \int_2^{\infty} \frac{1}{2} e^{-\frac{1}{2} x_2} \, dx_2 = e^{-1}.$$
That is, the probability that the length of life of component II will be greater than 200 hours is $e^{-1}$.

3. We can write
$$f_X(x_1, x_2) = \left( \frac{1}{8} x_1 e^{-\frac{1}{2} x_1} \right) \left( e^{-\frac{1}{2} x_2} \right) = g(x_1) h(x_2).$$
The joint pdf can be written as a product of two functions, one depending on $x_1$ only and the other on $x_2$ only, for all pairs $(x_1, x_2)$, and the support is a product set. Hence, the random variables $X_1$ and $X_2$ are independent.

4. Write $g(X_1) = \frac{1}{X_1}$ and $h(X_2) = X_2$. Then, by part 2 of Theorem 1.13 we have
$$E[g(X_1) h(X_2)] = E[g(X_1)]\, E[h(X_2)] = E\left[ \frac{1}{X_1} \right] E[X_2].$$
The marginal pdfs for $X_1$ and $X_2$, respectively, are
$$f_{X_1}(x_1) = \frac{1}{4} x_1 e^{-\frac{1}{2} x_1}, \qquad f_{X_2}(x_2) = \frac{1}{2} e^{-\frac{1}{2} x_2}.$$
Hence,
$$E\left[ \frac{1}{X_1} \right] = \frac{1}{4} \int_0^{\infty} \frac{1}{x_1}\, x_1 e^{-\frac{1}{2} x_1} \, dx_1 = \frac{1}{4} \int_0^{\infty} e^{-\frac{1}{2} x_1} \, dx_1 = \frac{1}{2},$$
$$E[X_2] = \frac{1}{2} \int_0^{\infty} x_2 e^{-\frac{1}{2} x_2} \, dx_2 = \frac{1}{2} \times 4 = 2.$$
Finally,
$$E\left[ \frac{X_2}{X_1} \right] = E\left[ \frac{1}{X_1} \right] E[X_2] = \frac{1}{2} \times 2 = 1.$$
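A numerical cross-check of parts 1 and 4, assuming SciPy is available, using the joint pdf stated at the start of the exercise:

```python
# Check P(X1 > 1, X2 > 1) = (3/2)e^{-1} and E(X2/X1) = 1 by double integration.
import numpy as np
from scipy.integrate import dblquad

def f(x2, x1):
    return 0.125 * x1 * np.exp(-0.5 * (x1 + x2))

p, _ = dblquad(f, 1, np.inf, lambda x1: 1, lambda x1: np.inf)
print(p, 1.5 * np.exp(-1))          # ~ 0.5518 in both cases

e, _ = dblquad(lambda x2, x1: (x2 / x1) * f(x2, x1),
               0, np.inf, lambda x1: 0, lambda x1: np.inf)
print(e)                            # ~ 1
```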

Exercise 1.18 Take the nonnegative function of $t$, $g(t) = \mathrm{var}(tX_1 + X_2) \ge 0$. Then,
$$\begin{aligned}
g(t) = \mathrm{var}(tX_1 + X_2)
&= E\left[ \left( tX_1 + X_2 - (t\mu_1 + \mu_2) \right)^2 \right]
= E\left[ \left( t(X_1 - \mu_1) + (X_2 - \mu_2) \right)^2 \right] \\
&= E\left[ (X_1 - \mu_1)^2 t^2 + 2(X_1 - \mu_1)(X_2 - \mu_2) t + (X_2 - \mu_2)^2 \right] \\
&= \mathrm{var}(X_1)\, t^2 + 2\,\mathrm{cov}(X_1, X_2)\, t + \mathrm{var}(X_2).
\end{aligned}$$

1. $g(t) \ge 0$, hence this quadratic function has one real root or none. The discriminant is
$$\Delta = 4\,\mathrm{cov}^2(X_1, X_2) - 4\,\mathrm{var}(X_1)\,\mathrm{var}(X_2) \le 0.$$
Hence,
$$\mathrm{cov}^2(X_1, X_2) \le \mathrm{var}(X_1)\,\mathrm{var}(X_2),$$
that is
$$\frac{\mathrm{cov}^2(X_1, X_2)}{\mathrm{var}(X_1)\,\mathrm{var}(X_2)} \le 1,$$
and so
$$-1 \le \rho(X_1, X_2) \le 1.$$

2. $|\rho(X_1, X_2)| = 1$ if and only if the discriminant is equal to zero, that is, if and only if $g(t)$ has a single root. But since $\left( t(X_1 - \mu_1) + (X_2 - \mu_2) \right)^2 \ge 0$, the expected value
$$g(t) = E\left[ \left( t(X_1 - \mu_1) + (X_2 - \mu_2) \right)^2 \right] = 0$$
if and only if
$$P\left( \left[ t(X_1 - \mu_1) + (X_2 - \mu_2) \right]^2 = 0 \right) = 1.$$
This is equivalent to
$$P\left( t(X_1 - \mu_1) + (X_2 - \mu_2) = 0 \right) = 1.$$
It means that $P(X_2 = aX_1 + b) = 1$ with $a = -t$ and $b = \mu_1 t + \mu_2$, where $t$ is the root of $g(t)$, that is
$$t = -\frac{\mathrm{cov}(X_1, X_2)}{\mathrm{var}(X_1)}.$$
Hence, $a = -t$ has the same sign as $\rho(X_1, X_2)$.

Exercise 1.19 We have
$$f_{X,Y}(x, y) = \begin{cases} 8xy, & \text{for } 0 \le x < y \le 1; \\ 0, & \text{otherwise.} \end{cases}$$

(a) Variables $X$ and $Y$ are not independent because their ranges depend on each other (the support is not a product set).

(b) We have $\mathrm{cov}(X, Y) = E(XY) - E(X)\,E(Y)$.
$$E(XY) = \int_0^1 \int_x^1 xy \cdot 8xy \, dy \, dx = \frac{4}{9}.$$
To calculate $E(X)$ and $E(Y)$ we need the marginal pdfs of $X$ and of $Y$:
$$f_X(x) = \int_x^1 8xy \, dy = 4x(1 - x^2) \;\text{ on } (0, 1), \qquad
f_Y(y) = \int_0^y 8xy \, dx = 4y^3 \;\text{ on } (0, 1).$$
Now we can calculate the expectations, that is,
$$E(X) = \int_0^1 x \cdot 4x(1 - x^2) \, dx = \frac{8}{15}, \qquad
E(Y) = \int_0^1 y \cdot 4y^3 \, dy = \frac{4}{5}.$$
Hence,
$$\mathrm{cov}(X, Y) = \frac{4}{9} - \frac{8}{15} \cdot \frac{4}{5} = \frac{4}{225} \approx 0.01778.$$

(c) The transformation and the inverses are, respectively,
$$u = \frac{x}{y}, \quad v = y \qquad \text{and} \qquad x = uv, \quad y = v.$$
Then the Jacobian of the transformation is
$$J = \det \begin{pmatrix} v & u \\ 0 & 1 \end{pmatrix} = v.$$
So, by the theorem for a bivariate transformation we can write
$$f_{U,V}(u, v) = f_{X,Y}(uv, v)\,|J| = 8uv^3.$$
The support of $(U, V)$ is
$$D = \{(u, v) : 0 \le u < 1, \; 0 < v \le 1\}.$$

(d) The joint pdf of $(U, V)$ is a product of two functions: one depends on $u$ only and the other on $v$ only. Also, the ranges of $U$ and $V$ do not depend on each other. Hence, $U$ and $V$ are independent random variables.

(e) $U$ and $V$ are independent, hence $\mathrm{cov}(U, V) = 0$.
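A symbolic verification of part (b), assuming SymPy is available:

```python
# Verify E(XY), the marginal means and cov(X, Y) = 4/225 for f(x, y) = 8xy, 0 <= x < y <= 1.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = 8 * x * y

EXY = sp.integrate(x * y * f, (y, x, 1), (x, 0, 1))   # 4/9
EX = sp.integrate(x * f, (y, x, 1), (x, 0, 1))        # 8/15
EY = sp.integrate(y * f, (x, 0, y), (y, 0, 1))        # 4/5

print(EXY, EX, EY, EXY - EX * EY)                     # covariance = 4/225
```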

Exercise 1.20 (a) Here $u = \frac{x}{x+y}$ and $v = x + y$. Hence, the inverses are
$$x = uv \qquad \text{and} \qquad y = v - uv.$$
The Jacobian of the transformation is $J = v$.

Furthermore, the joint pdf of $(X, Y)$ (by independence) is
$$f_{X,Y}(x, y) = \lambda^2 e^{-\lambda(x+y)}.$$
Hence, by the transformation theorem we get
$$f_{U,V}(u, v) = \lambda^2 e^{-\lambda v} v$$
and the support of $(U, V)$ is
$$D = \{(u, v) : 0 < u < 1, \; 0 < v < \infty\}.$$

(b) The random variables $U$ and $V$ are independent because their joint pdf can be written as a product of two functions, one depending on $v$ only and the other on $u$ only (this one is a constant), and their ranges do not depend on each other.

(c) We will find the marginal pdf of $U$:
$$f_U(u) = \int_0^{\infty} \lambda^2 e^{-\lambda v} v \, dv = 1$$
on $(0, 1)$. That is, $U \sim \mathrm{Uniform}(0, 1)$.

(d) A sum of two independent, identically distributed exponential random variables has the $\mathrm{Erlang}(2, \lambda)$ distribution.

Exercise 1.21 (a) By the definition of expectation we can write
$$\begin{aligned}
E(Y^a) &= \int_0^{\infty} y^a \, \frac{y^{\alpha-1} \lambda^{\alpha} e^{-\lambda y}}{\Gamma(\alpha)} \, dy
= \int_0^{\infty} \frac{y^{a+\alpha-1} \lambda^{\alpha} e^{-\lambda y}}{\Gamma(\alpha)} \cdot \frac{\lambda^a \Gamma(a+\alpha)}{\lambda^a \Gamma(a+\alpha)} \, dy \\
&= \frac{\Gamma(a+\alpha)}{\lambda^a \Gamma(\alpha)}
\underbrace{\int_0^{\infty} \frac{y^{a+\alpha-1} \lambda^{a+\alpha} e^{-\lambda y}}{\Gamma(a+\alpha)} \, dy}_{\text{(pdf of a gamma rv)}}
= \frac{\Gamma(a+\alpha)}{\lambda^a \Gamma(\alpha)}.
\end{aligned}$$
Now, the parameters of a gamma distribution have to be positive, hence $a + \alpha > 0$.

(b) To obtain $E(Y)$ we apply the above result with $a = 1$. This gives
$$E(Y) = \frac{\Gamma(1+\alpha)}{\lambda\Gamma(\alpha)} = \frac{\alpha\Gamma(\alpha)}{\lambda\Gamma(\alpha)} = \frac{\alpha}{\lambda}.$$
For the variance we have $\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2$, hence we need to find $E(Y^2)$. Again, we use the result of part (a), now with $a = 2$. That is,
$$E(Y^2) = \frac{\Gamma(2+\alpha)}{\lambda^2\Gamma(\alpha)} = \frac{(1+\alpha)\Gamma(1+\alpha)}{\lambda^2\Gamma(\alpha)} = \frac{(1+\alpha)\alpha\Gamma(\alpha)}{\lambda^2\Gamma(\alpha)} = \frac{\alpha(1+\alpha)}{\lambda^2}.$$
Hence, the variance is
$$\mathrm{var}(Y) = \frac{\alpha(1+\alpha)}{\lambda^2} - \left( \frac{\alpha}{\lambda} \right)^2 = \frac{\alpha}{\lambda^2}.$$

(c) $X \sim \chi^2_{\nu}$ is a special case of the gamma distribution with $\alpha = \frac{\nu}{2}$ and $\lambda = \frac{1}{2}$. Hence, from (b) we obtain
$$E(X) = \frac{\alpha}{\lambda} = \frac{\nu}{2} \cdot 2 = \nu, \qquad \mathrm{var}(X) = \frac{\alpha}{\lambda^2} = \frac{\nu}{2} \cdot 4 = 2\nu.$$

**Exercise 1.22** Here we have $Y_i \stackrel{\text{iid}}{\sim} N(0, 1)$, $\bar{Y} = \frac{1}{5}\sum_{i=1}^5 Y_i$ and $Y_6 \sim N(0, 1)$ independently of the other $Y_i$.

(a) $Y_i \stackrel{\text{iid}}{\sim} N(0, 1)$, hence $Y_i^2 \stackrel{\text{iid}}{\sim} \chi^2_1$. Then, by Corollary 1.1 we get
$$W = \sum_{i=1}^5 Y_i^2 \sim \chi^2_5.$$

(b) $U = \sum_{i=1}^5 (Y_i - \bar{Y})^2 = \dfrac{4S^2}{\sigma^2}$, where $S^2 = \frac{1}{4}\sum_{i=1}^5 (Y_i - \bar{Y})^2$ and $\sigma^2 = 1$. Hence, as shown in Example 1.34, we have
$$U = \frac{4S^2}{\sigma^2} \sim \chi^2_4.$$

(c) $U \sim \chi^2_4$ and $Y_6^2 \sim \chi^2_1$. Hence, by Corollary 1.1 we get
$$U + Y_6^2 \sim \chi^2_5.$$

(d) $Y_6 \sim N(0, 1)$ and $W \sim \chi^2_5$. Hence, by Theorem 1.19, we get
$$\frac{Y_6}{\sqrt{W/5}} \sim t_5.$$

Exercise 1.23 Let $X = (X_1, X_2, \ldots, X_n)^T$ be a vector of mutually independent rvs with mgfs $M_{X_1}(t), M_{X_2}(t), \ldots, M_{X_n}(t)$, and let $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ be fixed constants. Then the mgf of the random variable $Z = \sum_{i=1}^n (a_i X_i + b_i)$ is
$$M_Z(t) = e^{t\sum_{i=1}^n b_i} \prod_{i=1}^n M_{X_i}(a_i t).$$

Proof. By the definition of the mgf we can write
$$\begin{aligned}
M_Z(t) = E\left( e^{tZ} \right)
&= E\left( e^{t\sum_{i=1}^n (a_i X_i + b_i)} \right)
= E\left( e^{t\sum_{i=1}^n a_i X_i + t\sum_{i=1}^n b_i} \right)
= E\left( e^{t\sum_{i=1}^n b_i} e^{t\sum_{i=1}^n a_i X_i} \right) \\
&= e^{t\sum_{i=1}^n b_i} E\left( \prod_{i=1}^n e^{t a_i X_i} \right)
= e^{t\sum_{i=1}^n b_i} \prod_{i=1}^n E\left( e^{t a_i X_i} \right)
= e^{t\sum_{i=1}^n b_i} \prod_{i=1}^n M_{X_i}(t a_i).
\end{aligned}$$

**Exercise 1.24** Here we will use the result of Theorem 1.22. We know that the mgf of a normal rv $X$ with mean $\mu$ and variance $\sigma^2$ is
$$M_X(t) = e^{(2t\mu + t^2\sigma^2)/2}.$$
Hence, by Theorem 1.22 with $a_i = \frac{1}{n}$ and $b_i = 0$ for all $i = 1, \ldots, n$, we can write
$$M_{\bar{X}}(t) = \prod_{i=1}^n M_{X_i}\!\left( \frac{1}{n} t \right)
= \prod_{i=1}^n e^{\left( 2\frac{1}{n}t\mu + \frac{1}{n^2}t^2\sigma^2 \right)/2}
= \left( e^{\left( 2\frac{1}{n}t\mu + \frac{1}{n^2}t^2\sigma^2 \right)/2} \right)^n
= e^{n\left( 2\frac{1}{n}t\mu + \frac{1}{n^2}t^2\sigma^2 \right)/2}
= e^{\left( 2t\mu + t^2\frac{\sigma^2}{n} \right)/2}.$$
This is the mgf of a normal rv with mean $\mu$ and variance $\sigma^2/n$, that is,
$$\bar{X} \sim N\!\left( \mu, \frac{\sigma^2}{n} \right).$$

Exercise 1.25

Here $g(X) = Y = AX$, hence $h(Y) = X = A^{-1}Y = BY$, where $B = A^{-1}$. The Jacobian is
$$J_h(y) = \det \frac{\partial}{\partial y} h(y) = \det \frac{\partial}{\partial y} (By) = \det B = \det A^{-1} = \frac{1}{\det A}.$$
Hence, by Theorem 1.25 we have
$$f_Y(y) = f_X\big( h(y) \big)\, \big| J_h(y) \big| = f_X\big( A^{-1} y \big) \left| \frac{1}{\det A} \right|.$$

Exercise 1.26

A multivariate normal rv has pdf of the form
$$f_X(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det V}} \exp\left\{ -\frac{1}{2}(x - \mu)^T V^{-1}(x - \mu) \right\}.$$
Then, by the result of Exercise 1.25, for $X \sim N_n(\mu, V)$ and $Y = AX$ we have
$$\begin{aligned}
f_Y(y) &= f_X\big( A^{-1} y \big) \frac{1}{|\det A|} \\
&= \frac{1}{|\det A|} \frac{1}{(2\pi)^{n/2}\sqrt{\det V}} \exp\left\{ -\frac{1}{2}(A^{-1}y - \mu)^T V^{-1}(A^{-1}y - \mu) \right\} \\
&= \frac{1}{|\det A|} \frac{1}{(2\pi)^{n/2}\sqrt{\det V}} \exp\left\{ -\frac{1}{2}\left[ A^{-1}(y - A\mu) \right]^T V^{-1} A^{-1}(y - A\mu) \right\} \\
&= \frac{1}{|\det A|} \frac{1}{(2\pi)^{n/2}\sqrt{\det V}} \exp\left\{ -\frac{1}{2}(y - A\mu)^T (A^{-1})^T V^{-1} A^{-1}(y - A\mu) \right\} \\
&= \frac{1}{(2\pi)^{n/2}\sqrt{\det V (\det A)^2}} \exp\left\{ -\frac{1}{2}(y - A\mu)^T \left( A V A^T \right)^{-1}(y - A\mu) \right\} \\
&= \frac{1}{(2\pi)^{n/2}\sqrt{\det(A V A^T)}} \exp\left\{ -\frac{1}{2}(y - A\mu)^T \left( A V A^T \right)^{-1}(y - A\mu) \right\}.
\end{aligned}$$
This is the pdf of a multivariate normal rv with mean $A\mu$ and variance-covariance matrix $AVA^T$.

CHAPTER 2

Exercise 2.1 Suppose that $Y = (Y_1, \ldots, Y_n)$ is a random sample from an $\mathrm{Exp}(\lambda)$ distribution. Then we may write
$$f_Y(y) = \prod_{i=1}^n \lambda e^{-\lambda y_i}
= \underbrace{\lambda^n e^{-\lambda \sum_{i=1}^n y_i}}_{g_{\lambda}(T(y))} \times \underbrace{1}_{h(y)}.$$
It follows that $T(Y) = \sum_{i=1}^n Y_i$ is a sufficient statistic for $\lambda$.

**Exercise 2.2** Suppose that $Y = (Y_1, \ldots, Y_n)$ is a random sample from an $\mathrm{Exp}(\lambda)$ distribution. Then the ratio of the joint pdfs at two different realizations of $Y$, $x$ and $y$, is
$$\frac{f(x; \lambda)}{f(y; \lambda)}
= \frac{\lambda^n e^{-\lambda\sum_{i=1}^n x_i}}{\lambda^n e^{-\lambda\sum_{i=1}^n y_i}}
= e^{\lambda\left( \sum_{i=1}^n y_i - \sum_{i=1}^n x_i \right)}.$$
The ratio is constant as a function of $\lambda$ iff $\sum_{i=1}^n y_i = \sum_{i=1}^n x_i$. Hence, by Lemma 2.1, $T(Y) = \sum_{i=1}^n Y_i$ is a minimal sufficient statistic for $\lambda$.

Exercise 2.3 The $Y_i$ are identically distributed, hence they have the same expectation, say $E(Y)$, for all $i = 1, \ldots, n$. Here, for $y \in [0, \theta]$, we have:
$$E(Y) = \frac{2}{\theta^2} \int_0^{\theta} y(\theta - y) \, dy
= \frac{2}{\theta^2} \int_0^{\theta} (\theta y - y^2) \, dy
= \frac{2}{\theta^2} \left[ \theta \frac{1}{2} y^2 - \frac{1}{3} y^3 \right]_0^{\theta}
= \frac{1}{3}\theta.$$

Bias:
$$E(T(Y)) = E(3\bar{Y}) = 3 \cdot \frac{1}{n} \sum_{i=1}^n E(Y_i) = 3 \cdot \frac{1}{n} \cdot n \cdot \frac{1}{3}\theta = \theta.$$
That is, $\mathrm{bias}(T(Y)) = E(T(Y)) - \theta = 0$.

Variance: the $Y_i$ are identically distributed, hence they have the same variance, say $\mathrm{var}(Y)$, for all $i = 1, \ldots, n$, where
$$\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2.$$
We need to calculate $E(Y^2)$:
$$E(Y^2) = \frac{2}{\theta^2} \int_0^{\theta} y^2(\theta - y) \, dy
= \frac{2}{\theta^2} \int_0^{\theta} (\theta y^2 - y^3) \, dy
= \frac{2}{\theta^2} \left[ \theta \frac{1}{3} y^3 - \frac{1}{4} y^4 \right]_0^{\theta}
= \frac{1}{6}\theta^2.$$
Hence $\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2 = \frac{1}{6}\theta^2 - \frac{1}{9}\theta^2 = \frac{1}{18}\theta^2$. This gives
$$\mathrm{var}(T(Y)) = 9\,\mathrm{var}(\bar{Y}) = 9 \cdot \frac{1}{n^2} \sum_{i=1}^n \mathrm{var}(Y_i) = 9 \cdot \frac{1}{n^2} \cdot n \cdot \frac{1}{18}\theta^2 = \frac{\theta^2}{2n}.$$

Consistency:
$T(Y)$ is unbiased, so it is enough to check whether its variance tends to zero as $n$ tends to infinity. Indeed, as $n \to \infty$ we have $\frac{\theta^2}{2n} \to 0$, that is, $T(Y) = 3\bar{Y}$ is a consistent estimator of $\theta$.
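A small Monte Carlo sketch, assuming NumPy is available, supporting the calculations above. The inverse-cdf method for $f(y) = 2(\theta - y)/\theta^2$ on $[0, \theta]$ gives $Y = \theta(1 - \sqrt{1 - U})$ with $U \sim U(0, 1)$.

```python
# T = 3 * Ybar should have mean ~ theta and variance ~ theta^2 / (2n).
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 50, 20_000

u = rng.uniform(size=(reps, n))
y = theta * (1.0 - np.sqrt(1.0 - u))   # draws from f(y) = 2(theta - y)/theta^2
t = 3.0 * y.mean(axis=1)               # the estimator for each replicate

print(t.mean(), theta)                 # ~ 2.0 (unbiased)
print(t.var(), theta**2 / (2 * n))     # ~ 0.04
```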

Exercise 2.4 We have $X_i \stackrel{\text{iid}}{\sim} \mathrm{Bern}(p)$ for $i = 1, \ldots, n$. Also, $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$.

(a) For an estimator $\hat{\vartheta}$ to be consistent for $\vartheta$ we require that $\mathrm{MSE}(\hat{\vartheta}) \to 0$ as $n \to \infty$. We have
$$\mathrm{MSE}(\hat{\vartheta}) = \mathrm{var}(\hat{\vartheta}) + [\mathrm{bias}(\hat{\vartheta})]^2.$$
We will now calculate the variance and bias of $\hat{p} = \bar{X}$:
$$E(\bar{X}) = \frac{1}{n} \sum_{i=1}^n E(X_i) = \frac{1}{n} np = p.$$
Hence $\bar{X}$ is an unbiased estimator of $p$. Also,
$$\mathrm{var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^n \mathrm{var}(X_i) = \frac{1}{n^2} npq = \frac{1}{n} pq.$$
Hence, $\mathrm{MSE}(\bar{X}) = \frac{1}{n} pq \to 0$ as $n \to \infty$, that is, $\bar{X}$ is a consistent estimator of $p$.

(b) The estimator $\hat{\vartheta}$ is asymptotically unbiased for $\vartheta$ if $E(\hat{\vartheta}) \to \vartheta$ as $n \to \infty$, that is, the bias tends to zero as $n \to \infty$. Here we have
$$E(\widehat{pq}) = E[\bar{X}(1 - \bar{X})] = E[\bar{X} - \bar{X}^2] = E[\bar{X}] - E[\bar{X}^2].$$
Note that we can write
$$E[\bar{X}^2] = \mathrm{var}(\bar{X}) + [E(\bar{X})]^2 = \frac{1}{n} pq + p^2.$$
That is,
$$E(\widehat{pq}) = p - \left( \frac{1}{n} pq + p^2 \right) = \frac{pq(n-1)}{n} \to pq \quad \text{as } n \to \infty.$$
Hence, the estimator is asymptotically unbiased for $pq$.

**Exercise 2.5** Here we have a single parameter $p$ and $g(p) = p$.

By definition, the $\mathrm{CRLB}(p)$ is
$$\mathrm{CRLB}(p) = \frac{\left( \frac{dg(p)}{dp} \right)^2}{E\left[ -\frac{d^2 \log P(Y = y;\, p)}{dp^2} \right]}, \tag{1}$$
where the joint pmf of $Y = (Y_1, \ldots, Y_n)^T$, with $Y_i \sim \mathrm{Bern}(p)$ independently, is
$$P(Y = y; p) = \prod_{i=1}^n p^{y_i}(1-p)^{1-y_i} = p^{\sum_{i=1}^n y_i}(1-p)^{n - \sum_{i=1}^n y_i}.$$
For the numerator of (1) we get $\frac{dg(p)}{dp} = 1$. Further, for brevity denote $P = P(Y = y; p)$. For the denominator of (1) we calculate
$$\log P = \sum_{i=1}^n y_i \log p + \left( n - \sum_{i=1}^n y_i \right) \log(1 - p)$$
and
$$\frac{d \log P}{dp} = \frac{\sum_{i=1}^n y_i}{p} + \frac{\sum_{i=1}^n y_i - n}{1 - p}, \qquad
\frac{d^2 \log P}{dp^2} = -\frac{\sum_{i=1}^n y_i}{p^2} + \frac{\sum_{i=1}^n y_i - n}{(1 - p)^2}.$$
Hence, since $E(Y_i) = p$ for all $i$, we get
$$E\left[ -\frac{d^2 \log P}{dp^2} \right]
= E\left[ \frac{\sum_{i=1}^n Y_i}{p^2} \right] - E\left[ \frac{\sum_{i=1}^n Y_i - n}{(1 - p)^2} \right]
= \frac{np}{p^2} - \frac{np - n}{(1 - p)^2}
= \frac{n}{p(1 - p)}.$$
Hence, $\mathrm{CRLB}(p) = \frac{p(1-p)}{n}$.

Since $\mathrm{var}(\bar{Y}) = \frac{p(1-p)}{n}$, it means that $\mathrm{var}(\bar{Y})$ achieves the bound and so $\bar{Y}$ has the minimum variance among all unbiased estimators of $p$.

**Exercise 2.6** From lectures, we know that the joint pdf of independent normal rvs is
$$f(y; \mu, \sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \mu)^2 \right\}.$$
Denote $f = f(y; \mu, \sigma^2)$. Taking the log of the pdf we obtain
$$\log f = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2.$$
Thus, we have
$$\frac{\partial \log f}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (y_i - \mu)
\qquad \text{and} \qquad
\frac{\partial \log f}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - \mu)^2.$$
It follows that
$$\frac{\partial^2 \log f}{\partial \mu^2} = -\frac{n}{\sigma^2}, \qquad
\frac{\partial^2 \log f}{\partial \mu\, \partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^n (y_i - \mu)
\qquad \text{and} \qquad
\frac{\partial^2 \log f}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^n (y_i - \mu)^2.$$
Hence, taking the negative of the expectation of each of the second derivatives (the mixed derivative has expectation zero since $E(Y_i - \mu) = 0$), we obtain the Fisher information matrix
$$M = \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}.$$
Now, let $g(\mu, \sigma^2) = \mu + \sigma^2$. Then we have $\partial g/\partial \mu = 1$ and $\partial g/\partial \sigma^2 = 1$. So
$$\mathrm{CRLB}(g(\mu, \sigma^2)) = (1, 1) \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\ 0 & \frac{n}{2\sigma^4} \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= (1, 1) \begin{pmatrix} \frac{\sigma^2}{n} & 0 \\ 0 & \frac{2\sigma^4}{n} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
= \frac{\sigma^2}{n}(1 + 2\sigma^2).$$

Exercise 2.7 Suppose that $Y_1, \ldots, Y_n$ are independent $\mathrm{Poisson}(\lambda)$ random variables. Then we know that $T = \sum_{i=1}^n Y_i$ is a sufficient statistic for $\lambda$.

Now, we need to find the distribution of $T$. We showed in Exercise 1.10 that the mgf of a $\mathrm{Poisson}(\lambda)$ rv is
$$M_Y(z) = e^{\lambda(e^z - 1)}.$$
Hence, we may write (we use $z$ so as not to confuse the argument with the values of $T$, denoted by $t$)
$$M_T(z) = \prod_{i=1}^n M_{Y_i}(z) = \prod_{i=1}^n e^{\lambda(e^z - 1)} = e^{n\lambda(e^z - 1)}.$$
Hence, $T \sim \mathrm{Poisson}(n\lambda)$, and so its probability mass function is
$$P(T = t) = \frac{(n\lambda)^t e^{-n\lambda}}{t!}, \quad t = 0, 1, \ldots$$
Next, suppose that
$$E\{h(T)\} = \sum_{t=0}^{\infty} h(t) \frac{(n\lambda)^t e^{-n\lambda}}{t!} = 0$$
for all $\lambda > 0$. Then we have
$$\sum_{t=0}^{\infty} \frac{h(t)}{t!} (n\lambda)^t = 0$$
for all $\lambda > 0$. Thus, every coefficient $h(t)/t!$ is zero, so that $h(t) = 0$ for all $t = 0, 1, 2, \ldots$. Since $T$ takes on the values $t = 0, 1, 2, \ldots$ with probability 1, it means that
$$P\{h(T) = 0\} = 1$$
for all $\lambda$. Hence, $T = \sum_{i=1}^n Y_i$ is a complete sufficient statistic.

Exercise 2.8 $S = \sum_{i=1}^n Y_i$ is a complete sufficient statistic for $\lambda$. We have seen that $T = \bar{Y} = S/n$ is a MVUE for $\lambda$. Now, we will find a unique MVUE of $\phi = \lambda^2$. We have
$$E(T^2) = E\left( \frac{1}{n^2} S^2 \right) = \frac{1}{n^2} E(S^2)
= \frac{1}{n^2}\left[ \mathrm{var}(S) + [E(S)]^2 \right]
= \frac{1}{n^2}\left[ n\lambda + n^2\lambda^2 \right]
= \frac{1}{n}\lambda + \lambda^2
= \frac{1}{n}E(T) + \lambda^2.$$
It means that
$$E\left[ T^2 - \frac{1}{n}T \right] = \lambda^2,$$
i.e., $T^2 - \frac{1}{n}T = \bar{Y}^2 - \frac{1}{n}\bar{Y}$ is an unbiased estimator of $\lambda^2$. It is a function of a complete sufficient statistic, hence it is the unique MVUE of $\lambda^2$.

Exercise 2.9 We may write
$$P(Y = y; \lambda) = \frac{\lambda^y e^{-\lambda}}{y!}
= \frac{1}{y!}\exp\{y\log\lambda - \lambda\}
= \frac{1}{y!}\exp\{(\log\lambda)y - \lambda\}.$$
Thus, we have $a(\lambda) = \log\lambda$, $b(y) = y$, $c(\lambda) = -\lambda$ and $h(y) = \frac{1}{y!}$. That is, $P(Y = y; \lambda)$ has a representation of the form required by Definition 2.10.

**Exercise 2.10** (a) Here, for $y > 0$, we have
$$f(y \mid \lambda, \alpha) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} y^{\alpha-1} e^{-\lambda y}
= \exp\left\{ \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} y^{\alpha-1} e^{-\lambda y} \right) \right\}
= \exp\left\{ -\lambda y + (\alpha - 1)\log y + \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \right) \right\}.$$
This has the required form of Definition 2.10, where $p = 2$ and
$$a_1(\lambda, \alpha) = -\lambda, \quad a_2(\lambda, \alpha) = \alpha - 1, \quad
b_1(y) = y, \quad b_2(y) = \log y, \quad
c(\lambda, \alpha) = \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \right), \quad h(y) = 1.$$

(b) By Theorem 2.8 (lecture notes) we have that
$$S_1(Y) = \sum_{i=1}^n Y_i \qquad \text{and} \qquad S_2(Y) = \sum_{i=1}^n \log Y_i$$
are the joint complete sufficient statistics for $\lambda$ and $\alpha$.

**Exercise 2.11** To obtain the Method of Moments estimators we compare the population and the sample moments. For a one-parameter distribution we obtain $\hat{\theta}$ as the solution of
$$E(Y) = \bar{Y}. \tag{2}$$
Here, for $y \in [0, \theta]$, we have:
$$E(Y) = \frac{2}{\theta^2} \int_0^{\theta} y(\theta - y) \, dy
= \frac{2}{\theta^2} \int_0^{\theta} (\theta y - y^2) \, dy
= \frac{2}{\theta^2} \left[ \theta \frac{1}{2} y^2 - \frac{1}{3} y^3 \right]_0^{\theta}
= \frac{1}{3}\theta.$$
Then, by (2) we get the method of moments estimator of $\theta$:
$$\hat{\theta} = 3\bar{Y}.$$

**Exercise 2.12** (a) First, we will show that the distribution belongs to an exponential family. Here, for $y > 0$ and known $\alpha$, we have
$$f(y \mid \lambda, \alpha) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} y^{\alpha-1} e^{-\lambda y}
= y^{\alpha-1} \frac{\lambda^{\alpha}}{\Gamma(\alpha)} e^{-\lambda y}
= y^{\alpha-1} \exp\left\{ \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} e^{-\lambda y} \right) \right\}
= y^{\alpha-1} \exp\left\{ -\lambda y + \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \right) \right\}.$$
This has the required form of Definition 2.11, where $p = 1$ and
$$a(\lambda) = -\lambda, \quad b(y) = y, \quad c(\lambda) = \log\left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \right), \quad h(y) = y^{\alpha-1}.$$
By Theorem 2.8 (lecture notes) we have that
$$S(Y) = \sum_{i=1}^n Y_i$$
is the complete sufficient statistic for $\lambda$.

(b) The likelihood function is
$$L(\lambda; y) = \prod_{i=1}^n \frac{\lambda^{\alpha}}{\Gamma(\alpha)} y_i^{\alpha-1} e^{-\lambda y_i}
= \prod_{i=1}^n \frac{\lambda^{\alpha}}{\Gamma(\alpha)} e^{\log\left( y_i^{\alpha-1} \right)} e^{-\lambda y_i}
= \left( \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \right)^n e^{(\alpha - 1)\sum_{i=1}^n \log y_i} e^{-\lambda\sum_{i=1}^n y_i}.$$
Then the log-likelihood is
$$l(\lambda; y) = \log L(\lambda; y) = \alpha n \log\lambda - n\log\Gamma(\alpha) + (\alpha - 1)\sum_{i=1}^n \log y_i - \lambda\sum_{i=1}^n y_i.$$
Then, we obtain the following derivative of $l(\lambda; y)$ with respect to $\lambda$:
$$\frac{dl}{d\lambda} = \alpha n \frac{1}{\lambda} - \sum_{i=1}^n y_i.$$
This, set to zero, gives
$$\hat{\lambda} = \frac{\alpha n}{\sum_{i=1}^n y_i} = \frac{\alpha}{\bar{y}}.$$
Hence, $\mathrm{MLE}(\lambda) = \alpha/\bar{Y}$. So, we get
$$\mathrm{MLE}[g(\lambda)] = \mathrm{MLE}\left[ \frac{1}{\lambda} \right] = \frac{1}{\mathrm{MLE}(\lambda)} = \frac{1}{\alpha}\bar{Y} = \frac{1}{\alpha n}\sum_{i=1}^n Y_i = \frac{1}{\alpha n} S(Y).$$
That is, $\mathrm{MLE}[g(\lambda)]$ is a function of the complete sufficient statistic.

(c) To show that it is an unbiased estimator of $g(\lambda)$ we calculate:
$$E[\widehat{g(\lambda)}] = E\left[ \frac{1}{\alpha n}\sum_{i=1}^n Y_i \right] = \frac{1}{\alpha n}\sum_{i=1}^n E(Y_i) = \frac{1}{\alpha n}\cdot n\alpha\frac{1}{\lambda} = \frac{1}{\lambda}.$$
It is an unbiased estimator and a function of a complete sufficient statistic, hence, by Corollary 2.2 (given in lectures), it is the unique $\mathrm{MVUE}(g(\lambda))$.

**Exercise 2.13** (a) The likelihood is
$$L(\beta_0, \beta_1; y) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \right\}.$$
Now, maximizing this is equivalent to minimizing
$$S(\beta_0, \beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2,$$
which is the criterion we use to find the least squares estimators. Hence, the maximum likelihood estimators are the same as the least squares estimators.

(c) The estimates of $\beta_0$ and $\beta_1$ are
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x} = 94.123, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2} = -1.266.$$
Hence the estimate of the mean response at a given $x$ is
$$\hat{E}(Y \mid x) = 94.123 - 1.266x.$$
For the temperature of $x = 40$ degrees we obtain the estimate of expected hardness equal to $\hat{E}(Y \mid x = 40) = 43.483$.
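The data of the exercise are not reproduced here, so the following is only a sketch, assuming NumPy, of the least-squares formulas used in (c); with the hardness/temperature data it returns $\hat\beta_0 = 94.123$ and $\hat\beta_1 = -1.266$.

```python
# Least-squares estimates for the simple linear regression y = beta0 + beta1 * x.
import numpy as np

def ls_line(x, y):
    """Return (beta0_hat, beta1_hat) computed from the formulas in (c)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxy = np.sum(x * y) - n * x.mean() * y.mean()
    sxx = np.sum(x ** 2) - n * x.mean() ** 2
    beta1 = sxy / sxx
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

# With the exercise data, ls_line(x, y) gives (94.123, -1.266), and the fitted
# mean hardness at x = 40 is 94.123 - 1.266 * 40 = 43.483.
```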

**Exercise 2.14** The LS estimator of $\beta_1$ is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}.$$
We will see that it has a normal distribution and is unbiased, and we will find its variance. Now, normality is clear from the fact that we may write
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n x_i Y_i - \bar{x}\sum_{i=1}^n Y_i}{\sum_{i=1}^n x_i^2 - n\bar{x}^2}
= \sum_{i=1}^n \frac{(x_i - \bar{x})}{S_{xx}} Y_i,$$
where $S_{xx} = \sum_{i=1}^n x_i^2 - n\bar{x}^2$, so that $\hat{\beta}_1$ is a linear function of $Y_1, \ldots, Y_n$, each of which is normally distributed. Next, we have
$$\begin{aligned}
E(\hat{\beta}_1) &= \frac{1}{S_{xx}}\sum_{i=1}^n (x_i - \bar{x})E(Y_i)
= \frac{1}{S_{xx}}\left[ \sum_{i=1}^n x_i E(Y_i) - \bar{x}\sum_{i=1}^n E(Y_i) \right] \\
&= \frac{1}{S_{xx}}\left[ \sum_{i=1}^n x_i(\beta_0 + \beta_1 x_i) - \bar{x}\sum_{i=1}^n (\beta_0 + \beta_1 x_i) \right]
= \frac{1}{S_{xx}}\left[ n\beta_0\bar{x} + \beta_1\sum_{i=1}^n x_i^2 - n\beta_0\bar{x} - n\beta_1\bar{x}^2 \right] \\
&= \frac{1}{S_{xx}}\left( \sum_{i=1}^n x_i^2 - n\bar{x}^2 \right)\beta_1
= \frac{1}{S_{xx}} S_{xx}\beta_1 = \beta_1.
\end{aligned}$$
Finally, since the $Y_i$s are independent, we have
$$\mathrm{var}(\hat{\beta}_1) = \frac{1}{(S_{xx})^2}\sum_{i=1}^n (x_i - \bar{x})^2\,\mathrm{var}(Y_i)
= \frac{1}{(S_{xx})^2}\sum_{i=1}^n (x_i - \bar{x})^2\sigma^2
= \frac{1}{(S_{xx})^2} S_{xx}\sigma^2
= \frac{\sigma^2}{S_{xx}}.$$
Hence, $\hat{\beta}_1 \sim N(\beta_1, \sigma^2/S_{xx})$ and a $100(1 - \alpha)\%$ confidence interval for $\beta_1$ is
$$\hat{\beta}_1 \pm t_{n-2, \frac{\alpha}{2}}\sqrt{\frac{S^2}{S_{xx}}},$$
where $S^2 = \frac{1}{n-2}\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ is the MVUE for $\sigma^2$.

**Exercise 2.15** (a) The likelihood is
$$L(\theta; y) = \prod_{i=1}^n \theta y_i^{\theta-1} = \theta^n \left( \prod_{i=1}^n y_i \right)^{\theta-1},$$
and so the log-likelihood is
$$\ell(\theta; y) = n\log\theta + (\theta - 1)\log\left( \prod_{i=1}^n y_i \right).$$
Thus, solving the equation
$$\frac{d\ell}{d\theta} = \frac{n}{\theta} + \log\left( \prod_{i=1}^n y_i \right) = 0,$$
we obtain the maximum likelihood estimator of $\theta$ as
$$\hat{\theta} = -n\Big/\log\left( \prod_{i=1}^n Y_i \right).$$

(b) Since
$$\frac{d^2\ell}{d\theta^2} = -\frac{n}{\theta^2},$$
we have
$$\mathrm{CRLB}(\theta) = \frac{1}{E\left[ -\frac{d^2\ell}{d\theta^2} \right]} = \frac{\theta^2}{n}.$$
Thus, for large $n$, $\hat{\theta} \sim N(\theta, \theta^2/n)$ approximately.

(c) Here we have to replace $\mathrm{CRLB}(\theta)$ with its estimator to obtain the approximate pivot
$$Q(Y, \theta) = \frac{\hat{\theta} - \theta}{\sqrt{\hat{\theta}^2/n}} \overset{\text{approx.}}{\sim} N(0, 1).$$
This gives
$$P\left( -z_{\frac{\alpha}{2}} < \frac{\hat{\theta} - \theta}{\sqrt{\hat{\theta}^2/n}} < z_{\frac{\alpha}{2}} \right) \cong 1 - \alpha,$$
where $z_{\frac{\alpha}{2}}$ is such that $P(|Z| < z_{\frac{\alpha}{2}}) = 1 - \alpha$, $Z \sim N(0, 1)$. It may be rearranged to yield
$$P\left( \hat{\theta} - z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{\theta}^2}{n}} < \theta < \hat{\theta} + z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{\theta}^2}{n}} \right) \cong 1 - \alpha.$$
Hence, an approximate $100(1 - \alpha)\%$ confidence interval for $\theta$ is
$$\hat{\theta} \pm z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{\theta}^2}{n}}.$$
Finally, the approximate 90% confidence interval for $\theta$ is
$$\hat{\theta} \pm 1.6449\sqrt{\frac{\hat{\theta}^2}{n}},$$
where $\hat{\theta} = -n/\log\left( \prod_{i=1}^n Y_i \right)$.
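A minimal sketch of the estimator and interval above, assuming NumPy; the sample used here is simulated for illustration (with true $\theta = 3$), not taken from the exercise.

```python
# MLE theta_hat = -n / sum(log y_i) and approximate 90% CI: theta_hat +/- 1.6449 * theta_hat / sqrt(n).
import numpy as np

def mle_and_ci(y, z=1.6449):
    y = np.asarray(y, float)
    n = len(y)
    theta_hat = -n / np.sum(np.log(y))        # MLE from part (a)
    half = z * theta_hat / np.sqrt(n)         # z * sqrt(theta_hat^2 / n)
    return theta_hat, (theta_hat - half, theta_hat + half)

rng = np.random.default_rng(2)
sample = rng.uniform(size=100) ** (1 / 3.0)   # Y = U^(1/theta) has pdf theta * y^(theta - 1)
print(mle_and_ci(sample))
```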

**Exercise 2.16** (a) For a Poisson distribution we have the $\mathrm{MLE}(\lambda)$ equal to
$$\hat{\lambda} = \bar{Y} \qquad \text{and} \qquad \hat{\lambda} \sim AN\left( \lambda, \frac{\lambda}{n} \right).$$
Hence,
$$\hat{\lambda}_1 - \hat{\lambda}_2 \sim AN\left( \lambda_1 - \lambda_2, \; \frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2} \right).$$
So, after standardization, we get
$$\frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} \sim AN(0, 1).$$
Hence, the approximate pivot for $\lambda_1 - \lambda_2$ is
$$Q(Y, \lambda_1 - \lambda_2) = \frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} \overset{\text{approx.}}{\sim} N(0, 1).$$
Then, for $z_{\frac{\alpha}{2}}$ such that $P(|Z| < z_{\frac{\alpha}{2}}) = 1 - \alpha$, $Z \sim N(0, 1)$, we may write
$$P\left( -z_{\frac{\alpha}{2}} < \frac{\hat{\lambda}_1 - \hat{\lambda}_2 - (\lambda_1 - \lambda_2)}{\sqrt{\frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}}} < z_{\frac{\alpha}{2}} \right) \cong 1 - \alpha,$$
which gives
$$P\left( \hat{\lambda}_1 - \hat{\lambda}_2 - z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}} < \lambda_1 - \lambda_2 < \hat{\lambda}_1 - \hat{\lambda}_2 + z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2}} \right) \cong 1 - \alpha.$$
That is, a $100(1 - \alpha)\%$ CI for $\lambda_1 - \lambda_2$ is
$$\bar{Y}_1 - \bar{Y}_2 \pm z_{\frac{\alpha}{2}}\sqrt{\frac{\bar{Y}_1}{n_1} + \frac{\bar{Y}_2}{n_2}}.$$

(b) Denote:
$Y_i$: the density of seedlings of tree A in square-metre area $i$ around the tree;
$X_i$: the density of seedlings of tree B in square-metre area $i$ around the tree.
Then, we may assume that $Y_i \stackrel{\text{iid}}{\sim} \mathrm{Poisson}(\lambda_1)$ and $X_i \stackrel{\text{iid}}{\sim} \mathrm{Poisson}(\lambda_2)$.
We are interested in the difference in the mean density, i.e., in $\lambda_1 - \lambda_2$.
From the data we get:
$$\hat{\lambda}_1 - \hat{\lambda}_2 = 1, \qquad \frac{\hat{\lambda}_1}{n_1} + \frac{\hat{\lambda}_2}{n_2} = \frac{17}{70}.$$
Hence, the approximate 99% CI for $\lambda_1 - \lambda_2$ is
$$\left[ 1 - 2.5758\sqrt{\tfrac{17}{70}}, \; 1 + 2.5758\sqrt{\tfrac{17}{70}} \right] = [-0.269, 2.269].$$
The CI includes zero, hence, at the 1% significance level, there is no evidence to reject $H_0: \lambda_1 - \lambda_2 = 0$ against $H_1: \lambda_1 - \lambda_2 \neq 0$; that is, there is no evidence to say, at the 1% significance level, that tree A produced a higher density of seedlings than tree B did.
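A one-line numerical check of the 99% interval in part (b), assuming NumPy, using the summary values above:

```python
# CI: (lambda1_hat - lambda2_hat) +/- z * sqrt(lambda1_hat/n1 + lambda2_hat/n2).
import numpy as np

diff, var_hat, z = 1.0, 17 / 70, 2.5758
print(diff - z * np.sqrt(var_hat), diff + z * np.sqrt(var_hat))   # approximately [-0.269, 2.269]
```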

Exercise 2.17 (a) $Y_i \stackrel{\text{iid}}{\sim} \mathrm{Bern}(p)$, $i = 1, \ldots, n$, and we are interested in testing the hypothesis $H_0: p = p_0$ against $H_1: p = p_1$. The likelihood is
$$L(p; y) = p^{\sum_{i=1}^n y_i}(1 - p)^{n - \sum_{i=1}^n y_i}$$
and so we get the likelihood ratio:
$$\lambda(p) = \frac{L(p_0; y)}{L(p_1; y)}
= \frac{p_0^{\sum_{i=1}^n y_i}(1 - p_0)^{n - \sum_{i=1}^n y_i}}{p_1^{\sum_{i=1}^n y_i}(1 - p_1)^{n - \sum_{i=1}^n y_i}}
= \left( \frac{p_0}{p_1} \right)^{\sum_{i=1}^n y_i}\left( \frac{1 - p_0}{1 - p_1} \right)^{n - \sum_{i=1}^n y_i}
= \left( \frac{p_0(1 - p_1)}{p_1(1 - p_0)} \right)^{\sum_{i=1}^n y_i}\left( \frac{1 - p_0}{1 - p_1} \right)^n.$$
Then, the critical region is
$$R = \{y : \lambda(p) \le a\},$$
where $a$ is a constant chosen to give significance level $\alpha$. It means that we reject the null hypothesis if
$$\left( \frac{p_0(1 - p_1)}{p_1(1 - p_0)} \right)^{\sum_{i=1}^n y_i}\left( \frac{1 - p_0}{1 - p_1} \right)^n \le a,$$
which is equivalent to
$$\left( \frac{p_0(1 - p_1)}{p_1(1 - p_0)} \right)^{\sum_{i=1}^n y_i} \le b,$$
or, after taking logs of both sides, to
$$\sum_{i=1}^n y_i \log\left( \frac{p_0(1 - p_1)}{p_1(1 - p_0)} \right) \le c,$$
where $b$ and $c$ are constants chosen to give significance level $\alpha$.
When $p_1 > p_0$ we have
$$\log\left( \frac{p_0(1 - p_1)}{p_1(1 - p_0)} \right) < 0.$$
Hence, the critical region can be written as
$$R = \{y : \bar{y} \ge d\},$$
for some constant $d$ chosen to give significance level $\alpha$.
By the central limit theorem, we have that (when the null hypothesis is true, i.e., $p = p_0$):
$$\bar{Y} \sim AN\left( p_0, \frac{p_0(1 - p_0)}{n} \right).$$
Hence,
$$Z = \frac{\bar{Y} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \sim AN(0, 1)$$
and we may write
$$\alpha \cong P(\bar{Y} \ge d \mid p = p_0) = P(Z \ge z_{\alpha}),$$
where $z_{\alpha} = \dfrac{d - p_0}{\sqrt{p_0(1 - p_0)/n}}$. Hence $d = p_0 + z_{\alpha}\sqrt{\dfrac{p_0(1 - p_0)}{n}}$ and the critical region is
$$R = \left\{ y : \bar{y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} \right\}.$$

(b) The critical region does not depend on $p_1$, hence it is the same for all $p_1 > p_0$, and so there is a uniformly most powerful test for $H_0: p = p_0$ against $H_1: p > p_0$.
The power function is
$$\begin{aligned}
\beta(p) = P(Y \in R \mid p)
&= P\left( \bar{Y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} \;\middle|\; p \right) \\
&= P\left( \frac{\bar{Y} - p}{\sqrt{\frac{p(1-p)}{n}}} \ge \frac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}} \right)
\cong 1 - \Phi\{g(p)\},
\end{aligned}$$
where
$$g(p) = \frac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}}$$
and $\Phi$ denotes the cumulative distribution function of the standard normal distribution.

Question 2.18 (a) Let us denote:
$Y_i \sim \mathrm{Bern}(p)$: the response of mouse $i$ to the drug candidate.
Then, from Question 1, we have the following critical region:
$$R = \left\{ y : \bar{y} \ge p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} \right\}.$$
Here $p_0 = 0.1$, $n = 30$, $\alpha = 0.05$, $z_{\alpha} = 1.6449$. This gives
$$R = \{y : \bar{y} \ge 0.19\}.$$
From the sample we have $\hat{p} = \bar{y} = \frac{6}{30} = 0.2$, that is, there is evidence to reject the null hypothesis at the significance level $\alpha = 0.05$.

(b) The power function is
$$\beta(p) \cong 1 - \Phi\{g(p)\}, \qquad \text{where } g(p) = \frac{p_0 + z_{\alpha}\sqrt{\frac{p_0(1 - p_0)}{n}} - p}{\sqrt{\frac{p(1-p)}{n}}}.$$
When $n = 30$, $p_0 = 0.1$ and $p = 0.2$ we obtain, for $z_{0.05} = 1.6449$,
$$g(0.2) = -0.1356 \qquad \text{and} \qquad \Phi(-0.1356) = 1 - \Phi(0.1356) = 1 - 0.5539 = 0.4461.$$
This gives a power equal to $\beta(0.2) = 0.5539$. It means that the probability of a type II error is 0.4461, which is rather high.
This is because the value of the alternative hypothesis is close to the null hypothesis and also the number of observations is not large.
To find what $n$ is needed to get the power $\beta(0.2) = 0.8$ we calculate:
$$g(p) = \frac{0.1 + z_{0.05}\sqrt{\frac{0.09}{n}} - 0.2}{\sqrt{\frac{0.16}{n}}} = 0.75\,z_{0.05} - 0.25\sqrt{n}.$$
For $\beta(p) \cong 1 - \Phi\{g(p)\}$ to be equal to 0.8 we need $\Phi\{g(p)\} = 0.2$. From statistical tables we obtain $g(p) = -0.8416$. Hence, for $z_{0.05} = 1.6449$, this gives
$$n = (4 \times 0.8416 + 3 \times 1.6449)^2 = 68.9.$$
At least $n = 69$ mice are needed to obtain a test with power as high as 0.8 for detecting that the proportion is 0.2 rather than 0.1.
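A quick check of part (b), assuming NumPy and SciPy, recomputing the approximate power at $p = 0.2$ for $n = 30$ and the sample size needed for power 0.8:

```python
# Power of the one-sided test at p, and the n required for power 0.8.
import numpy as np
from scipy.stats import norm

p0, p, alpha = 0.1, 0.2, 0.05
z = norm.ppf(1 - alpha)                      # 1.6449

def power(n):
    g = (p0 + z * np.sqrt(p0 * (1 - p0) / n) - p) / np.sqrt(p * (1 - p) / n)
    return 1 - norm.cdf(g)

print(power(30))                             # ~ 0.554
n_req = (z * np.sqrt(p0 * (1 - p0)) + norm.ppf(0.8) * np.sqrt(p * (1 - p))) ** 2 / (p - p0) ** 2
print(n_req, power(69))                      # ~ 68.9 and ~ 0.80
```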