# STATISTICAL INFERENCE CLASS NOTE SANGKAR ROY JAHANGIRNAGAR UNIVERSITY, BANGLADESH. EMAIL: sankar1604@gmail.com

Estimation ~ 1 of 22

Estimation
Sufficient Statistic: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ may be a vector. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $X_1, X_2, \ldots, X_n$ given $T = t$ does not depend on $\theta$ for any value $t$ of $T$.

Note: This definition of a sufficient statistic is not very workable. First, it does not tell us which statistic is likely to be sufficient, and second, it requires us to derive a conditional distribution, which may not be easy, especially for continuous random variables. For this reason we may use the Factorization Criterion, which may aid us in finding sufficient statistics.

Another Definition: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is defined to be a sufficient statistic if and only if the conditional distribution of $S$ given $T$ does not depend on $\theta$ for any statistic $S = s(X_1, X_2, \ldots, X_n)$.

Note: This definition is particularly useful in showing that a particular statistic is not sufficient. For instance, to prove that a statistic $T' = t'(X_1, X_2, \ldots, X_n)$ is not sufficient, one needs only to find another statistic $T = t(X_1, X_2, \ldots, X_n)$ for which the conditional distribution of $T$ given $T'$ depends on $\theta$.

Jointly Sufficient Statistics: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(\cdot\,;\theta)$. The statistics $T_1, \ldots, T_r$ are defined to be jointly sufficient if and only if the conditional distribution of $X_1, X_2, \ldots, X_n$ given $T_1 = t_1, \ldots, T_r = t_r$ does not depend on $\theta$.

Concept of Sufficient Statistic: A sufficient statistic is a particular kind of statistic. It is a statistic that condenses the sample space in such a way that no "information about $\theta$" is lost. The only information about the parameter $\theta$ in the density $f(\cdot\,;\theta)$ from which we sampled is contained in the sample $X_1, X_2, \ldots, X_n$; so, when we say that a statistic loses no information, we mean that it contains all the information about $\theta$ that is contained in the sample. We emphasize that the type of information of which we are speaking is that information about $\theta$ contained in the sample given that we know the form of the density; that is, we know the function $f(\cdot\,;\cdot)$ in $f(\cdot\,;\theta)$, and the parameter $\theta$ is the only unknown. We are not speaking of information in the sample that might be useful in checking the validity of our assumption that the density does indeed have the form $f(\cdot\,;\cdot)$.

Example: Let X1 , X 2 , X 3 be a sample of size 3 from the Bernoulli distribution. Consider the two statistics
S = s ( X 1 , X 2 , X 3 ) = X 1 + X 2 + X 3 and T = t ( X 1 , X 2 , X 3 ) = X1 X 2 + X 3 . We have to show that s ( ⋅, ⋅, ⋅) is

sufficient and t ( ⋅, ⋅, ⋅) is not.

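| $(x_1, x_2, x_3)$ | Values of $S$ | Values of $T$ | $f(x_1, x_2, x_3 \mid S)$ | $f(x_1, x_2, x_3 \mid T)$ |
|---|---|---|---|---|
| $(0, 0, 0)$ | 0 | 0 | $1$ | $\frac{1-p}{1+p}$ |
| $(0, 0, 1)$ | 1 | 1 | $\frac{1}{3}$ | $\frac{1-p}{1+2p}$ |
| $(0, 1, 0)$ | 1 | 0 | $\frac{1}{3}$ | $\frac{p}{1+p}$ |
| $(1, 0, 0)$ | 1 | 0 | $\frac{1}{3}$ | $\frac{p}{1+p}$ |
| $(0, 1, 1)$ | 2 | 1 | $\frac{1}{3}$ | $\frac{p}{1+2p}$ |
| $(1, 0, 1)$ | 2 | 1 | $\frac{1}{3}$ | $\frac{p}{1+2p}$ |
| $(1, 1, 0)$ | 2 | 1 | $\frac{1}{3}$ | $\frac{p}{1+2p}$ |
| $(1, 1, 1)$ | 3 | 2 | $1$ | $1$ |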

Now, we have

$$f_{X_1, X_2, X_3 \mid S=1}(0, 1, 0 \mid 1) = P[X_1 = 0;\, X_2 = 1;\, X_3 = 0 \mid S = 1] = \frac{P[X_1 = 0;\, X_2 = 1;\, X_3 = 0;\, S = 1]}{P[S = 1]} = \frac{(1-p)\,p\,(1-p)}{\binom{3}{1} p (1-p)^2} = \frac{1}{3}$$

and

$$f_{X_1, X_2, X_3 \mid T=0}(0, 1, 0 \mid 0) = \frac{P[X_1 = 0;\, X_2 = 1;\, X_3 = 0;\, T = 0]}{P[T = 0]} = \frac{(1-p)^2 p}{(1-p)^3 + 2p(1-p)^2} = \frac{p}{1 - p + 2p} = \frac{p}{1+p}$$

The conditional distribution of the sample given the values of S is independent of p ; so S is a sufficient statistic. However, the conditional distribution of the sample given the values of T depends on p ; so T is not sufficient.
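The table can also be checked by brute force. The following is a minimal sketch, not part of the original notes, that enumerates the eight sample points with exact fractions and tests whether the conditional distribution given a statistic changes with $p$; the helper names `conditional_dist` and `depends_on_p` and the three trial values of $p$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def conditional_dist(stat, p):
    """Map each value v of stat to {sample point: P[sample point | stat = v]}
    for three independent Bernoulli(p) observations."""
    joint = {}
    for x in product((0, 1), repeat=3):
        prob = Fraction(1)
        for xi in x:
            prob *= p if xi == 1 else 1 - p
        joint[x] = prob
    by_value = {}
    for x, prob in joint.items():
        by_value.setdefault(stat(x), {})[x] = prob
    return {v: {x: pr / sum(pts.values()) for x, pr in pts.items()}
            for v, pts in by_value.items()}


def s(x):
    # S = X1 + X2 + X3
    return x[0] + x[1] + x[2]


def t(x):
    # T = X1 * X2 + X3
    return x[0] * x[1] + x[2]


def depends_on_p(stat, ps=(Fraction(1, 4), Fraction(1, 2), Fraction(3, 4))):
    """True if the conditional distribution differs across the trial values of p.
    A finite set of p values is only a numerical check, not a proof."""
    dists = [conditional_dist(stat, p) for p in ps]
    return any(d != dists[0] for d in dists[1:])


print("S depends on p:", depends_on_p(s))
print("T depends on p:", depends_on_p(t))
```

Exact rational arithmetic is used so that equality of the conditional distributions can be compared without floating-point tolerance.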

Factorization Theorem (Single Sufficient Statistic): Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$, where the parameter $\theta$ may be a vector. A statistic $T = t(X_1, X_2, \ldots, X_n)$ is sufficient if and only if the joint density of $X_1, X_2, \ldots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, factors as

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n;\theta) = g\{t(x_1, x_2, \ldots, x_n);\theta\} \cdot h\{x_1, x_2, \ldots, x_n\} = g\{t;\theta\} \cdot h\{x_1, x_2, \ldots, x_n\}$$

where the function $h\{x_1, x_2, \ldots, x_n\}$ is nonnegative and does not involve the parameter $\theta$, and the function $g\{t(x_1, x_2, \ldots, x_n);\theta\}$ is nonnegative and depends on $x_1, \ldots, x_n$ only through the function $t(\cdot, \ldots, \cdot)$.

Factorization Theorem (Jointly Sufficient Statistics): Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the density $f(\cdot\,;\theta)$, where the parameter $\theta$ may be a vector. A set of statistics $T_1 = t_1(X_1, X_2, \ldots, X_n), \ldots, T_r = t_r(X_1, X_2, \ldots, X_n)$ is jointly sufficient if and only if the joint density of $X_1, X_2, \ldots, X_n$, which is $\prod_{i=1}^{n} f(x_i;\theta)$, can be factored as

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n;\theta) = g\{t_1(x_1, x_2, \ldots, x_n), \ldots, t_r(x_1, x_2, \ldots, x_n);\theta\} \cdot h\{x_1, x_2, \ldots, x_n\} = g\{t_1, \ldots, t_r;\theta\} \cdot h\{x_1, x_2, \ldots, x_n\}$$

where the function $h\{x_1, x_2, \ldots, x_n\}$ is nonnegative and does not involve the parameter $\theta$, and the function $g\{t_1, \ldots, t_r;\theta\}$ is nonnegative and depends on $x_1, \ldots, x_n$ only through the functions $t_1(\cdot, \ldots, \cdot), \ldots, t_r(\cdot, \ldots, \cdot)$.

N.B.: For more on this topic, see Mood, Graybill, Boes; Introduction to the Theory of Statistics, pp. 300-311.

Efficient Estimator: Let $x_1, x_2, \ldots, x_n$ be a sample drawn from the population with density $f(x;\theta)$ and let $t$ be an unbiased consistent estimator of $\theta$. If the variance of $t$ is less than the variance of all other such estimators, then $t$ is said to be the most efficient estimator of $\theta$, simply called the efficient estimator of $\theta$. The efficiency of an estimator can be written as

$$e = \frac{\operatorname{Var}(\text{Most efficient estimator})}{\operatorname{Var}(\text{Given estimator})}$$

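Regular Distribution: The joint p.d.f. of the $X$'s, where $X \sim f(x;\theta)$, $\theta \in \Omega$, is said to be regular with respect to its first $\theta$ derivative if differentiation under the integral sign is permitted, i.e.,

$$\int_{-\infty}^{\infty} f(x;\theta)\,dx = 1 \;\Rightarrow\; \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta}\,dx = 0 \;\Rightarrow\; \int_{-\infty}^{\infty} \frac{\partial f(x;\theta)}{\partial\theta} \cdot \frac{1}{f(x;\theta)}\, f(x;\theta)\,dx = 0 \;\Rightarrow\; \int_{-\infty}^{\infty} \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = 0$$

This is called a regular distribution.

Regular Estimator and Regularity Condition: Let $X_1, \ldots, X_n$ be random variables having the joint p.d.f. $f_\theta(x_1, \ldots, x_n)$, $\theta \in \Theta$. If the statistic $t(X_1, \ldots, X_n)$ is such that $E_\theta\{t(X_1, \ldots, X_n)\} = \psi(\theta)$ $\forall\, \theta$, and if the following regularity conditions hold, then the statistic $t(X_1, \ldots, X_n)$ is known as the regular estimator of $\theta \in \Theta$.

i) $\theta$ lies in a non-degenerate open interval $\Theta$ in the real line; $\Theta$ may be infinite;
ii) $\dfrac{\partial f_\theta(x)}{\partial\theta}$ exists $\forall\, \theta \in \Theta$;
iii) $\int f_\theta(x)\,dx$ can be differentiated with respect to $\theta$ under the integral sign;
iv) $\int t(x) f_\theta(x)\,dx$ can be differentiated under the integral sign;
v) $E_\theta\left[\dfrac{\partial \ln f_\theta(x)}{\partial\theta}\right]^2$ exists and is positive $\forall\, \theta \in \Theta$.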

Best Regular Unbiased Estimator (BRUE): In any regular estimation case, the efficiency of an unbiased regular estimator $t_n(X_1, \ldots, X_n)$ is

$$e_\theta(t_n) = \frac{1}{n\,E_\theta\left\{\left(\dfrac{\partial \ln f(X \mid \theta)}{\partial\theta}\right)^2\right\} \operatorname{Var}_\theta(t_n)}$$

If $e_\theta(t_n) \equiv 1$, then $t_n$ is called efficient and a Best Regular Unbiased Estimator (BRUE).

Note: In any regular estimation case, $0 \le e_\theta(t_n) \le 1$. We have $e_\theta(t_n) \equiv 1$ iff $\operatorname{Var}_\theta(t_n)$ achieves the lower bound for all $\theta$. In any regular estimation case, the asymptotic efficiency of an unbiased regular estimator $t_n(X_1, \ldots, X_n)$ is $\lim_{n \to \infty} e_\theta(t_n)$.
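As an illustration of the efficiency formula, the sketch below evaluates $e_\theta(t_n)$ exactly for a Bernoulli($\theta$) sample. The Bernoulli model, the two estimators compared (the sample mean and the single observation $X_1$) and the function names are assumptions chosen for this example; they do not come from the notes.

```python
from fractions import Fraction


def fisher_information_bernoulli(theta):
    """E[(d/d theta ln f(X | theta))^2] for one Bernoulli(theta) observation.
    The score is 1/theta when X = 1 and -1/(1 - theta) when X = 0."""
    score_one = 1 / theta
    score_zero = -1 / (1 - theta)
    return theta * score_one ** 2 + (1 - theta) * score_zero ** 2


def efficiency(n, theta, variance):
    """e_theta(t_n) = 1 / (n * I(theta) * Var_theta(t_n))."""
    return 1 / (n * fisher_information_bernoulli(theta) * variance)


theta = Fraction(1, 3)
n = 5
var_sample_mean = theta * (1 - theta) / n   # Var of the sample mean
var_first_obs = theta * (1 - theta)         # Var of X1 alone (also unbiased)

print(efficiency(n, theta, var_sample_mean))  # efficiency of the sample mean
print(efficiency(n, theta, var_first_obs))    # efficiency of X1
```

Under these assumptions the sample mean attains efficiency 1, while the estimator that uses only $X_1$ has efficiency $1/n$.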

N.B.: From the chapter Consistency and Efficient Estimator of the Third Year Note, we have to read the examples of efficient and sufficient estimators, Fisher's Information, the Rao-Cramer Inequality and others.

Generalized Rao-Cramer Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall, Stuart; The Advanced Theory of Statistics, P-12)

Bhattacharyya Inequality: See the chapter Asymptotically Most Efficient Estimator of the Third Year Note. (Ref. Kendall, Stuart; The Advanced Theory of Statistics, P-12-15)

Chapman, Robbins and Kiefer Inequality: This inequality gives a lower bound for the variance of an
estimate but does not require regularity conditions like Rao-Cramer Inequality.

Statement: Suppose that $X = (x_1, x_2, \ldots, x_n)$ are random variables with joint density or frequency function $f(x;\theta)$, where $\theta$ is a one-dimensional parameter belonging to the parameter space $\Omega$. Let $T$ be an unbiased estimate of $\tau(\theta)$ with $E_\theta T^2 < \infty$ for all $\theta \in \Omega$. If $\theta \ne \psi$, assume that $f_\theta$ and $f_\psi$ are different, and assume that there exists a $\psi \in \Omega$ such that $\theta \ne \psi$ and $S(\theta) = \{f_\theta(x) > 0\} \supset S(\psi) = \{f_\psi(x) > 0\}$. Then

$$\operatorname{Var}_\theta(T(X)) \ge \sup_{\{\psi :\, S(\psi) \subset S(\theta),\ \psi \ne \theta\}} \frac{[\tau(\psi) - \tau(\theta)]^2}{\operatorname{Var}_\theta\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} \qquad \forall\, \theta \in \Omega$$

Proof: Since $T$ is unbiased for $\tau(\theta)$, $E_\psi(T(X)) = \tau(\psi)$ $\forall\, \psi \in \Omega$. Hence, for $\psi \ne \theta$,

$$\int_{S(\theta)} T(x) \left[\frac{f_\psi(x) - f_\theta(x)}{f_\theta(x)}\right] f_\theta(x)\,dx = \tau(\psi) - \tau(\theta)$$

which gives

$$E_\theta\left[\{T(X) - \tau(\theta)\}\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\}\right] = \tau(\psi) - \tau(\theta) \qquad \left[\text{since } E_\theta\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = 0\right]$$

$$\Rightarrow\; \operatorname{Cov}_\theta\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} = \tau(\psi) - \tau(\theta)$$

Since $\rho^2 \le 1$,

$$\operatorname{Cov}_\theta^2\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} \le V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)} - 1\right\}$$

$$\Rightarrow\; \operatorname{Cov}_\theta^2\left\{T(X),\ \frac{f_\psi(X)}{f_\theta(X)} - 1\right\} \le V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$

$$\Rightarrow\; [\tau(\psi) - \tau(\theta)]^2 \le V\{T(X)\} \cdot V\left\{\frac{f_\psi(X)}{f_\theta(X)}\right\}$$

$$\Rightarrow\; V\{T(X)\} \ge \frac{[\tau(\psi) - \tau(\theta)]^2}{V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} \qquad (\text{Proved})$$

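Example: Let $X$ be $U[0, \theta]$. Then

$$f_\theta(x) = \begin{cases} \dfrac{1}{\theta} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

Formally we get $E_\theta\left[\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right]^2 = \dfrac{1}{\theta^2}$, so that the lower bound of the Rao-Cramer inequality would be $\dfrac{\theta^2}{n}$. However, the support of $f_\theta$ depends on $\theta$, so the regularity conditions do not hold. Let $\tau(\theta) = \theta$. If $\psi < \theta$, then $S(\psi) \subset S(\theta)$. Also,

$$E_\theta\left[\frac{f_\psi(X)}{f_\theta(X)}\right]^2 = \int_0^\psi \left(\frac{\theta}{\psi}\right)^2 \frac{1}{\theta}\,dx = \frac{\theta}{\psi^2}\,[x]_0^\psi = \frac{\theta}{\psi}$$

and, since $E_\theta\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\} = 1$, we have $V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\} = \dfrac{\theta}{\psi} - 1$. Thus

$$\operatorname{Var}_\theta\{T(X)\} \ge \sup_{\{\psi :\, \psi < \theta\}} \frac{[\tau(\psi) - \tau(\theta)]^2}{V\left\{\dfrac{f_\psi(X)}{f_\theta(X)}\right\}} = \sup_{\{\psi :\, \psi < \theta\}} \frac{[\psi - \theta]^2}{\dfrac{\theta}{\psi} - 1} = \sup_{\{\psi :\, \psi < \theta\}} \{\psi(\theta - \psi)\}$$

Now, let us take $K(\psi) = \psi(\theta - \psi)$. Its derivative is $\theta - 2\psi$. Therefore, $K(\psi)$ increases as long as $\psi < \dfrac{\theta}{2}$ and decreases if $\psi > \dfrac{\theta}{2}$. $K(\psi)$ attains its maximum value at $\psi = \dfrac{\theta}{2}$. So,

$$\operatorname{Var}_\theta\{T(X)\} \ge \sup_{\{\psi :\, \psi < \theta\}} \{\psi(\theta - \psi)\} = \frac{\theta}{2}\left(\theta - \frac{\theta}{2}\right) = \frac{\theta^2}{4}$$

This is the lower bound for any unbiased estimate $T(X)$ of $\theta$. Now, $X$ is a complete sufficient statistic and $2X$ is unbiased for $\theta$, so that $T(X) = 2X$ is the UMVUE. Also

$$\operatorname{Var}_\theta\{2X\} = 4\operatorname{Var}(X) = \frac{\theta^2}{3} > \frac{\theta^2}{4}$$

Thus the lower bound $\dfrac{\theta^2}{4}$ of the Chapman, Robbins and Kiefer (CRK) inequality is not achieved by any unbiased estimate of $\theta$.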
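The value $\theta^2/4$ of the CRK bound for $U[0, \theta]$ can be checked numerically. The sketch below is a hedged illustration: it evaluates the ratio $[\psi - \theta]^2 / (\theta/\psi - 1)$ on an evenly spaced grid of $\psi < \theta$ using exact fractions; the grid size and the function names are assumptions made for the example.

```python
from fractions import Fraction


def crk_ratio(psi, theta):
    """[tau(psi) - tau(theta)]^2 / Var_theta{f_psi(X)/f_theta(X)} with tau(theta) = theta.
    For U[0, theta] and psi < theta the variance of the density ratio is theta/psi - 1."""
    var_ratio = theta / psi - 1
    return (psi - theta) ** 2 / var_ratio


def crk_bound_uniform(theta, steps=1000):
    """Largest value of crk_ratio over the grid psi = theta * k / steps, 0 < k < steps."""
    candidates = [theta * Fraction(k, steps) for k in range(1, steps)]
    return max(crk_ratio(psi, theta) for psi in candidates)


theta = Fraction(3)
bound = crk_bound_uniform(theta)
print(bound, theta ** 2 / 4, theta ** 2 / 3)  # grid bound, theta^2/4, Var(2X)
```

With an even number of grid steps the grid maximum coincides with $\theta^2/4$, which is strictly below $\operatorname{Var}_\theta\{2X\} = \theta^2/3$.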

Example: Let $X$ have p.m.f.

$$P_N\{X = k\} = \begin{cases} \dfrac{1}{N} & \text{if } k = 1, 2, \ldots, N \\ 0 & \text{otherwise} \end{cases}$$

Let $\Omega = \{N : N \ge M,\ M > 1 \text{ given}\}$ and take $\tau(N) = N$. The p.m.f. does not satisfy the regularity conditions, so the CRK inequality is applicable. Now for $N \ne N' \in \Omega$,

$$S(N) = \{1, 2, \ldots, N\} \supset S(N') = \{1, 2, \ldots, N'\} \quad \text{if } N' < N$$

Also, $P_N$ and $P_{N'}$ are different for $N \ne N'$. Thus

$$\operatorname{Var}_N(T) \ge \sup_{N' < N} \frac{(N - N')^2}{\operatorname{Var}_N\left\{\dfrac{P_{N'}}{P_N}\right\}}$$

Now,

$$\frac{P_{N'}}{P_N}(x) = \frac{P_{N'}(x)}{P_N(x)} = \begin{cases} \dfrac{N}{N'} & x = 1, 2, \ldots, N',\ N' < N \\ 0 & \text{otherwise} \end{cases}$$

$$E_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\} = \frac{1}{N} \sum_{1}^{N'} \left(\frac{N}{N'}\right) = 1 \quad \text{and} \quad E_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\}^2 = \frac{1}{N} \sum_{1}^{N'} \left(\frac{N}{N'}\right)^2 = \frac{N}{N'}$$

$$\therefore\ \operatorname{Var}_N\left\{\frac{P_{N'}(x)}{P_N(x)}\right\} = \frac{N}{N'} - 1 > 0 \quad \text{for } N > N'$$

It follows that

$$\operatorname{Var}_N(T) \ge \sup_{N' < N} \frac{(N - N')^2}{\dfrac{N}{N'} - 1} = \sup_{N' < N} [N'(N - N')]$$

Now,

$$\frac{k(N - k)}{(k - 1)(N - k + 1)} > 1 \quad \text{iff} \quad k < \frac{N + 1}{2}$$

Therefore, $N'(N - N')$ increases as long as $N' < \dfrac{N+1}{2}$ and decreases if $N' > \dfrac{N+1}{2}$. The maximum is achieved at $N' = \dfrac{N+1}{2}$.

$$\operatorname{Var}_N(T) \ge \frac{N+1}{2}\left(N - \frac{N+1}{2}\right)$$

$$\operatorname{Var}_N(T) \ge M(N - M) \quad \text{if } M > \frac{N+1}{2}$$

N.B.: Reference Rohatgi V K; An Introduction to Probability Theory and Mathematical Statistics, p-365. And Rohatgi, Saleh; an Introduction to Probability and Statistics, p-397.
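For the discrete uniform example, the supremum of $N'(N - N')$ over the admissible values $M \le N' < N$ can be computed by direct search. This is a small sketch under the assumption that $N'$ ranges over the integers in $\Omega$ below $N$; the function name and the sample values of $N$ and $M$ are illustrative.

```python
from fractions import Fraction


def crk_bound_discrete(N, M=2):
    """Maximum over integers M <= N' < N of (N - N')^2 / Var_N{P_N'/P_N},
    where Var_N{P_N'/P_N} = N/N' - 1, so each term equals N' * (N - N')."""
    best = None
    for n_prime in range(M, N):
        var_ratio = Fraction(N, n_prime) - 1
        value = (N - n_prime) ** 2 / var_ratio
        if best is None or value > best:
            best = value
    return best


print(crk_bound_discrete(9, M=2))  # interior maximum near (N + 1) / 2
print(crk_bound_discrete(9, M=7))  # M > (N + 1) / 2, maximum at N' = M
```

The second call illustrates the case $M > (N+1)/2$, where the search is restricted to the decreasing part of $N'(N - N')$ and the bound becomes $M(N - M)$.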

Uniformly Minimum Variance Unbiased Estimator (UMVUE): Let $X_1, X_2, \ldots, X_n$ be a random sample from $f(x, \theta)$. An estimator $T^* = t^*(X_1, \ldots, X_n)$ of $\tau(\theta)$ is defined to be a uniformly minimum variance unbiased estimator of $\tau(\theta)$ if and only if

a) $T^*$ is unbiased for $\tau(\theta)$;
b) for any other unbiased estimator $T$ of $\tau(\theta)$, $V(T^*) \le V(T)$ for all $\theta \in \Omega$.

Concept of Rao-Blackwell Theorem: A very powerful method for finding a minimum variance estimator, irrespective of whether the MVB is attained or not, is provided by a theorem known as the Rao-Blackwell Theorem. This theorem says that if we look for an MVE of $\tau(\theta)$, we need only inspect estimators which are functions of a sufficient statistic. In other words, any unbiased estimator should be a function of a sufficient statistic. If it is not, we can construct an estimator with smaller variance by taking the conditional expectation given a sufficient statistic. However, this raises the question of which sufficient statistic to use to compute the conditional expectation. For example, suppose that $S$ is an unbiased estimator of $\tau(\theta)$ with finite variance and let $T_1$ and $T_2$ both be sufficient statistics for $\theta$, with $T_2 = h(T_1)$ for some function $h$. Let us define

$$S_1^* = E(S \mid T_1) \quad \text{and} \quad S_2^* = E(S \mid T_2)$$

By the Rao-Blackwell theorem the variances of $S_1^*$ and $S_2^*$ cannot exceed $V(S)$. However, it is not obvious which estimator will have the smaller variance.

Statement of Rao-Blackwell Theorem: Suppose that $X$ has a joint distribution depending on some unknown parameter $\theta$ and that $T = T(x)$ is a sufficient statistic for $\theta$. Let $S = S(x)$ be a statistic such that $E(S) = \tau(\theta)$, and let $S^* = E(S \mid T)$. Then

a) $S^*$ is an unbiased estimator of $\tau(\theta)$;
b) $V(S^*) \le V(S)$. Moreover, $V(S^*) < V(S)$ unless $P(S = S^*) = 1$.

N.B.: For more on this, see the chapter Asymptotically Most Efficient Estimator of the Third Year Note.
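The variance reduction promised by the Rao-Blackwell theorem can be seen in a small exact computation. The sketch below assumes a Bernoulli($\theta$) sample, the crude unbiased estimator $S = X_1$ and the sufficient statistic $T = \sum X_i$; these choices and the function name `rao_blackwell_demo` are illustrative and are not taken from the notes.

```python
from fractions import Fraction
from itertools import product


def rao_blackwell_demo(n, theta):
    """Return ((mean, var) of S = X1, (mean, var) of S* = E[S | T]) for
    n independent Bernoulli(theta) observations with T = X1 + ... + Xn."""
    points = []
    for x in product((0, 1), repeat=n):
        prob = Fraction(1)
        for xi in x:
            prob *= theta if xi else 1 - theta
        points.append((x, prob))

    # Conditional expectation E[X1 | T = t], computed by enumeration.
    numer, denom = {}, {}
    for x, prob in points:
        t = sum(x)
        numer[t] = numer.get(t, 0) + x[0] * prob
        denom[t] = denom.get(t, 0) + prob
    s_star = {t: numer[t] / denom[t] for t in denom}

    def moments(estimator):
        mean = sum(estimator(x) * pr for x, pr in points)
        var = sum((estimator(x) - mean) ** 2 * pr for x, pr in points)
        return mean, var

    return moments(lambda x: x[0]), moments(lambda x: s_star[sum(x)])


(raw_mean, raw_var), (rb_mean, rb_var) = rao_blackwell_demo(4, Fraction(1, 3))
print(raw_mean, raw_var)  # mean and variance of S
print(rb_mean, rb_var)    # mean and variance of S* = E[S | T]
```

In this setting $E[X_1 \mid T] = T/n$, so conditioning on the sufficient statistic turns a single observation into the sample mean while keeping the estimator unbiased.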

Minimal Sufficient Statistics: Suppose $X_1, X_2, \ldots, X_n \sim N(\theta, \sigma^2)$ where both $\theta$ and $\sigma^2$ are unknown; then a set of jointly sufficient statistics is $\bar{x}$ and $s^2$. We have another set of sufficient statistics, the order statistics: $X_1, X_2, \ldots, X_n \Rightarrow Y_1 < Y_2 < \cdots < Y_n$. Now these sufficient statistics condense the data.

A set of sufficient statistics is minimal if no other set of sufficient statistics condenses the data more. A set of jointly sufficient statistics is defined to be minimal sufficient iff it is a function of every other set of sufficient statistics. That is, among a number of sufficient statistics we should choose one, $t_0$ (say), which condenses the data more than any other sufficient statistic. Then $t_0$ is a minimal sufficient statistic. A statistic $T(x_1, x_2, \ldots, x_n)$ is a minimal sufficient statistic if $T(\cdot)$ is a sufficient statistic and a function of every other sufficient statistic, that is,

$$T(x_1, x_2, \ldots, x_n) = \psi\{t(x_1, x_2, \ldots, x_n)\} \quad \forall \text{ sufficient statistic } t$$

A sufficient statistic always exists, but a minimal sufficient statistic may not always exist.

Way of Finding Minimal Sufficient Statistic: Let $f(x;\theta)$ be the p.d.f. of $X$ and suppose that there exists a function $T(x)$ such that for any two points $x$ and $y$ the ratio $\dfrac{f(y;\theta)}{f(x;\theta)}$ is independent of $\theta$ iff $T(x) = T(y)$; then $T(x)$ is a minimal sufficient statistic for $\theta$.

The likelihood ratio $\dfrac{L(y;\theta)}{L(x;\theta)}$ is independent of $\theta$ whenever $t(x) = t(y)$ for some sufficient statistic $t(\cdot)$, since by factorization

$$\frac{L(y;\theta)}{L(x;\theta)} = \frac{g(t;\theta)\,h(y)}{g(t;\theta)\,h(x)} = \frac{h(y)}{h(x)}$$

If, in addition, the ratio is independent of $\theta$ only when $T(x) = T(y)$, then $T(x)$ is a minimal sufficient statistic for $\theta$.

Example: Suppose $x_1, x_2, \ldots, x_n$ are independent random variables each with $N(\theta, \sigma^2)$, where $\theta$ is unknown and $\sigma^2$ is known. Find a minimal sufficient statistic for $\theta$.

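Solution: Here we have,

$$L(x \mid \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i - \theta)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum x_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta \sum x_i + n\theta^2\right)}$$

By the Neyman Factorization theorem we can say that $\bar{x}$ (equivalently $\sum x_i$) is sufficient for $\theta$. Similarly

$$L(y \mid \theta) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum y_i^2}\, e^{-\frac{1}{2\sigma^2}\left(-2\theta \sum y_i + n\theta^2\right)}$$

$$\frac{L(y \mid \theta)}{L(x \mid \theta)} = e^{\frac{1}{2\sigma^2}\left(\sum x_i^2 - \sum y_i^2\right)} \cdot e^{\frac{\theta}{\sigma^2}\left(\sum y_i - \sum x_i\right)}$$

which will be independent of $\theta$ iff

$$\sum x_i = \sum y_i, \quad \text{that is,} \quad t(x) = t(y)$$

Hence $t(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} x_i$ is a minimal sufficient statistic for $\theta$.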

Example: Suppose we have $n = 2$ independent observations from the Cauchy distribution with p.d.f.

$$f_X(x) = \frac{1}{\pi\left[1 + (x - \theta)^2\right]}\ ; \quad -\infty < x < \infty$$

Show that no nontrivial sufficient statistic exists.

Solution: Since $n = 2$, we consider two points $x = (x_1, x_2)$ and $y = (y_1, y_2)$. Hence we get,

$$L(x \mid \theta) = \frac{1}{\pi^2} \cdot \frac{1}{1 + (x_1 - \theta)^2} \cdot \frac{1}{1 + (x_2 - \theta)^2}$$

Similarly,

$$L(y \mid \theta) = \frac{1}{\pi^2} \cdot \frac{1}{1 + (y_1 - \theta)^2} \cdot \frac{1}{1 + (y_2 - \theta)^2}$$

Thus we have,

$$\frac{L(y \mid \theta)}{L(x \mid \theta)} = \frac{\left\{1 + (x_1 - \theta)^2\right\}\left\{1 + (x_2 - \theta)^2\right\}}{\left\{1 + (y_1 - \theta)^2\right\}\left\{1 + (y_2 - \theta)^2\right\}}$$

Since $\dfrac{L(y \mid \theta)}{L(x \mid \theta)}$ depends on $\theta$ unless $(y_1, y_2)$ is a rearrangement of $(x_1, x_2)$, the data cannot be condensed beyond the order statistics; hence no nontrivial sufficient statistic exists.

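Example: Let $X$ be a single observation from the point probability function

$$f(x \mid \theta) = \begin{cases} \dfrac{\theta}{2} & \text{if } x = -3 \\[4pt] \dfrac{\theta}{3} & \text{if } x = 0 \\[4pt] \dfrac{1 - 2\theta}{3} & \text{if } x = 6, 13, 52 \\[4pt] \theta^2 + \dfrac{\theta}{6} & \text{if } x = 60 \\[4pt] \theta - \theta^2 & \text{if } x = 68 \end{cases} \qquad \text{where } 0 < \theta < \frac{1}{2}$$

Find a minimal sufficient statistic for $\theta$.

Solution: We know at least one sufficient statistic always exists, namely the identity statistic $t(x_1, x_2, \ldots, x_n) = (x_1, x_2, \ldots, x_n)$. Hence $X$ itself is a sufficient statistic for $\theta$. To find a minimal sufficient statistic, we look for the pairs of points $x$, $y$ for which $\dfrac{L(y \mid \theta)}{L(x \mid \theta)}$ is independent of $\theta$. The ratio is free of $\theta$ for $x = -3$, $y = 0$ (it equals $\frac{2}{3}$) and among the points $6, 13, 52$ (it equals $1$), and for no other pair of distinct points. So a minimal sufficient statistic exists.

Thus we can partition the sample space into the sets $\{-3, 0\}$, $\{6, 13, 52\}$, $\{60\}$, $\{68\}$ and a minimal sufficient statistic is

$$t(x) = \begin{cases} c_1 & \text{if } x = -3 \text{ or } 0 \\ c_2 & \text{if } x = 6 \text{ or } 13 \text{ or } 52 \\ c_3 & \text{if } x = 60 \\ c_4 & \text{if } x = 68 \end{cases}$$

where $c_1$, $c_2$, $c_3$ and $c_4$ are distinct constants. The probability distribution of $t(x)$ is

$$P\{t(x) = w\} = \begin{cases} \dfrac{5\theta}{6} & \text{if } w = c_1 \\[4pt] 1 - 2\theta & \text{if } w = c_2 \\[4pt] \theta^2 + \dfrac{\theta}{6} & \text{if } w = c_3 \\[4pt] \theta - \theta^2 & \text{if } w = c_4 \end{cases}$$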

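Example: Let $X$ be a single observation from the probability function

$$P(x \mid \theta) = \begin{cases} \theta^2 & \text{if } x = -1, 3 \\[4pt] \dfrac{1}{2} - \dfrac{\theta^2}{2} & \text{if } x = 0 \\[4pt] -\dfrac{\theta^2}{2} + \dfrac{\theta}{2} & \text{if } x = 2, 4 \\[4pt] \dfrac{1}{2} - \dfrac{\theta^2}{2} - \theta & \text{if } x = 1 \end{cases}$$

where $\theta$ is an unknown number between zero and $\sqrt{2} - 1$. Find a minimal sufficient statistic for $\theta$.

Solution: Here we partition the sample space into the sets $\{-1, 3\}$, $\{0\}$, $\{2, 4\}$, $\{1\}$. Hence the minimal sufficient statistic is

$$t(x) = \begin{cases} c_1 & \text{if } x = -1 \text{ or } 3 \\ c_2 & \text{if } x = 0 \\ c_3 & \text{if } x = 2 \text{ or } 4 \\ c_4 & \text{if } x = 1 \end{cases}$$

where $c_1, c_2, c_3, c_4$ are distinct constants.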
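The partition in this example can be recovered mechanically from the likelihood-ratio criterion. The sketch below assumes the probability function reads $\theta^2$ on $\{-1, 3\}$, $\frac{1}{2} - \frac{\theta^2}{2}$ at $0$, $\frac{\theta}{2} - \frac{\theta^2}{2}$ on $\{2, 4\}$ and $\frac{1}{2} - \frac{\theta^2}{2} - \theta$ at $1$, and it groups two points together when the ratio of their probabilities is the same at a handful of trial values of $\theta$. Checking finitely many $\theta$ values is a numerical heuristic, and all names are illustrative.

```python
from fractions import Fraction


def pmf(x, theta):
    """Probability function of the example (assumed reading of the table)."""
    if x in (-1, 3):
        return theta ** 2
    if x == 0:
        return (1 - theta ** 2) / 2
    if x in (2, 4):
        return (theta - theta ** 2) / 2
    if x == 1:
        return (1 - theta ** 2) / 2 - theta
    return Fraction(0)


SUPPORT = (-1, 0, 1, 2, 3, 4)
# Trial parameter values, all inside (0, sqrt(2) - 1).
THETAS = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5))


def ratio_is_constant(x, y):
    """True if pmf(y)/pmf(x) takes a single value over the trial thetas."""
    return len({pmf(y, th) / pmf(x, th) for th in THETAS}) == 1


def minimal_partition():
    """Group support points whose likelihood ratio does not vary with theta."""
    blocks = []
    for x in SUPPORT:
        for block in blocks:
            if ratio_is_constant(block[0], x):
                block.append(x)
                break
        else:
            blocks.append([x])
    return blocks


print(minimal_partition())
```

Each block of the resulting partition corresponds to one of the constants $c_1, \ldots, c_4$.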

Best Asymptotically Normal Estimator (BAN Estimator): A sequence of estimators $T_1', T_2', \ldots, T_n'$ of $\tau(\theta)$ is defined to be best asymptotically normal (BAN) if and only if the following four conditions are satisfied:

a) The distribution of $\sqrt{n}\left[T_n' - \tau(\theta)\right]$ approaches $N\{0, \sigma'^2(\theta)\}$ as $n \to \infty$.
b) For every $\varepsilon > 0$, $\lim_{n \to \infty} P_\theta\left[\,\left|T_n' - \tau(\theta)\right| > \varepsilon\,\right] = 0$ for each $\theta \in \Omega$.
c) Let $\{T_n\}$ be any other sequence of simple consistent estimators for which the distribution of $\sqrt{n}\left[T_n - \tau(\theta)\right]$ approaches $N\left[0, \sigma^2(\theta)\right]$.
d) $\sigma^2(\theta)$ is not less than $\sigma'^2(\theta)$ for all $\theta$ in any open interval.

BAN is sometimes replaced by consistent asymptotically normal efficient (CANE).

Example: Let $x_1, x_2, \ldots, x_n$ be a random sample from $N(\mu, \sigma^2)$. Then $T_n' = \dfrac{\sum x_i}{n} = \bar{x}_n$ is a BAN estimator of $\mu$, since the limiting distribution of $\sqrt{n}\left[\bar{x}_n - \mu\right]$ is $N(0, \sigma^2)$ and no other estimator can have smaller limiting variance in any interval of $\mu$ values.

Best Consistent Unbiased Asymptotically Normal (BCUAN): A CAN estimator $T_n$ of $g(\theta)$ is said to be the Best Consistent Unbiased Asymptotically Normal Estimator if it is unbiased and the variance of the limiting distribution of $\sqrt{n}\left[T_n - g(\theta)\right]$ has the least possible value.

Completeness: The family of density or probability functions $f(x \mid \theta)$, $\theta \in \Omega$, is called complete if, for every function $u(x)$, the identity $E_\theta\{u(X)\} = 0$ implies $P_\theta\{u(X) = 0\} = 1$ for all $\theta \in \Omega$. This is sometimes expressed by saying that there are no unbiased estimators of zero. In particular it means that two different functions of $T$ cannot have the same expected value. For example,

$$E\{T(X)\} = \theta \ \text{ and } \ E\{K(X)\} = \theta \quad \therefore\ E\{T(X) - K(X)\} = 0 \;\Rightarrow\; T(X) - K(X) = 0$$

That is, any unbiased estimator is unique. In this sense, we are primarily interested in knowing that the family of density functions of a sufficient statistic is complete, since in that case an unbiased function of the sufficient statistic will be unique and it must be a uniformly minimum variance unbiased estimator by the Rao-Blackwell theorem.
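Example: Suppose $f(x) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$ ; $-\infty < x < \infty$. Check whether it is complete or not.

Solution: Let us consider the function $\varphi(x) = x$. Now,

$$E\{\varphi(x)\} = E(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} x\, e^{-\frac{x^2}{2}}\,dx + \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx$$

$$= -\frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx + \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2}}\,dx$$

$$\therefore\ E\{\varphi(x)\} = 0$$

But $\varphi(x)$ is non-zero. Hence $f(x)$ is not complete.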

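Example: Let $f(x) = \dfrac{1}{\beta\sqrt{2\pi}}\, e^{-\frac{x^2}{2\beta^2}}$ ; $-\infty < x < \infty$ ; $\beta > 0$. Check whether it is complete or not.

Solution: Let us consider the function $\varphi(x) = x$. Now,

$$E\{\varphi(x)\} = E(x) = \frac{1}{\beta\sqrt{2\pi}} \int_{-\infty}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx = \frac{1}{\beta\sqrt{2\pi}} \int_{-\infty}^{0} x\, e^{-\frac{x^2}{2\beta^2}}\,dx + \frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx$$

$$= -\frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx + \frac{1}{\beta\sqrt{2\pi}} \int_{0}^{\infty} x\, e^{-\frac{x^2}{2\beta^2}}\,dx$$

$$\therefore\ E\{\varphi(x)\} = 0$$

But $\varphi(x)$ is non-zero. Hence $f(x)$ is not complete.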

Example: Let $x_1, x_2, \ldots, x_n$ be independent random variables each with $P(\lambda)$ ; $\lambda > 0$. Find a minimal sufficient statistic for $\lambda$ and check if it is complete.

Solution: The joint distribution of $x_1, x_2, \ldots, x_n$ is

$$L(x;\lambda) = \frac{e^{-n\lambda}\, \lambda^{\sum_{i=1}^{n} x_i}}{\prod x_i!} = \left(\prod_{i=1}^{n} \frac{1}{x_i!}\right) \left(e^{-n\lambda}\, \lambda^{\sum x_i}\right)$$

By the Neyman Factorization theorem, we can say that $\sum x_i$ is a sufficient statistic. Now,

$$\frac{L(y \mid \lambda)}{L(x \mid \lambda)} = \frac{e^{-n\lambda}\, \lambda^{\sum y_i} \Big/ \prod_{i=1}^{n} y_i!}{e^{-n\lambda}\, \lambda^{\sum x_i} \Big/ \prod_{i=1}^{n} x_i!}$$

which is independent of $\lambda$ iff $\sum x_i = \sum y_i$. So $\sum x_i$ is a minimal sufficient statistic. The distribution of $\sum x_i$ is also Poisson and has p.m.f.

$$\frac{e^{-n\lambda}(n\lambda)^x}{x!}\ ; \quad x = 0, 1, 2, \ldots$$

Hence showing that the family of probability functions of the minimal sufficient statistic $\sum x_i$ is complete is equivalent to showing that the Poisson family is complete. Let $u(\cdot)$ be any function; then

$$E_\lambda[u(x)] = 0 \;\Rightarrow\; \sum_{k=0}^{\infty} u(k) \cdot \frac{e^{-\lambda}\lambda^k}{k!} = 0 \;\Rightarrow\; \sum_{k=0}^{\infty} u(k) \cdot \frac{\lambda^k}{k!} = 0 \qquad \left[\text{since } e^{-\lambda} > 0\right]$$

A power series in $\lambda$ that vanishes for all $\lambda > 0$ has all coefficients equal to zero, and $k! \ne 0$, so that $u(k) = 0$ for every $k$. Hence the Poisson family of distributions is complete and $\sum x_i$ is a complete minimal sufficient statistic for $\lambda$.

Example: Let $x_1, x_2, \ldots, x_n$ be independent random variables each with $U(0, \theta)$ ; $\theta > 0$. Find a sufficient statistic for $\theta$ and show that it is complete.

Solution: Since $x_1, x_2, \ldots, x_n$ are independent random variables each with $U(0, \theta)$ ; $\theta > 0$, then

$$f(x) = \frac{1}{\theta} \quad \therefore\ L(x) = \frac{1}{\theta^n}\ ; \quad 0 < x_{(n)} < \theta$$

Hence, $x_{(n)}$ is the sufficient statistic for $\theta$. We know that

$$f_{n:n}(x) = n\{F(x)\}^{n-1} \cdot f(x)$$

So, the p.d.f. of $x_{(n)}$ is

$$\frac{n x^{n-1}}{\theta^n} \qquad \left[\because\ F(x) = \int_0^x \frac{1}{\theta}\,dx = \frac{x}{\theta}\right]$$

Let $u(\cdot)$ be any function. Then

$$E_\theta\left[u(x_{(n)})\right] = 0 \;\Rightarrow\; \int_0^\theta u(x) \cdot \frac{n x^{n-1}}{\theta^n}\,dx = 0 \;\Rightarrow\; \int_0^\theta u(x)\, x^{n-1}\,dx = 0$$

$$\Rightarrow\; u(\theta)\,\theta^{n-1} = 0 \quad [\text{differentiating both sides w.r.t. } \theta] \qquad \therefore\ u(\theta) = 0\ \ \forall\, \theta \quad \left[\text{since } \theta^{n-1} \ne 0\right]$$

Therefore $x_{(n)}$ is a complete sufficient statistic for $\theta$.

Ancillary Statistic: A statistic $U(X_1, X_2, \ldots, X_n)$ is called first order ancillary if $E_\theta\{U(X_1, X_2, \ldots, X_n)\}$ is a constant independent of $\theta$. $U(X_1, X_2, \ldots, X_n)$ is called an ancillary statistic for $\theta$ if the distribution function of $U(X_1, X_2, \ldots, X_n)$ does not depend on $\theta$.

Thus, unlike a sufficient statistic, an ancillary statistic does not contain any information about the parameter $\theta$. In such cases, intuition suggests that (since the sufficient statistic $T(X_1, X_2, \ldots, X_n)$ contains all the information about $\theta$) the ancillary statistic should be independent of $T(X_1, X_2, \ldots, X_n)$.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, 1)$. Then the statistic $U(X) = (n - 1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2$ is ancillary since $(n - 1)S^2 \sim \chi^2_{(n-1)}$, which is free of $\mu$. Some other ancillary statistics are $X_1 - \bar{X}$, $X_{(n)} - X_{(1)}$ and $\sum_{i=1}^{n} \left|X_i - \bar{X}\right|$.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(0, \sigma^2)$. Then the statistic $U(X) = \bar{X}$ follows $N(0, n^{-1}\sigma^2)$ and is not ancillary with respect to the parameter $\sigma^2$.

Example: Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ be the order statistics of a random sample from the p.d.f. $f(x - \theta)$, where $\theta \in \Re$; then the statistic $U(X) = \left(X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(1)}\right)$ is ancillary for $\theta$.

Example: Let $X_1, X_2, \ldots, X_n$ be iid random variables with distribution

$$f(x;\mu, \theta) = \frac{1}{2\theta}\ ; \quad \mu - \theta \le x \le \mu + \theta$$

Then the statistic $R = X_{(n)} - X_{(1)}$ is not an ancillary statistic because the distribution of $R$ is

$$f_R(r) = \frac{n(n - 1)\, r^{n-2}}{(2\theta)^{n-1}} \left(1 - \frac{r}{2\theta}\right)\ ; \quad 0 \le r \le 2\theta$$

which depends on $\theta$.

Basu's Theorem: Let $T(X_1, X_2, \ldots, X_n)$ be a complete sufficient statistic and $U(X_1, X_2, \ldots, X_n)$ an ancillary statistic. Then $T$ and $U$ are independent random variables.

Proof: Fix $u$ as an arbitrary value of $U$. Let $g(t) = P\{U = u \mid T = t\}$, which does not depend on $\theta$ because $T$ is sufficient. Then

$$E_\theta\{g(T)\} = \sum_t P\{U = u \mid T = t\}\, P\{T = t\} = \sum_t \frac{P\{U = u,\ T = t\}}{P\{T = t\}}\, P\{T = t\} = \sum_t P\{U = u,\ T = t\} = P\{U = u\}$$

So, $E_\theta\left[g(T) - P\{U = u\}\right] = 0$. By completeness of $T$, $g(t) - P\{U = u\} = 0$ for all $t$, that is,

$$P\{U = u \mid T = t\} = P\{U = u\} \quad \text{for all } t$$

Hence $U$ and $T$ are independent random variables.

Example: Let $X_1$ and $X_2$ be independent random variables each $N(\mu, \sigma^2)$ with $\sigma^2$ known and $\mu$ unknown. Let $T(X_1, X_2) = X_1 + X_2$ and $U(X_1, X_2) = X_1 - X_2$. Then $U(X_1, X_2)$ is $N(0, 2\sigma^2)$ and its distribution does not depend on $\mu$. Hence it is an ancillary statistic for $\mu$. Since $T(X_1, X_2)$ is a complete sufficient statistic, it follows from Basu's theorem that $X_1 - X_2$ and $X_1 + X_2$ are independent random variables.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the uniform distribution on $[0, \theta]$, and let $Y_1 < Y_2 < \cdots < Y_n$ denote the corresponding order statistics. Show that $\dfrac{Y_1}{Y_n}$ and $Y_n$ are independent random variables.

Solution: Since $Y_n$ is a complete sufficient statistic for $\theta$, it suffices (by Basu's Theorem) to show that the distribution of $\dfrac{Y_1}{Y_n}$ does not depend on $\theta$ (i.e. that $\dfrac{Y_1}{Y_n}$ is an ancillary statistic), which follows since for $0 < t \le 1$

$$F_{Y_1/Y_n}(t) = P\left(\frac{Y_1}{Y_n} \le t\right) = P(Y_1 \le t\,Y_n) = \int_0^\theta P(Y_1 \le t y \mid Y_n = y)\, f_{Y_n}(y)\,dy$$

Given $Y_n = y$, the remaining $n - 1$ observations are independent and uniform on $[0, y]$, so $P(Y_1 \le t y \mid Y_n = y) = 1 - (1 - t)^{n-1}$. With $f_{Y_n}(y) = \dfrac{n y^{n-1}}{\theta^n}$ this gives

$$F_{Y_1/Y_n}(t) = \left\{1 - (1 - t)^{n-1}\right\} \int_0^\theta \frac{n y^{n-1}}{\theta^n}\,dy = 1 - (1 - t)^{n-1}$$

which is independent of $\theta$. So $\dfrac{Y_1}{Y_n}$ and $Y_n$ are independent random variables.
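The claim that the distribution of $Y_1/Y_n$ is free of $\theta$ can be checked by simulation. The sketch below estimates $P(Y_1/Y_n \le t)$ for two very different values of $\theta$; the sample size, the point $t$, the number of replications and the seeds are arbitrary assumptions for the illustration.

```python
import random


def ratio_cdf(theta, n, t, reps=20000, seed=0):
    """Monte Carlo estimate of P(Y1 / Yn <= t) for a U[0, theta] sample of size n,
    where Y1 is the sample minimum and Yn the sample maximum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        if min(sample) / max(sample) <= t:
            hits += 1
    return hits / reps


small_theta = ratio_cdf(theta=1.0, n=5, t=0.3, seed=1)
large_theta = ratio_cdf(theta=50.0, n=5, t=0.3, seed=2)
print(small_theta, large_theta)  # the two estimates should be close
```

Independent seeds are used for the two runs so that agreement of the estimates reflects the distribution of the ratio rather than a shared random stream.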

Theorem: Suppose that $X = (X_1, X_2, \ldots, X_n)$ has a joint density or joint frequency function that is a $k$-parameter exponential family

$$f(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)\, T_i(x) - d(\theta) + S(x)\right]$$

then the statistic $\{T_1(x), T_2(x), \ldots, T_k(x)\}$ is complete as well as sufficient for $\theta$.

Example: Show that if $X_1, X_2, \ldots, X_n$ are independent random variables each $N(\mu, \sigma^2)$ ; $-\infty < \mu < \infty$, $\sigma^2 > 0$, both unknown, then the joint density of $X_1, X_2, \ldots, X_n$ is a member of the two-parameter exponential family.

Solution: The joint density of $X_1, X_2, \ldots, X_n$ is

$$L(x;\mu, \sigma^2) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum (x_i - \mu)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu \sum x_i + n\mu^2\right\}}$$

$$= \exp\left[n \ln\frac{1}{\sigma} - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\left\{\sum x_i^2 - 2\mu \sum x_i + n\mu^2\right\}\right]$$

$$= \exp\left[-n \ln\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right]$$

$$= \exp\left[\frac{\mu}{\sigma^2}\sum x_i + \left(-\frac{1}{2\sigma^2}\right)\sum x_i^2 - \left\{\frac{n\mu^2}{2\sigma^2} + n \ln\left(\sigma\sqrt{2\pi}\right)\right\}\right] \qquad (1)$$

So we can say, there is a sufficient statistic for $\mu$ and $\sigma^2$. The joint density of $X_1, X_2, \ldots, X_n$ is said to be a member of the exponential family, or a member of the Koopman-Darmois class, or is said to have Koopman-Darmois form, if

$$L(x;\theta) = \exp\left[\sum_{i=1}^{k} C_i(\theta)\, T_i(x) - d(\theta) + S(x)\right] \qquad (2)$$

Now by (1) and (2) we have,

$$T_1(x) = \sum x_i, \quad C_1(\mu, \sigma^2) = \frac{\mu}{\sigma^2}, \quad T_2(x) = \sum x_i^2, \quad C_2(\mu, \sigma^2) = -\frac{1}{2\sigma^2}$$

$$d(\mu, \sigma^2) = \left\{\frac{n\mu^2}{2\sigma^2} + n \ln\left(\sigma\sqrt{2\pi}\right)\right\}, \quad S(x) = 0$$

So, the joint density of $X_1, X_2, \ldots, X_n$ is a member of the two-parameter exponential family.

Lehmann-Scheffe Theorem: This theorem gives a simple criterion for the existence of a uniformly minimum variance unbiased estimator when a complete and sufficient statistic exists.

Statement: Let $X_1, \ldots, X_n$ be a random sample from a density $f(\cdot\,, \theta)$. If $S = s(X_1, \ldots, X_n)$ is a complete sufficient statistic and if $T^* = t^*(S)$, a function of $S$, is an unbiased estimator of $\tau(\theta)$, then $T^*$ is a UMVUE of $\tau(\theta)$.

Proof: Let $T'$ be any unbiased estimator of $\tau(\theta)$ which is a function of $S$; that is, $T' = t'(S)$. Then $E_\theta\left[T^* - T'\right] = 0$ for all $\theta \in \Omega$ and $T^* - T'$ is a function of $S$; so by completeness of $S$, $P_\theta\left[t^*(S) = t'(S)\right] \equiv 1$ for all $\theta \in \Omega$. Hence there is only one unbiased estimator of $\tau(\theta)$ that is a function of $S$. Now let $T$ be any unbiased estimator of $\tau(\theta)$. $T^*$ must be equal to $E[T \mid S]$ since $E[T \mid S]$ is an unbiased estimator of $\tau(\theta)$ depending on $S$. By the Rao-Blackwell theorem, $V_\theta\left[T^*\right] \le V_\theta[T]$ for all $\theta \in \Omega$; so $T^*$ is a UMVUE.

Explanation: This theorem states that if a complete sufficient statistic $S$ exists and if there is an unbiased estimator for $\tau(\theta)$, then there is a UMVUE for $\tau(\theta)$. This theorem also simplifies the search for unbiased estimators: if a complete and sufficient statistic $S$ exists and there exists no function $h$ such that $E[h(S)] = \tau(\theta)$, then no unbiased estimator of $\tau(\theta)$ exists. The Rao-Blackwell theorem and the Lehmann-Scheffe theorem suggest two approaches to finding a UMVUE when a complete and sufficient statistic exists.

Note:
a) Find a function $h$ such that $E[h(S)] = \tau(\theta)$; then $h(S)$ is the unique UMVUE of $\tau(\theta)$. The function $h$ can be determined by solving the equation $E[h(S)] = \tau(\theta)$.
b) Given an unbiased estimator $T$ of $\tau(\theta)$, define the estimator $T^* = E[T \mid S]$ as in the Rao-Blackwell theorem; then this $T^*$ is the unique UMVUE of $\tau(\theta)$.

Example: Let $X_1, \ldots, X_n$ be iid Bernoulli random variables with parameter $\theta$. By the factorization theorem $S = X_1 + X_2 + \cdots + X_n$ is sufficient for $\theta$, and since the distribution belongs to the one-parameter exponential family, $S$ is complete. We want to find the UMVUE of $\theta^2$. Let $n = 2$. If a UMVUE exists, it is a function of the form $h(S)$, where the function $h$ satisfies

$$\theta^2 = \sum_{k=0}^{2} h(k) \binom{2}{k} \theta^k (1 - \theta)^{2-k}$$

$$\theta^2 = h(0)(1 - \theta)^2 + 2h(1)\,\theta(1 - \theta) + h(2)\,\theta^2 \qquad (1)$$

For equation (1), L.H.S. = R.H.S. iff $h(0) = h(1) = 0$ and $h(2) = 1$. Thus $h(S) = \dfrac{S(S - 1)}{2}$ is the UMVUE of $\theta^2$ if $n = 2$. For $n > 2$, one may start from the unbiased estimator $T^* = I(X_1 + X_2 = 2)$ and take its conditional expectation given $S$, which gives $E[T^* \mid S] = \dfrac{S(S - 1)}{n(n - 1)}$.
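The unbiasedness condition $E[h(S)] = \theta^2$ can be verified exactly from the binomial distribution of $S$. The sketch below checks $h(k) = k(k-1)/2$ for $n = 2$ and, as a standard extension assumed here for illustration, $h(k) = k(k-1)/\{n(n-1)\}$ for general $n$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_h(n, theta, h):
    """E[h(S)] when S ~ Binomial(n, theta), computed exactly."""
    return sum(h(k) * comb(n, k) * theta ** k * (1 - theta) ** (n - k)
               for k in range(n + 1))


def h_candidate(n):
    """Candidate h(k) = k (k - 1) / (n (n - 1)); equals k (k - 1) / 2 when n = 2."""
    return lambda k: Fraction(k * (k - 1), n * (n - 1))


theta = Fraction(2, 7)
print(expected_h(2, theta, h_candidate(2)), theta ** 2)
print(expected_h(5, theta, h_candidate(5)), theta ** 2)
```

Both expectations reproduce $\theta^2$ exactly, which is the equation that the function $h$ is required to solve.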

Note: The UMVUE of $\tau(\theta)$ can be found if a complete and sufficient statistic exists. However, in many cases we cannot find a complete and sufficient statistic, and then we cannot apply the Lehmann-Scheffe theorem to find UMVUE estimators. In that case the Cramer-Rao inequality provides a lower bound for the variance of an unbiased estimator of $\tau(\theta)$. If the variance of some unbiased estimator achieves this lower bound, then the estimator will be the UMVUE.

Goodness of the Estimator: Modal Unbiased and Median Unbiased Estimators

Modal Unbiased Estimate: Let $X_i$ ($i = 1, \ldots, n$) be iid random variables with common p.d.f. $f(x;\theta)$ and let $t(X_1, \ldots, X_n)$ be a statistic such that the mode of the density function of $t$ is $\theta$. Then $t(X_1, \ldots, X_n)$ is said to be a modal unbiased estimate of $\theta$.

Median Unbiased Estimate: Let $X_i$ ($i = 1, \ldots, n$) be iid random variables with common p.d.f. $f(x;\theta)$ and let $t(X_1, \ldots, X_n)$ be a statistic such that the median of the density function of $t$ is $\theta$. Then $t(X_1, \ldots, X_n)$ is said to be a median unbiased estimate of $\theta$.

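Example: Suppose $X_i$ ($i = 1, \ldots, 2n+1$) are iid random variables with common p.d.f.
\[ f(x,\theta) = \frac{1}{\theta}\, e^{-x/\theta}; \qquad x > 0,\ \theta > 0. \]
Find a median unbiased estimate of $\theta$.

Solution:
We get
\[ F(x) = \int_0^x \frac{1}{\theta}\, e^{-x/\theta}\,dx = -\left[ e^{-x/\theta} \right]_0^x = 1 - e^{-x/\theta}. \]
Let $Y_1 = \min_{1 \le i \le 2n+1} X_i$. The density of the smallest of $2n+1$ observations is
\[ f_{Y_1}(x) = (2n+1)\left[1 - F(x)\right]^{2n+1-1} f(x) = (2n+1)\left[1 - 1 + e^{-x/\theta}\right]^{2n} \frac{1}{\theta}\, e^{-x/\theta} = \frac{2n+1}{\theta}\, e^{-(2n+1)x/\theta}; \qquad 0 \le x < \infty. \]
Let $m$ be the median of $Y_1$. We can write
\[ \int_0^m \frac{2n+1}{\theta}\, e^{-(2n+1)x/\theta}\,dx = \frac12 \;\Rightarrow\; -\left[ e^{-(2n+1)x/\theta} \right]_0^m = \frac12 \;\Rightarrow\; 1 - e^{-(2n+1)m/\theta} = \frac12 \;\Rightarrow\; \frac{(2n+1)m}{\theta} = \ln 2 \]
\[ \therefore\quad m = \frac{\theta \ln 2}{2n+1}. \]
Thus the median of $\dfrac{(2n+1)}{\ln 2}\, Y_1$ is $\theta$. So $\dfrac{(2n+1)}{\ln 2}\, y_1$ is a median unbiased estimate of $\theta$.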
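The median-unbiasedness claim can be checked directly from the distribution function of the minimum. The following is a minimal sketch, assuming a sample of size $2n+1$ from the exponential density with mean $\theta$; the function name `cdf_scaled_min` and the numerical values are illustrative only.

```python
import math

def cdf_scaled_min(t, theta, n):
    """P((2n+1) * Y1 / ln 2 <= t), where Y1 is the minimum of 2n+1 iid
    exponential variables with mean theta (so Y1 is exponential with
    mean theta / (2n+1))."""
    k = 2 * n + 1
    y = t * math.log(2) / k                # value of Y1 corresponding to t
    return 1.0 - math.exp(-k * y / theta)  # CDF of Y1 evaluated at y

# Evaluating the CDF at t = theta should give one half (illustrative values).
print(cdf_scaled_min(3.0, theta=3.0, n=4))
```

A value of one half at $t = \theta$ is exactly the statement that $\theta$ is the median of the scaled statistic.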

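Example: Suppose $X \sim N(\theta, \sigma^2)$, so that $\bar X \sim N\!\left(\theta, \dfrac{\sigma^2}{n}\right)$. Then $\bar X$ is a modal unbiased estimate of $\theta$.

Solution: We have
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2}; \qquad -\infty < x < \infty \]
\[ \Rightarrow\quad \ln L(\theta) = n \ln\!\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac12 \sum_{i=1}^{n} \left(\frac{x_i-\theta}{\sigma}\right)^2 \]
\[ \Rightarrow\quad \frac{\partial \ln L(\theta)}{\partial\theta} = 0 - \frac12 \cdot 2 \sum_{i=1}^{n} \left(\frac{x_i-\theta}{\sigma}\right)\left(-\frac{1}{\sigma}\right) = 0 \quad\Rightarrow\quad \theta = \bar X. \]
It can be shown that the second derivative is negative. So $\bar X$ is the modal value of $\theta$. Again, $\bar X$ is also an unbiased estimate of $\theta$. Thus we can say that $\bar X$ is a modal unbiased estimate of $\theta$.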

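Theorem: If $m$ is a median of a discrete density $p(x)$ and $g(r) = E|X - r| = \sum_x |x - r|\,p(x)$, then $g(r)$ is minimized for $r = m$, provided that the sum exists for at least one $r \in R$ (the real line).

Proof: Let $r_0 \in R$ with $r_0 > m$ (the case $r_0 < m$ is treated in the same way). If it can be shown that $E|X - r_0| - E|X - m| \ge 0$, it can be concluded that $g(r)$ is minimum for $r = m$. Expanding, we get
\[ E|X - r_0| - E|X - m| = \sum_{x \le m} (r_0 - m)\,p(x) + \sum_{m < x < r_0} (r_0 + m - 2x)\,p(x) + \sum_{x \ge r_0} (m - r_0)\,p(x). \]
From the definition of the median we get $\sum_{x \le m} p(x) \ge \frac12$ and $\sum_{x > m} p(x) \le \frac12$, so
\[ \sum_{x \le m} (r_0 - m)\,p(x) \ge \frac12 (r_0 - m), \]
\[ \sum_{x \ge r_0} (m - r_0)\,p(x) = \sum_{x > m} (m - r_0)\,p(x) - \sum_{m < x < r_0} (m - r_0)\,p(x) \ge -\frac12 (r_0 - m) + \sum_{m < x < r_0} (r_0 - m)\,p(x). \]
Combining the three terms,
\[ E|X - r_0| - E|X - m| \ge \sum_{m < x < r_0} 2\,(r_0 - x)\,p(x) \ge 0. \]
Hence $g(r_0) \ge g(m)$ for every $r_0$, that is, $g(r)$ is minimized when $r = m$.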
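The theorem can be illustrated numerically on a small discrete distribution. The sketch below is only an illustration under assumed data: the probability table `pmf` and the helper names are hypothetical, and the comparison is made over a finite grid of candidate values of $r$.

```python
def mean_abs_dev(pmf, r):
    """g(r) = E|X - r| for a discrete distribution given as {x: p(x)}."""
    return sum(p * abs(x - r) for x, p in pmf.items())

def median(pmf):
    """Smallest x whose cumulative probability reaches one half."""
    cumulative = 0.0
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= 0.5:
            return x

# Hypothetical discrete distribution, used only for illustration.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 5: 0.2, 10: 0.1}
m = median(pmf)

# Compare g(m) with g(r) over a grid of alternative values r.
grid = [-2 + 0.25 * i for i in range(57)]      # -2.0, -1.75, ..., 12.0
never_beaten = all(mean_abs_dev(pmf, m) <= mean_abs_dev(pmf, r) + 1e-12
                   for r in grid)
print(m, never_beaten)
```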

Example: Let $X_i$ ($i = 1, \ldots, n$) be independent random samples from a continuous uniform distribution with p.d.f. $f(x,\theta) = \dfrac{1}{\theta}$; $0 < x \le \theta$. We have to show that $y$ is a modal unbiased estimate of $\theta$.

Consider $Y = \max_i X_i$. The density function of $Y$ is
\[ \phi(y;\theta) = \frac{n\,y^{n-1}}{\theta^n}; \qquad 0 < y \le \theta. \]
This density is increasing in $y$, so the mode of this distribution is $\theta$. Hence, $y$ is a modal unbiased estimate of $\theta$.

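Show that a modal unbiased estimate may not always be unique.

Let $X_i$ ($i = 1, \ldots, n$) be independent random variables each having the exponential density
\[ f(x,\theta) = \frac{1}{\theta}\, e^{-x/\theta}; \qquad x > 0,\ \theta > 0. \]
For this distribution, the statistics
\[ Y_1 = \frac{X_{(n)}}{\ln n} \qquad\text{and}\qquad Y_2 = \frac{S_n}{n-1}, \]
where $X_{(n)} = \max_i X_i$ and $S_n = \sum_{i=1}^{n} X_i$, are both modal unbiased for $\theta$.

We get
\[ F(x) = \int_0^x \frac{1}{\theta}\, e^{-x/\theta}\,dx = -\left[ e^{-x/\theta} \right]_0^x = 1 - e^{-x/\theta}. \]
The p.d.f. of $X_{(n)}$ is
\[ g\!\left(x_{(n)}, \theta\right) = \frac{n}{\theta} \left(1 - e^{-x_{(n)}/\theta}\right)^{n-1} e^{-x_{(n)}/\theta}; \qquad x_{(n)} > 0,\ \theta > 0. \]
Now,
\[ Y_1 = \frac{X_{(n)}}{\ln n} \;\Rightarrow\; dX_{(n)} = (\ln n)\,dY_1. \]
Hence the p.d.f. of $Y_1$ is
\[ g(y_1, \theta) = \frac{n \ln n}{\theta} \left(1 - e^{-\frac{y_1 \ln n}{\theta}}\right)^{n-1} e^{-\frac{y_1 \ln n}{\theta}}; \qquad y_1 > 0,\ \theta > 0 \]
\[ = \frac{n \ln n}{\theta} \left(1 - \left\{n^{-1/\theta}\right\}^{y_1}\right)^{n-1} \left\{n^{-1/\theta}\right\}^{y_1} = \frac{n \ln n}{\theta} \left(1 - a^{y_1}\right)^{n-1} a^{y_1} \qquad (i) \qquad \left[\text{putting } n^{-1/\theta} = a\right] \]
\[ \Rightarrow\quad \frac{\partial g(y_1,\theta)}{\partial y_1} = \frac{n \ln n}{\theta} (n-1) \left(1 - a^{y_1}\right)^{n-2} \left(-a^{y_1} \ln a\right) a^{y_1} + \frac{n \ln n}{\theta} \left(1 - a^{y_1}\right)^{n-1} a^{y_1} \ln a = 0 \]
\[ \Rightarrow\quad \frac{(n-1)\,a^{y_1}}{1 - a^{y_1}} = 1 \quad\Rightarrow\quad n\,a^{y_1} = 1 \quad\Rightarrow\quad y_1 = \theta \qquad \left[\because\ n^{-1/\theta} = a, \text{ so } a^{y_1} = n^{-y_1/\theta}\right]. \]
It can be shown that the second derivative of $(i)$ at $y_1 = \theta$ is negative and hence $Y_1$ is a modal unbiased estimate of $\theta$.

Now, the p.d.f. of $S_n$ is
\[ h(s_n, \theta) = \frac{s_n^{\,n-1}}{\theta^n\,\Gamma(n)}\, e^{-s_n/\theta}; \qquad s_n > 0,\ \theta > 0. \]
Now,
\[ Y_2 = \frac{S_n}{n-1} \;\Rightarrow\; dS_n = (n-1)\,dY_2. \]
Hence the p.d.f. of $Y_2$ is
\[ h(y_2, \theta) = \frac{\left[(n-1)\,y_2\right]^{n-1}}{\theta^n\,\Gamma(n)}\, e^{-\frac{(n-1)y_2}{\theta}}\,(n-1) = \frac{(n-1)^n\, y_2^{\,n-1}}{\theta^n\,\Gamma(n)}\, e^{-\frac{(n-1)y_2}{\theta}}; \qquad y_2 > 0,\ \theta > 0 \]
\[ \Rightarrow\quad \frac{\partial h(y_2,\theta)}{\partial y_2} = \frac{(n-1)^n}{\theta^n\,\Gamma(n)} (n-1)\, y_2^{\,n-2}\, e^{-\frac{(n-1)y_2}{\theta}} - \frac{(n-1)^n\, y_2^{\,n-1}}{\theta^n\,\Gamma(n)}\, e^{-\frac{(n-1)y_2}{\theta}} \left\{\frac{n-1}{\theta}\right\} = 0 \]
\[ \Rightarrow\quad 1 - \frac{y_2}{\theta} = 0 \quad\Rightarrow\quad y_2 = \theta. \]
Again the second derivative of $h(y_2, \theta)$ at $y_2 = \theta$ is negative. Hence $Y_2$ is a modal unbiased estimate of $\theta$. Thus a modal unbiased estimate is not always unique.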
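The two modes can be located numerically as a sanity check. This is a rough sketch under stated assumptions: the densities are coded from the formulas for $Y_1 = X_{(n)}/\ln n$ and $Y_2 = S_n/(n-1)$, the helper `argmax_on_grid` is a hypothetical brute-force search, and the values of `theta` and `n` are arbitrary illustrations.

```python
import math

def density_y1(y, theta, n):
    """Density of Y1 = X_(n) / ln n for an exponential sample with mean theta."""
    a_y = math.exp(-y * math.log(n) / theta)          # equals n ** (-y / theta)
    return n * math.log(n) / theta * (1 - a_y) ** (n - 1) * a_y

def density_y2(y, theta, n):
    """Density of Y2 = S_n / (n - 1), where S_n is the sample total."""
    c = (n - 1) ** n / (theta ** n * math.gamma(n))
    return c * y ** (n - 1) * math.exp(-(n - 1) * y / theta)

def argmax_on_grid(f, lo, hi, steps=20000):
    """Grid point in (lo, hi) at which f is largest (brute-force search)."""
    grid = [lo + (hi - lo) * i / steps for i in range(1, steps)]
    return max(grid, key=f)

theta, n = 2.0, 5   # illustrative values only
mode_y1 = argmax_on_grid(lambda y: density_y1(y, theta, n), 0.0, 10.0)
mode_y2 = argmax_on_grid(lambda y: density_y2(y, theta, n), 0.0, 10.0)
print(mode_y1, mode_y2)
```

Both grid maxima should fall at, or within one grid step of, the chosen `theta`, in line with the derivation.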
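
Theorem: Let $X$ be a random variable with density function $f(x;\theta)$. Let $Y = g(X)$ and let $\phi(y)$ be the density function of $Y$ such that

(i) $\dfrac{\partial \phi(y)}{\partial y} = 0$ at $y = \theta$,
(ii) $\dfrac{\partial^2 \phi(y)}{\partial y^2} < 0$ at $y = \theta$,
(iii) $g$ is a 1-1 transformation from $x$ to $y$ and from $y$ to $x$.

Then the solution of the following differential equations is a modal unbiased estimate of $\theta$:

(a) $f(x,\theta)\,\dfrac{\partial^2 x}{\partial y^2} + \dfrac{\partial f(x,\theta)}{\partial x} \left(\dfrac{\partial x}{\partial y}\right)^2 = 0$ at $y = \theta$,
(b) $\dfrac{\partial^2 f(x,\theta)}{\partial x^2} \left(\dfrac{\partial x}{\partial y}\right)^3 + 3\,\dfrac{\partial f(x,\theta)}{\partial x} \cdot \dfrac{\partial x}{\partial y} \cdot \dfrac{\partial^2 x}{\partial y^2} + f(x,\theta)\,\dfrac{\partial^3 x}{\partial y^3} < 0$ at $y = \theta$.

Proof: Since $y = g(x)$, we have $x = g^{-1}(y)$, so
\[ \phi(y) = \frac{\partial}{\partial y} F(y) = \frac{\partial}{\partial y} F_x\!\left[g^{-1}(y)\right] = \frac{\partial}{\partial y} F_x(x) = f(x;\theta)\,\frac{\partial x}{\partial y} \]
\[ \therefore\quad \frac{\partial \phi(y)}{\partial y} = \frac{\partial f(x;\theta)}{\partial x} \left(\frac{\partial x}{\partial y}\right)^2 + f(x,\theta)\,\frac{\partial^2 x}{\partial y^2} = 0 \quad\text{at } y = \theta \text{ by our hypothesis,} \]
\[ \frac{\partial^2 \phi(y)}{\partial y^2} = \frac{\partial^2 f(x,\theta)}{\partial x^2} \left(\frac{\partial x}{\partial y}\right)^3 + 3\,\frac{\partial f(x,\theta)}{\partial x} \cdot \frac{\partial x}{\partial y} \cdot \frac{\partial^2 x}{\partial y^2} + f(x,\theta)\,\frac{\partial^3 x}{\partial y^3} < 0 \quad\text{at } y = \theta \text{ according to our hypothesis.} \]
Hence the solution of the differential equation is a modal unbiased estimate of $\theta$.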

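Example: Suppose $X \sim N(\theta, \sigma^2)$. Show that $X$ is a modal unbiased estimate of $\theta$.

Solution: We have,
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2}; \qquad -\infty < x < \infty \]
\[ \frac{\partial f(x,\theta)}{\partial x} = -\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2}\, \frac{2(x-\theta)}{2\sigma^2}. \]
Now, equation (a) of the theorem gives
\[ \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2} \frac{\partial^2 x}{\partial y^2} - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2}\, \frac{2(x-\theta)}{2\sigma^2} \left(\frac{\partial x}{\partial y}\right)^2 = 0 \]
\[ \Rightarrow\quad \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\theta}{\sigma}\right)^2} \left[\frac{\partial^2 x}{\partial y^2} - \frac{(x-\theta)}{\sigma^2} \left(\frac{\partial x}{\partial y}\right)^2\right] = 0 \quad\text{at } y = \theta. \]
We have $Y = g(x)$ with $Y = x$, so $\dfrac{\partial Y}{\partial x} = 1$ and $\dfrac{\partial^2 Y}{\partial x^2} = 0$, and likewise $\dfrac{\partial x}{\partial y} = 1$ and $\dfrac{\partial^2 x}{\partial y^2} = 0$.
\[ \therefore\quad 0 - \frac{(x-\theta)}{\sigma^2}\,(1)^2 = 0 \quad\Rightarrow\quad \frac{(x-\theta)}{\sigma^2} = 0 \quad\Rightarrow\quad x = \theta \quad\text{at } y = \theta. \]
So $x = y$ is a solution of this equation. Hence $X$ is a modal unbiased estimate of $\theta$.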


Jackknife Estimator and Correction for Bias
Jackknife Estimator: The jackknife estimator was introduced by Quenouille in 1949 and named by Tukey in 1958. The purpose of the jackknife technique is to decrease the bias of an estimator and to provide an approximate confidence interval for the parameter of interest. If a parameter has a UMVUE associated with it, then clearly there is no chance of improving on that estimator's bias. However, MLEs are often biased, and hence improvement may be possible in the sense of an estimator with lower bias. Jackknifing is an important technique for accomplishing such bias reduction.

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a population with real-valued parameter $\theta$. Let $\hat\theta$ be an estimator of $\theta$. Divide the random sample into $N$ groups of equal size $m = \dfrac{n}{N}$ observations each ($N$ is one of the factors of $n$). Delete one group at a time and estimate $\theta$ based on the remaining $(N-1)m$ observations, using the same estimation procedure previously used with a sample of size $n$. Denote the estimator of $\theta$ obtained with the $i$th group deleted by $\hat\theta_i$ ($i = 1, 2, \ldots, N$), called a jackknife statistic.

For $i = 1, 2, \ldots, N$ let us consider a new statistic
\[ J_i = N\hat\theta - (N-1)\,\hat\theta_i \]
and consider
\[ J(\hat\theta) = \frac{1}{N} \sum_{i=1}^{N} \left[ N\hat\theta - (N-1)\,\hat\theta_i \right] = N\hat\theta - (N-1)\,\bar\theta_i, \qquad\text{where } \bar\theta_i = \frac{1}{N} \sum_{i=1}^{N} \hat\theta_i. \]
$J(\hat\theta)$ is called the jackknife estimator of $\theta$.

Generally we take $m = 1$; then the commonly used jackknife estimate is $J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i$, since $m = 1 = \dfrac{n}{N} \Rightarrow n = N$.

Note:
\[ J(\hat\theta) = N\hat\theta + \hat\theta - \hat\theta - N\bar\theta_i + \bar\theta_i = \hat\theta + \hat\theta\,(N-1) - \bar\theta_i\,(N-1) = \hat\theta + (N-1)\left(\hat\theta - \bar\theta_i\right), \]
which shows that the estimator $J(\hat\theta)$ is an adjustment of $\hat\theta$, with the amount of adjustment depending on the difference between $\hat\theta$ and $\bar\theta_i$.
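The delete-one case ($m = 1$, $N = n$) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `jackknife`, its arguments, and the data are hypothetical, and the estimator is passed in as an ordinary function of a list.

```python
from statistics import fmean

def jackknife(estimator, sample):
    """Delete-one jackknife: n * theta_hat - (n - 1) * (mean of theta_hat_i)."""
    n = len(sample)
    theta_hat = estimator(sample)
    # theta_hat_i: the same estimator applied with the i-th observation removed
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_bar = fmean(leave_one_out)
    return n * theta_hat - (n - 1) * theta_bar

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # illustrative data only
j_mean = jackknife(fmean, data)

# Equivalent "adjustment" form: theta_hat + (n - 1) * (theta_hat - theta_bar)
n = len(data)
theta_bar = fmean(fmean(data[:i] + data[i + 1:]) for i in range(n))
adjusted = fmean(data) + (n - 1) * (fmean(data) - theta_bar)
print(j_mean, adjusted)
```

For the sample mean the leave-one-out averages reproduce the mean itself, so the jackknife leaves it unchanged.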

Correction for Bias: If we have a biased estimator, a simple additive or multiplicative adjustment may be enough to obtain an unbiased estimator. But sometimes the expected value is a rather complicated function of the parameter, and then it is very difficult to find a simple factor that turns the biased estimator into an unbiased one. Let $t_n$ denote the biased estimator of $\theta$ based on $n$ observations, with
\[ E(t_n) = E(\hat\theta) = \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^r} \quad\Rightarrow\quad E(t_n) - \theta = \sum_{r=1}^{\infty} \frac{a_r}{n^r}, \qquad (A) \]
where $a_r$ is a function of $\theta$ but constant with respect to $n$.

Let $t_{n-1}$ be the estimated value of $\theta$ based on $(n-1)$ observations, and let $t_{n-1,\,i}$ ($i = 1, \ldots, n$) denote the estimated value of $\theta$ based on $(n-1)$ observations where the $i$th observation is omitted. Then we will have $t_{n-1,\,1}, t_{n-1,\,2}, \ldots, t_{n-1,\,n}$, and $\bar t_{n-1}$ is the average of these $n$ estimated values with $(n-1)$ observations each.

Let us define another new statistic as
\[ t_n' = n\,t_n - (n-1)\,\bar t_{n-1} = t_n + (n-1)\left(t_n - \bar t_{n-1}\right) \qquad \left[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i;\ \text{here } t_n = \hat\theta \text{ and } \bar t_{n-1} = \bar\theta_i \right] \]
\[ \Rightarrow\quad E(t_n') = \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^r} + (n-1)\left( \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^r} - E\left\{ \frac1n \sum_{i=1}^{n} t_{n-1,\,i} \right\} \right) \]
\[ = \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^r} + (n-1)\left( \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^r} - \theta - \sum_{r=1}^{\infty} \frac{a_r}{(n-1)^r} \right) \]
\[ = \theta + n \sum_{r=1}^{\infty} \frac{a_r}{n^r} - (n-1) \sum_{r=1}^{\infty} \frac{a_r}{(n-1)^r} = \theta + \sum_{r=1}^{\infty} \frac{a_r}{n^{r-1}} - \sum_{r=1}^{\infty} \frac{a_r}{(n-1)^{r-1}} \]
\[ = \theta + a_1 + \sum_{r=2}^{\infty} \frac{a_r}{n^{r-1}} - a_1 - \sum_{r=2}^{\infty} \frac{a_r}{(n-1)^{r-1}} \]
\[ = \theta + a_2\left( \frac1n - \frac{1}{n-1} \right) + a_3\left( \frac{1}{n^2} - \frac{1}{(n-1)^2} \right) + a_4\left( \frac{1}{n^3} - \frac{1}{(n-1)^3} \right) + \cdots \]
\[ \Rightarrow\quad E(t_n') = \theta - \frac{a_2}{n^2} - O\!\left(\frac{1}{n^3}\right). \]
That is, $t_n'$ is biased only to order $\dfrac{1}{n^2}$, whereas $t_n$ has bias of order $\dfrac1n$; i.e. $t_n'$ reduces the bias. Similarly we can take another statistic
\[ t_n'' = \frac{n^2\, t_n' - (n-1)^2\, \bar t_{n-1}'}{n^2 - (n-1)^2} \quad\Rightarrow\quad E(t_n'') = \theta + O\!\left(\frac{1}{n^3}\right). \]
That is, bias of order $\dfrac{1}{n^3}$. So at every step the remaining bias becomes smaller, and by this method the bias can be reduced to any required order.

N.B.: Explain the jackknife method and discuss how it reduces the bias.
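The order-of-bias statement can be illustrated with exact arithmetic for a finite expansion. This is a sketch under assumptions: the coefficient list `a` is hypothetical, only finitely many $a_r$ are taken to be non-zero, and the helper names are invented for the illustration.

```python
from fractions import Fraction

def bias_tn(a, n):
    """Bias of t_n: sum over r of a_r / n^r, for a finite list a = [a1, a2, ...]."""
    return sum(Fraction(a_r) / n**r for r, a_r in enumerate(a, start=1))

def bias_jackknife(a, n):
    """Bias of t'_n = n t_n - (n-1) tbar_{n-1}: n*bias(n) - (n-1)*bias(n-1)."""
    return n * bias_tn(a, n) - (n - 1) * bias_tn(a, n - 1)

a = [3, 5, 7]   # hypothetical coefficients a1, a2, a3
for n in (10, 100, 1000):
    # n * bias(t_n) approaches a1, while n^2 * bias(t'_n) approaches -a2
    print(n, float(n * bias_tn(a, n)), float(n**2 * bias_jackknife(a, n)))
```

When only $a_1$ is non-zero the jackknife removes the bias exactly, which is the situation in the variance examples of this chapter.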

Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability density function
\[ f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2}; \qquad -\infty < x,\ \mu < \infty,\ \sigma^2 > 0. \]
Find the jackknife estimator of $\mu$.

Solution: The likelihood function is
\[ L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac12 \sum_{i=1}^{n} \left(\frac{x_i-\mu}{\sigma}\right)^2} \]
\[ \Rightarrow\quad \log L = -\frac n2 \log(2\pi) - \frac n2 \log\sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i-\mu)^2 \]
\[ \Rightarrow\quad \frac{\partial \log L}{\partial\mu} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(x_i-\mu)(-1) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} (x_i-\mu) = 0 \quad\Rightarrow\quad \hat\mu = \frac1n \sum_{i=1}^{n} x_i. \]
So, the maximum likelihood estimate of the parameter $\mu$ is
\[ \hat\mu = \bar x = \frac1n \sum_{i=1}^{n} x_i \qquad (+) \]
\[ \Rightarrow\quad E(\hat\mu) = \frac1n \sum_{i=1}^{n} E(x_i) = \mu. \qquad (++) \]
Thus $\hat\mu$ is already unbiased, and no jackknife correction is needed here. But we can still find the jackknife estimator by taking
\[ \hat\theta_i = \frac{1}{n-1} \sum_{j \ne i} x_j = \frac{1}{n-1}\left( \sum_{j=1}^{n} x_j - x_i \right) = \frac{1}{n-1}\,(n\bar x - x_i). \]
So,
\[ \bar\theta_i = \frac1n \sum_{i=1}^{n} \hat\theta_i = \frac1n \sum_{i=1}^{n} \left[ \frac{1}{n-1}\,(n\bar x - x_i) \right] = \frac{1}{n(n-1)} \left( n^2 \bar x - n\bar x \right) = \bar x. \qquad (+++) \]
So, the jackknife estimator is given by
\[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i = n\bar x - (n-1)\,\bar x \quad \left[\text{from } (+) \text{ and } (+++)\right] \]
\[ \Rightarrow\quad J(\hat\theta) = \bar x \qquad (A) \]
\[ \therefore\quad E\left[J(\hat\theta)\right] = E(\bar x) = \mu. \]
So, we can say that $J(\hat\theta) = \bar x$ is an unbiased and uniformly minimum variance unbiased estimator of $\mu$.

Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability density function
\[ f(x;\mu,\sigma^2) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac12\left(\frac{x-\mu}{\sigma}\right)^2}; \qquad -\infty < x,\ \mu < \infty,\ \sigma^2 > 0. \]
Find the jackknife estimator of $\sigma^2$, where $\mu$ and $\sigma^2$ both are unknown.

Solution: The likelihood function is
\[ L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac12 \sum_{i=1}^{n} \left(\frac{x_i-\mu}{\sigma}\right)^2} \quad\Rightarrow\quad \log L = -\frac n2 \log(2\pi) - \frac n2 \log\sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i-\mu)^2 \]
\[ \Rightarrow\quad \frac{\partial \log L}{\partial\mu} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(x_i-\mu)(-1) = 0 \quad\Rightarrow\quad \sum_{i=1}^{n} (x_i-\mu) = 0 \quad\Rightarrow\quad \hat\mu = \frac1n \sum_{i=1}^{n} x_i. \qquad (1) \]
Again, we have that
\[ \frac{\partial \log L}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i-\mu)^2 = 0 \quad\Rightarrow\quad \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i-\mu)^2 = \frac{n}{2\sigma^2} \quad\Rightarrow\quad \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i-\mu)^2 = n \]
\[ \Rightarrow\quad S^2 = \hat\sigma^2 = \frac1n \sum_{i=1}^{n} (x_i-\mu)^2. \qquad (2) \]
The likelihood equations for simultaneous estimation of $\mu$ and $\sigma^2$ are
\[ \frac{\partial \log L}{\partial\mu} = 0 \qquad\text{and}\qquad \frac{\partial \log L}{\partial\sigma^2} = 0. \]
From equations (1) and (2) we have $\hat\mu = \bar x$ and
\[ \hat\theta = S^2 = \hat\sigma^2 = \frac1n \sum_{i=1}^{n} (x_i-\hat\mu)^2 = \frac1n \sum_{i=1}^{n} (x_i-\bar x)^2 \qquad (+) \]
\[ = \frac1n \sum_{i=1}^{n} \left[ (x_i-\mu) - (\bar x-\mu) \right]^2 = \frac1n \left[ \sum_{i=1}^{n} (x_i-\mu)^2 - n(\bar x-\mu)^2 \right] = \frac1n \sum_{i=1}^{n} (x_i-\mu)^2 - (\bar x-\mu)^2 \]
\[ \Rightarrow\quad E(\hat\theta) = E(\hat\sigma^2) = \frac1n \sum_{i=1}^{n} E(x_i-\mu)^2 - E(\bar x-\mu)^2 = \frac1n \sum_{i=1}^{n} \mathrm{Var}(x_i) - \mathrm{Var}(\bar x) \]
\[ \Rightarrow\quad E(\hat\theta) = E(\hat\sigma^2) = \sigma^2 - \frac{\sigma^2}{n}. \qquad (++) \]
So, we can say that $\hat\theta = \hat\sigma^2 = \frac1n \sum_{i=1}^{n} (x_i-\bar x)^2$ is a biased estimator of $\sigma^2$. But we can remove the bias by a simple adjustment as follows:
\[ E(\hat\theta) = E(\hat\sigma^2) = \sigma^2 - \frac{\sigma^2}{n} \quad\Rightarrow\quad E\!\left( \frac{n}{n-1}\,\hat\theta \right) = E\!\left( \frac{n}{n-1}\,\hat\sigma^2 \right) = \sigma^2. \]
So, we can say that $\dfrac{n}{n-1}\,\hat\theta = \dfrac{n}{n-1}\,\hat\sigma^2$ is an unbiased estimator of $\theta = \sigma^2$. Thus the jackknife is not strictly needed here. But we can still find the jackknife estimator by taking
\[ \hat\theta_i = \hat\sigma_i^2 = \frac{\sum_{j \ne i} x_j^2}{n-1} - \left( \frac{\sum_{j \ne i} x_j}{n-1} \right)^2 = \frac{\sum_{j=1}^{n} x_j^2 - x_i^2}{n-1} - \left( \frac{n\bar x - x_i}{n-1} \right)^2. \]
So,
\[ \bar\theta_i = \frac1n \sum_{i=1}^{n} \hat\theta_i = \frac{1}{n(n-1)} \left[ n \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i^2 \right] - \frac{1}{n(n-1)^2} \sum_{i=1}^{n} (n\bar x - x_i)^2 \]
\[ = \frac{1}{n(n-1)} \left[ (n-1) \sum_{i=1}^{n} x_i^2 \right] - \frac{1}{n(n-1)^2} \sum_{i=1}^{n} \left( n^2\bar x^2 - 2n x_i \bar x + x_i^2 \right) \]
\[ = \frac{\sum_{i=1}^{n} x_i^2}{n} - \frac{1}{n(n-1)^2} \left[ n^3\bar x^2 - 2n\bar x \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} x_i^2 \right] = \frac{\sum_{i=1}^{n} x_i^2}{n} - \frac{1}{n(n-1)^2} \left[ n^2\bar x^2 (n-2) + \sum_{i=1}^{n} x_i^2 \right] \]
\[ = \frac{\sum_{i=1}^{n} x_i^2}{n} - \frac{n(n-2)\,\bar x^2}{(n-1)^2} - \frac{\sum_{i=1}^{n} x_i^2}{n(n-1)^2} = \frac{\sum_{i=1}^{n} x_i^2}{n} \times \frac{n^2 - 2n + 1 - 1}{(n-1)^2} - \frac{n(n-2)\,\bar x^2}{(n-1)^2} \]
\[ = \frac{n(n-2)}{(n-1)^2} \left[ \frac{\sum_{i=1}^{n} x_i^2}{n} - \left( \frac{\sum_{i=1}^{n} x_i}{n} \right)^2 \right] \]
\[ \Rightarrow\quad \bar\theta_i = \frac{n(n-2)}{(n-1)^2}\,\hat\theta = \frac{n(n-2)}{(n-1)^2}\,\hat\sigma^2. \qquad (++++) \]
So, the jackknife estimator is given by
\[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i = n\hat\sigma^2 - (n-1)\,\frac{n(n-2)}{(n-1)^2}\,\hat\sigma^2 \quad \left[\text{from } (+) \text{ and } (++++)\right] \]
\[ = n\hat\sigma^2 \left[ 1 - \frac{(n-2)}{(n-1)} \right] \quad\Rightarrow\quad J(\hat\theta) = \frac{n}{n-1}\,\hat\sigma^2 \qquad (A) \]
\[ E\left[J(\hat\theta)\right] = \frac{n}{n-1}\,E(\hat\sigma^2) = \frac{n}{n-1}\,E\!\left( \frac1n \sum_{i=1}^{n} (x_i-\bar x)^2 \right) = \frac{n}{n-1} \left( \sigma^2 - \frac{\sigma^2}{n} \right) = \frac{n}{n-1} \left( \frac{n-1}{n}\,\sigma^2 \right) = \sigma^2. \]
So, we can say that $J(\hat\theta) = \dfrac{n}{n-1}\,\hat\theta = \dfrac{n}{n-1}\,\hat\sigma^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i-\bar x)^2$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = \sigma^2$.
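The identity $J(\hat\theta) = \frac{n}{n-1}\hat\sigma^2$ can be confirmed numerically from the delete-one definition. The sketch below uses made-up data and hypothetical helper names; `statistics.variance` from the Python standard library is the usual sample variance with divisor $n-1$.

```python
import statistics

def mle_variance(xs):
    """Biased (maximum likelihood) variance estimate with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def jackknife(estimator, xs):
    """Delete-one jackknife: n * theta_hat - (n - 1) * mean of leave-one-out values."""
    n = len(xs)
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * estimator(xs) - (n - 1) * sum(leave_one_out) / n

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3]   # illustrative data only
j_hat = jackknife(mle_variance, data)
print(j_hat, statistics.variance(data))
```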

Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability function
\[ f(x;p) = p^x (1-p)^{1-x}; \qquad x = 0, 1. \]
Find the jackknife estimator of $\mathrm{Var}(x) = pq$.

Solution: The likelihood function is
\[ L(x;p) = p^{\sum_{i=1}^{n} x_i}\,(1-p)^{\,n - \sum_{i=1}^{n} x_i} \quad\Rightarrow\quad \ln L(x;p) = \sum_{i=1}^{n} x_i \ln p + \left( n - \sum_{i=1}^{n} x_i \right) \ln(1-p) \]
\[ \Rightarrow\quad \frac{\partial \ln L(x;p)}{\partial p} = \frac{\sum_{i=1}^{n} x_i}{p} - \frac{n - \sum_{i=1}^{n} x_i}{1-p} = 0 \quad\Rightarrow\quad \frac{\sum_{i=1}^{n} x_i - p\sum_{i=1}^{n} x_i - np + p\sum_{i=1}^{n} x_i}{p(1-p)} = 0 \]
\[ \Rightarrow\quad \hat p = \frac{\sum_{i=1}^{n} x_i}{n} = \frac yn \qquad \left[\text{Let } \sum_{i=1}^{n} x_i = y \sim B(n,p)\right]. \]
So, the maximum likelihood estimator of $p$ is $\hat p = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac yn$.

And, we know that if $\hat\theta$ is the maximum likelihood estimator of $\theta$ and $g(\theta)$ is a one-to-one function of $\theta$ [that is, $g(\theta_1) = g(\theta_2) \Leftrightarrow \theta_1 = \theta_2$], then $g(\hat\theta)$ is the maximum likelihood estimator of $g(\theta)$.

So, from the above, we can say that the maximum likelihood estimator of $\theta = pq$ is given by
\[ \hat\theta = \hat p\hat q = \frac yn \left( 1 - \frac yn \right) \qquad (+) \]
\[ \Rightarrow\quad E(\hat\theta) = E(\hat p\hat q) = E\!\left(\frac yn\right) - E\!\left(\frac yn\right)^2 = \frac1n E(y) - \frac{1}{n^2} \left[ \mathrm{Var}(y) + \{E(y)\}^2 \right] = \frac1n\,np - \frac{1}{n^2} \left( npq + n^2p^2 \right) \quad \left(\text{since } y \sim B(n,p)\right) \]
\[ = p - \frac{pq}{n} - p^2 \quad\Rightarrow\quad E(\hat\theta) = E(\hat p\hat q) = pq - \frac{pq}{n}. \qquad (++) \]
So, we can say that $\hat\theta = \hat p\hat q$ is a biased estimator of $\theta = pq$. But we can remove the bias by a simple adjustment as follows:
\[ E(\hat\theta) = E(\hat p\hat q) = \left( \frac{n-1}{n} \right) pq \quad\Rightarrow\quad E\!\left( \frac{n}{n-1}\,\hat\theta \right) = E\!\left( \frac{n}{n-1}\,\hat p\hat q \right) = pq. \qquad (+++) \]
So, we can say that $\dfrac{n}{n-1}\,\hat\theta = \dfrac{n}{n-1}\,\hat p\hat q$ is an unbiased estimator of $\theta = pq$. Thus the jackknife is not strictly needed here. But we can still find the jackknife estimator by taking
\[ \hat\theta_i = \begin{cases} \dfrac{y-1}{n-1} \left( 1 - \dfrac{y-1}{n-1} \right); & \text{if } x_i = 1 \text{ (that is, if the } i\text{th trial is a success)} \\[2ex] \dfrac{y}{n-1} \left( 1 - \dfrac{y}{n-1} \right); & \text{if } x_i = 0 \text{ (that is, if the } i\text{th trial is a failure).} \end{cases} \]
Now, since there are $y$ successes and $n-y$ failures to be removed, we have that
\[ \bar\theta_i = \frac1n \sum_{i=1}^{n} \hat\theta_i = \frac1n \left[ y\,\frac{y-1}{n-1} \left( 1 - \frac{y-1}{n-1} \right) + (n-y)\,\frac{y}{n-1} \left( 1 - \frac{y}{n-1} \right) \right] \]
\[ = \frac1n \left[ \frac{y(y-1)(n-y)}{(n-1)^2} + \frac{y(n-y)(n-y-1)}{(n-1)^2} \right] = \frac{y(n-y)(n-2)}{n(n-1)^2} = \frac{y(n-y)}{n \cdot n} \cdot \frac{n(n-2)}{(n-1)^2} = \frac yn \left( 1 - \frac yn \right) \frac{n(n-2)}{(n-1)^2} \]
\[ \Rightarrow\quad \bar\theta_i = \frac{n(n-2)}{(n-1)^2}\,\hat p\hat q. \qquad (++++) \]
So, the jackknife estimator is given by
\[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i = n\hat p\hat q - (n-1)\,\frac{n(n-2)}{(n-1)^2}\,\hat p\hat q \quad \left[\text{from } (+) \text{ and } (++++)\right] \]
\[ = n\hat p\hat q \left[ 1 - \frac{(n-2)}{(n-1)} \right] = \frac{n}{n-1}\,\hat p\hat q \qquad (A) \]
\[ E\left[J(\hat\theta)\right] = \frac{n}{n-1}\,E(\hat p\hat q) = \frac{n}{n-1} \left( pq - \frac{pq}{n} \right) \ \left[\text{from } (++)\right] = \frac{n}{n-1} \cdot \frac{n-1}{n}\,pq \quad\Rightarrow\quad E\left[J(\hat\theta)\right] = pq \ \left[\text{from } (A)\right]. \]
So, we can say that $J(\hat\theta) = \dfrac{n}{n-1}\,\hat p\hat q$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = pq$.
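Both the closed form and the unbiasedness can be verified exactly for a small $n$. The sketch below is an illustration only; the helper names and the values of `n` and `p` are hypothetical, and rational arithmetic is used so that the comparisons are exact.

```python
from fractions import Fraction
from math import comb

def pq_hat(y, m):
    """MLE of pq from y successes in m trials: (y/m) * (1 - y/m)."""
    return Fraction(y, m) * (1 - Fraction(y, m))

def jackknife_pq(y, n):
    """Delete-one jackknife of pq_hat: y deletions remove a success and
    n - y deletions remove a failure."""
    theta_bar = (y * pq_hat(y - 1, n - 1) + (n - y) * pq_hat(y, n - 1)) / n
    return n * pq_hat(y, n) - (n - 1) * theta_bar

def expectation(g, n, p):
    """Exact E[g(Y)] for Y ~ Binomial(n, p)."""
    return sum(g(y) * comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1))

n, p = 7, Fraction(3, 10)   # illustrative values only
closed_form_ok = all(jackknife_pq(y, n) == Fraction(n, n - 1) * pq_hat(y, n)
                     for y in range(n + 1))
mean_of_j = expectation(lambda y: jackknife_pq(y, n), n, p)
print(closed_form_ok, mean_of_j, p * (1 - p))
```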

Example: Let $x_1, x_2, \ldots, x_n$ be $n$ independent Bernoulli trials and let $x = \sum_{i=1}^{n} x_i$ be the number of successes, with the probability function
\[ f(x;n,p) = \binom nx p^x (1-p)^{n-x}; \qquad x = 0, 1, \ldots, n. \]
Find the jackknife estimator of $p^2$.

Solution: For the binomial distribution, we know that
\[ E(x) = np \qquad\text{and}\qquad \mathrm{Var}(x) = npq. \]
So, from the above we have that
\[ E\!\left(\frac xn\right) = p \qquad\text{and}\qquad \mathrm{Var}(x) = E(x^2) - \left[E(x)\right]^2 = npq \]
\[ \Rightarrow\quad E(x^2) = npq + n^2p^2 \qquad (**) \]
\[ \Rightarrow\quad E\!\left(\frac{x^2}{n^2}\right) = \frac{pq}{n} + p^2 \quad\Rightarrow\quad E(\hat\theta) = E\!\left[ \left(\frac xn\right)^2 \right] = p^2 + \frac{p(1-p)}{n} \qquad \left[\text{where } \hat\theta = \left(\frac xn\right)^2\right]. \qquad (+) \]
So, we can say that $\hat\theta = \left(\dfrac xn\right)^2$ is a biased estimator of $\theta = p^2$, and the bias cannot be removed by a simple multiplicative adjustment. So the jackknife estimator is needed here. We can find the jackknife estimator by taking
\[ \hat\theta_i = \begin{cases} \left( \dfrac{x-1}{n-1} \right)^2; & \text{if } x_i = 1 \text{ (that is, if the } i\text{th trial is a success)} \\[2ex] \left( \dfrac{x}{n-1} \right)^2; & \text{if } x_i = 0 \text{ (that is, if the } i\text{th trial is a failure).} \end{cases} \]
Now, since there are $x$ successes and $n-x$ failures to be removed, we have that
\[ \bar\theta_i = \frac1n \sum_{i=1}^{n} \hat\theta_i = \frac1n \left[ x \left( \frac{x-1}{n-1} \right)^2 + (n-x) \left( \frac{x}{n-1} \right)^2 \right] = \frac1n \left[ \frac{x(x-1)^2}{(n-1)^2} + \frac{(n-x)\,x^2}{(n-1)^2} \right] \]
\[ = \frac{1}{n(n-1)^2} \left[ x^3 - 2x^2 + x + nx^2 - x^3 \right] \quad\Rightarrow\quad \bar\theta_i = \frac{1}{n(n-1)^2} \left[ nx^2 - 2x^2 + x \right]. \qquad (++) \]
So, the jackknife estimator is given by
\[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i = n \left(\frac xn\right)^2 - (n-1)\,\frac{1}{n(n-1)^2} \left[ nx^2 - 2x^2 + x \right] \quad \left[\text{from } (+) \text{ and } (++)\right] \]
\[ = \frac{(n-1)\,x^2 - \left[ nx^2 - 2x^2 + x \right]}{n(n-1)} = \frac{x^2 - x}{n(n-1)} = \frac{x(x-1)}{n(n-1)} \qquad (A) \]
\[ E\left[J(\hat\theta)\right] = \frac{E(x^2) - E(x)}{n(n-1)} = \frac{npq + n^2p^2 - np}{n(n-1)} \ \left[\text{from } (**)\right] = \frac{np - np^2 + n^2p^2 - np}{n(n-1)} = \frac{np^2(n-1)}{n(n-1)} \]
\[ \therefore\quad E\left[J(\hat\theta)\right] = E\!\left( \frac{x^2 - x}{n(n-1)} \right) = p^2 \quad \left[\text{from } (A)\right]. \]
So, we can say that $J(\hat\theta) = \dfrac{x(x-1)}{n(n-1)}$ is an unbiased and uniformly minimum variance unbiased estimator of $\theta = p^2$.
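The closed form $x(x-1)/(n(n-1))$ can be compared against the delete-one definition applied to raw 0/1 data. The sketch below assumes an invented sequence of trials and hypothetical helper names; exact fractions avoid any rounding question.

```python
from fractions import Fraction

def p_hat_squared(trials):
    """Estimator (x / n)^2 computed from a list of 0/1 trial outcomes."""
    return Fraction(sum(trials), len(trials)) ** 2

def jackknife(estimator, sample):
    """Delete-one jackknife: n * theta_hat - (n - 1) * mean of leave-one-out values."""
    n = len(sample)
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * estimator(sample) - (n - 1) * sum(leave_one_out) / n

trials = [1, 0, 1, 1, 0, 1, 0, 1]     # illustrative outcomes only
x, n = sum(trials), len(trials)
j_hat = jackknife(p_hat_squared, trials)
print(j_hat, Fraction(x * (x - 1), n * (n - 1)))
```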

Example: Let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ with the probability density function
\[ f(x;\theta) = e^{-(x-\theta)}; \qquad x > \theta. \]
Find the jackknife estimator of $\theta$.

Solution: The likelihood function is
\[ L(x;\theta) = \exp\left[ -\sum_{i=1}^{n} (x_i - \theta) \right]. \qquad (1) \]
Here, we have to choose $\theta$ so that $L$ is maximum in equation (1). Now, $L$ is maximum if $\sum (x_i - \theta)$ is minimum, that is, if $\theta$ is as large as possible. Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered sample of $n$ independent observations from the given population, so that
\[ \theta \le x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)} < \infty. \]
Since the maximum value of $\theta$ consistent with the sample is $x_{(1)}$, the smallest observation, the maximum likelihood estimator of $\theta$ is
\[ \hat\theta = x_{(1)} = \text{the smallest observation}. \qquad (+) \]
Now, we know that the density function of the $r$th order statistic is given by
\[ f\!\left(x_{(r)}\right) = \frac{n!}{(r-1)!\,(n-r)!}\, f(x) \left[F(x)\right]^{r-1} \left[1 - F(x)\right]^{n-r}. \qquad (2) \]
So, from equation (2), we have that
\[ f\!\left(x_{(1)}\right) = \frac{n!}{(1-1)!\,(n-1)!}\, f(x) \left[F(x)\right]^{1-1} \left[1 - F(x)\right]^{n-1} = n \left[1 - F(x)\right]^{n-1} f(x). \qquad (3) \]
Now, we have that $f(x;\theta) = e^{-(x-\theta)}$; $x > \theta$, so
\[ F(x) = \int_\theta^x e^{-(x-\theta)}\,dx = -\left[ e^{-(x-\theta)} \right]_\theta^x = -\left[ e^{-(x-\theta)} - 1 \right] \quad\Rightarrow\quad F(x) = 1 - e^{-(x-\theta)}. \qquad (4) \]
So, from equations (3) and (4), we have that
\[ f\!\left(x_{(1)}\right) = n \left[1 - 1 + e^{-(x-\theta)}\right]^{n-1} f(x) = n \left[ e^{-(x-\theta)} \right]^{n-1} e^{-(x-\theta)} = n\,e^{-n(x-\theta)}; \qquad x > \theta \qquad (***) \]
\[ E\!\left(x_{(1)}\right) = n \int_\theta^\infty x\,e^{-n(x-\theta)}\,dx = n e^{n\theta} \left[ \left. x\,\frac{e^{-nx}}{-n} \right|_\theta^\infty - \int_\theta^\infty \frac{e^{-nx}}{-n}\,dx \right] = n e^{n\theta} \left[ 0 + \frac{\theta e^{-n\theta}}{n} + \frac1n \left. \frac{e^{-nx}}{-n} \right|_\theta^\infty \right] = n e^{n\theta} \left[ \frac{\theta e^{-n\theta}}{n} + \frac{e^{-n\theta}}{n^2} \right] \]
\[ \therefore\quad E(\hat\theta) = E\!\left(x_{(1)}\right) = \theta + \frac1n. \qquad (++) \]
So, we can say that $\hat\theta = x_{(1)}$ is a biased estimator of $\theta$, and the bias cannot be removed by a simple multiplicative adjustment of the kind used in the earlier examples. So the jackknife estimator is found here by taking
\[ \hat\theta_i = \begin{cases} x_{(1)}; & \text{if } x_i \ne x_{(1)} \\ x_{(2)}; & \text{if } x_i = x_{(1)}. \end{cases} \]
Now, since deleting any of the $n-1$ observations other than the smallest leaves $x_{(1)}$ as the minimum, while deleting the smallest leaves $x_{(2)}$, we have that
\[ \bar\theta_i = \frac1n \sum_{i=1}^{n} \hat\theta_i = \frac1n \left[ (n-1)\,x_{(1)} + x_{(2)} \right]. \qquad (+++) \]
So, the jackknife estimator is given by
\[ J(\hat\theta) = n\hat\theta - (n-1)\,\bar\theta_i = n x_{(1)} - (n-1)\,\frac1n \left[ (n-1)\,x_{(1)} + x_{(2)} \right] \quad \left[\text{from } (+) \text{ and } (+++)\right] \]
\[ = \left[ n - \frac{(n-1)^2}{n} \right] x_{(1)} - \frac{(n-1)}{n}\,x_{(2)} = \left( \frac{2n-1}{n} \right) x_{(1)} - \frac{(n-1)}{n}\,x_{(2)} = \left( \frac{n + n - 1}{n} \right) x_{(1)} - \frac{(n-1)}{n}\,x_{(2)} \]
\[ = x_{(1)} + \left( \frac{n-1}{n} \right) \left( x_{(1)} - x_{(2)} \right) \qquad (A) \]
\[ E\left[J(\hat\theta)\right] = E\!\left(x_{(1)}\right) + \left( \frac{n-1}{n} \right) \left[ E\!\left(x_{(1)}\right) - E\!\left(x_{(2)}\right) \right]. \qquad (B) \]
Now, to find the above expected value, we first have to find the expected value of the second order statistic. From equation (2), we have that
\[ f\!\left(x_{(2)}\right) = \frac{n!}{(2-1)!\,(n-2)!}\, f(x) \left[F(x)\right]^{2-1} \left[1 - F(x)\right]^{n-2} = n(n-1) \left[F(x)\right] \left[1 - F(x)\right]^{n-2} f(x). \qquad (5) \]
Now, from equations (5) and (4), we have that
\[ f\!\left(x_{(2)}\right) = n(n-1) \left[ 1 - e^{-(x-\theta)} \right] \left[ 1 - 1 + e^{-(x-\theta)} \right]^{n-2} e^{-(x-\theta)} = n(n-1) \left[ 1 - e^{-(x-\theta)} \right] \left[ e^{-(x-\theta)} \right]^{n-1} \]
\[ = n(n-1)\,e^{-(n-1)(x-\theta)} - (n-1)\,n\,e^{-n(x-\theta)} \]
\[ E\!\left(x_{(2)}\right) = n(n-1) \int_\theta^\infty x\,e^{-(n-1)(x-\theta)}\,dx - (n-1)\,n \int_\theta^\infty x\,e^{-n(x-\theta)}\,dx = n(n-1) \int_\theta^\infty x\,e^{-(n-1)(x-\theta)}\,dx - (n-1)\,E\!\left(x_{(1)}\right) \quad \left[\text{from } (***)\right] \]
\[ = n(n-1)\,e^{(n-1)\theta} \left[ \left. x\,\frac{e^{-(n-1)x}}{-(n-1)} \right|_\theta^\infty - \int_\theta^\infty \frac{e^{-(n-1)x}}{-(n-1)}\,dx \right] - (n-1) \left( \theta + \frac1n \right) \]
\[ = n(n-1)\,e^{(n-1)\theta} \left[ \frac{\theta e^{-(n-1)\theta}}{(n-1)} + \frac{e^{-(n-1)\theta}}{(n-1)^2} \right] - (n-1) \left( \theta + \frac1n \right) = n \left[ \theta + \frac{1}{(n-1)} \right] - (n-1) \left( \theta + \frac1n \right) \]
\[ = n\theta + \frac{n}{n-1} - n\theta + \theta - \frac{n-1}{n} = \theta + \left( \frac{n}{n-1} - \frac{n-1}{n} \right) = \theta + \left[ \frac{n^2 - (n-1)^2}{n(n-1)} \right] = \theta + \frac{2n-1}{n(n-1)}. \qquad (C) \]
Now, from equations $(++)$, $(B)$ and $(C)$, we have that
\[ E\left[J(\hat\theta)\right] = \left( \theta + \frac1n \right) + \left( \frac{n-1}{n} \right) \left[ \left( \theta + \frac1n \right) - \left( \theta + \frac{2n-1}{n(n-1)} \right) \right] = \theta + \frac1n + \frac{n-1}{n} \left( \frac{n - 1 - 2n + 1}{n(n-1)} \right) = \theta + \frac1n + \frac{n-1}{n} \left( \frac{-n}{n(n-1)} \right) = \theta \]
\[ \therefore\quad E\left[J(\hat\theta)\right] = E\left[ x_{(1)} + \left( \frac{n-1}{n} \right) \left( x_{(1)} - x_{(2)} \right) \right] = \theta. \]
So, we can say that $J(\hat\theta) = x_{(1)} + \left( \dfrac{n-1}{n} \right) \left( x_{(1)} - x_{(2)} \right)$ is an unbiased estimator of $\theta$.
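The closed form for the jackknifed minimum can be compared with the delete-one definition on a small sample. The sketch below is illustrative only: the data are invented, the helper names are hypothetical, and it assumes the smallest observation is not tied (as holds with probability one for a continuous density).

```python
from fractions import Fraction

def jackknife_min(sample):
    """Delete-one jackknife of the sample minimum."""
    n = len(sample)
    theta_hat = min(sample)
    leave_one_out = [min(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * theta_hat - (n - 1) * sum(leave_one_out) / n

def closed_form(sample):
    """x_(1) + ((n - 1) / n) * (x_(1) - x_(2))."""
    n = len(sample)
    ordered = sorted(sample)
    return ordered[0] + Fraction(n - 1, n) * (ordered[0] - ordered[1])

# Invented observations, stored as exact fractions (3.1, 2.7, 4.5, 2.9, 5.2).
sample = [Fraction(k, 10) for k in (31, 27, 45, 29, 52)]
print(jackknife_min(sample), closed_form(sample))
```

Because $x_{(2)} \ge x_{(1)}$, the jackknife estimate is never larger than $x_{(1)}$, which is consistent with the upward bias $1/n$ of the minimum.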

Pitman Estimator for Location Parameter

Location invariant
An estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be location invariant if and only if
\[ t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n) + c \]
for all values $x_1, x_2, \ldots, x_n$ and all $c$.

Example: Show that $\bar X_n = \dfrac{\sum_{i=1}^{n} X_i}{n}$ is location invariant.

Solution:
Let,
\[ t(x_1, x_2, \ldots, x_n) = \bar x_n = \frac{\sum_{i=1}^{n} x_i}{n}. \]
Then we have that
\[ t(x_1 + c, x_2 + c, \ldots, x_n + c) = \frac{\sum_{i=1}^{n} (x_i + c)}{n} = \frac{\sum_{i=1}^{n} x_i}{n} + c \]
\[ \Rightarrow\quad t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n) + c. \]
So, we can say that $\bar X_n$ is location invariant.

Example: Show that $\dfrac{Y_1 + Y_n}{2}$ is location invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.

Solution:
Let,
\[ t(x_1, x_2, \ldots, x_n) = \frac{y_1 + y_n}{2} = \frac{\min(x_1, x_2, \ldots, x_n) + \max(x_1, x_2, \ldots, x_n)}{2}. \]
Then we have that
\[ t(x_1 + c, x_2 + c, \ldots, x_n + c) = \frac{\min(x_1 + c, \ldots, x_n + c) + \max(x_1 + c, \ldots, x_n + c)}{2} = \frac{\min(x_1, \ldots, x_n) + c + \max(x_1, \ldots, x_n) + c}{2} \]
\[ = \frac{\min(x_1, \ldots, x_n) + \max(x_1, \ldots, x_n)}{2} + c \]
\[ \Rightarrow\quad t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n) + c. \]
So, we can say that $\dfrac{Y_1 + Y_n}{2}$ is location invariant.
Pitman Estimator for Location Parameter ~ 1 of 8

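Example: Show that $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar X_n \right)^2$ is not location invariant.

Solution:
Let,
\[ t(x_1, x_2, \ldots, x_n) = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x_n)^2. \]
Then we have that
\[ t(x_1 + c, x_2 + c, \ldots, x_n + c) = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i + c - \frac1n \sum_{i=1}^{n} (x_i + c) \right)^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x_n)^2 \]
\[ \Rightarrow\quad t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n), \]
which differs from $t(x_1, x_2, \ldots, x_n) + c$ whenever $c \ne 0$. So, we can say that $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar X_n \right)^2$ is not location invariant.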

Example: Show that $Y_n - Y_1$ is not location invariant.

Solution:
Let,
\[ t(x_1, x_2, \ldots, x_n) = Y_n - Y_1 = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n). \]
Then we have that
\[ t(x_1 + c, x_2 + c, \ldots, x_n + c) = \max(x_1 + c, \ldots, x_n + c) - \min(x_1 + c, \ldots, x_n + c) = \max(x_1, \ldots, x_n) + c - \min(x_1, \ldots, x_n) - c \]
\[ = \max(x_1, \ldots, x_n) - \min(x_1, \ldots, x_n) \]
\[ \Rightarrow\quad t(x_1 + c, x_2 + c, \ldots, x_n + c) = t(x_1, x_2, \ldots, x_n), \]
which differs from $t(x_1, x_2, \ldots, x_n) + c$ whenever $c \ne 0$. So, we can say that $Y_n - Y_1$ is not location invariant.
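The four examples above can be checked mechanically by shifting a sample and comparing the two sides of the defining equation. The sketch below is only an illustration; the helper `is_location_invariant`, the data and the shift `c` are hypothetical, and a single shift can refute invariance but does not by itself prove it.

```python
from statistics import fmean, variance

def midrange(xs):
    """(Y_1 + Y_n) / 2: average of the smallest and largest observations."""
    return (min(xs) + max(xs)) / 2

def sample_range(xs):
    """Y_n - Y_1: largest minus smallest observation."""
    return max(xs) - min(xs)

def is_location_invariant(t, xs, c, tol=1e-9):
    """Check t(x_1 + c, ..., x_n + c) == t(x_1, ..., x_n) + c for one shift c."""
    shifted = [x + c for x in xs]
    return abs(t(shifted) - (t(xs) + c)) < tol

xs, c = [1.5, 2.0, 4.5, 7.0], 3.0   # illustrative data and shift
for name, t in [("mean", fmean), ("midrange", midrange),
                ("s^2", variance), ("range", sample_range)]:
    print(name, is_location_invariant(t, xs, c))
```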

Location parameter
Let $\{ f(\cdot\,;\theta);\ \theta \in \Omega \}$ be a family of densities indexed by a parameter $\theta$. The parameter $\theta$ is defined to be a location parameter if and only if the density $f(x;\theta)$ can be written as a function of $(x - \theta)$. That is, $f(x;\theta) = h(x - \theta)$ for some function $h(\cdot)$. Equivalently, $\theta$ is a location parameter for the density $f_X(x;\theta)$ of a random variable $X$ if and only if the distribution of $(X - \theta)$ does not depend on $\theta$.

We note that if $\theta$ is a location parameter for the family of densities $\{ f(\cdot\,;\theta);\ \theta \in \Omega \}$, then the function $h(\cdot)$ of the definition is a density function given by $h(\cdot) = f(\cdot\,;0)$.

Example: If $f(x;\theta) = \phi_{\theta,1}(x)$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that
\[ f(x;\theta) = \phi_{\theta,1}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 (x-\theta)^2} = \phi_{0,1}(x-\theta) = h(x-\theta). \]
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 1, then $(X - \theta)$ has a standard normal distribution. Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = I_{\left(\theta - \frac12,\ \theta + \frac12\right)}(x)$, then show that $\theta$ is a location parameter.

Solution:
Here, the density equals $\dfrac{1}{\left(\theta + \frac12\right) - \left(\theta - \frac12\right)} = 1$ on the interval $\left(\theta - \frac12,\ \theta + \frac12\right)$, so that
\[ f(x;\theta) = I_{\left(\theta - \frac12,\ \theta + \frac12\right)}(x) = I_{\left(-\frac12,\ \frac12\right)}(x - \theta) = h(x - \theta). \]
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = \dfrac{1}{\pi\left[1 + (x-\theta)^2\right]}$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that
\[ f(x;\theta) = \frac{1}{\pi\left[1 + (x-\theta)^2\right]} = h(x - \theta). \]
Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Example: If $f(x;\theta) = \phi_{\theta,9}(x)$, then show that $\theta$ is a location parameter.

Solution:
Here, we have that
\[ f(x;\theta) = \phi_{\theta,9}(x) = \frac{1}{3\sqrt{2\pi}}\, e^{-\frac{1}{2 \times 9} (x-\theta)^2} = \phi_{0,9}(x-\theta) = h(x-\theta). \]
Or, we can say that if $X$ is distributed normally with mean $\theta$ and variance 9, then $(X - \theta)$ has a normal distribution with mean 0 and variance 9. Hence, the distribution of $(X - \theta)$ is independent of $\theta$. So, we can say that $\theta$ is a location parameter.

Pitman estimator for location
Let $X_1, X_2, \ldots, X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta$ is a location parameter. Then the estimator
\[ t(X_1, X_2, \ldots, X_n) = \frac{\displaystyle\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\displaystyle\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta} \]
is the estimator of $\theta$ which has uniformly smallest mean-squared error within the class of location-invariant estimators. The estimator given in the above equation is defined to be the Pitman estimator for location.

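Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean θ and variance unity, where θ is a location parameter. Find the Pitman estimator of θ.

Solution:
We know that the Pitman estimator for θ is given by

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
 = \frac{\int \theta \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}(X_i-\theta)^2\right] d\theta}{\int \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}(X_i-\theta)^2\right] d\theta} \\[4pt]
&= \frac{\int \theta \exp\left[-\frac12\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}{\int \exp\left[-\frac12\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}
 = \frac{\int \theta \exp\left[-\frac12\left(-2n\bar X_n\theta + n\theta^2\right)\right] d\theta}{\int \exp\left[-\frac12\left(-2n\bar X_n\theta + n\theta^2\right)\right] d\theta} \\[4pt]
&= \frac{\int \theta \exp\left[-\frac n2\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right] d\theta}{\int \exp\left[-\frac n2\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right] d\theta}
 = \frac{\int \theta\, \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt n}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{1/\sqrt n}\right)^2\right] d\theta}{\int \frac{1}{\sqrt{2\pi}\,\frac{1}{\sqrt n}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{1/\sqrt n}\right)^2\right] d\theta}.
\end{aligned}
$$

Factors that do not involve θ cancel between numerator and denominator at each step. In the last ratio the denominator integrates to 1 and the numerator is the mean of a $N(\bar X_n, 1/n)$ density in θ, so

$$t(X_1, X_2, \ldots, X_n) = E(\theta) = \bar X_n.$$

So, we can say that $\bar X_n$ is the Pitman estimator.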
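The ratio of integrals in the Pitman formula can also be approximated numerically, which gives a quick check on the closed form. The following is a minimal sketch, assuming a midpoint rule on a finite interval is accurate enough; the helper names and the sample values are hypothetical and not part of the notes.

```python
import math

def pitman_location(xs, log_density, lo, hi, steps=20000):
    """Approximate the Pitman location estimate with a midpoint rule on [lo, hi].

    log_density(u) is log h(u), where f(x; theta) = h(x - theta).
    """
    width = (hi - lo) / steps
    thetas = [lo + (k + 0.5) * width for k in range(steps)]
    log_lik = [sum(log_density(x - th) for x in xs) for th in thetas]
    shift = max(log_lik)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - shift) for v in log_lik]
    numerator = sum(th * w for th, w in zip(thetas, weights))
    denominator = sum(weights)
    return numerator / denominator

def std_normal_log_density(u):
    return -0.5 * u * u - 0.5 * math.log(2 * math.pi)

sample = [1.2, -0.4, 0.7, 2.1, 0.3]  # hypothetical data
estimate = pitman_location(sample, std_normal_log_density, lo=-10.0, hi=10.0)
sample_mean = sum(sample) / len(sample)
print(estimate, sample_mean)
```

For normal data the two printed values should agree to many decimal places, in line with the derivation above.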

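Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution over the interval $\left(\theta-\frac12,\ \theta+\frac12\right)$, where θ is a location parameter. Find the Pitman estimator for θ.

Solution:
We know that the Pitman estimator for θ is given by

$$
t(X_1,\ldots,X_n) = \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int \theta \prod_{i=1}^{n} I_{(\theta-\frac12,\ \theta+\frac12)}(X_i)\,d\theta}{\int \prod_{i=1}^{n} I_{(\theta-\frac12,\ \theta+\frac12)}(X_i)\,d\theta}
= \frac{\int \theta \prod_{i=1}^{n} I_{(X_i-\frac12,\ X_i+\frac12)}(\theta)\,d\theta}{\int \prod_{i=1}^{n} I_{(X_i-\frac12,\ X_i+\frac12)}(\theta)\,d\theta}.
$$

Let $Y_1 \le Y_2 \le \cdots \le Y_n$ denote the order statistics. The product of indicators equals 1 exactly when

$$\theta-\tfrac12 \le Y_1 \le Y_2 \le \cdots \le Y_n \le \theta+\tfrac12,$$

that is, when $\theta-\frac12 \le Y_1 \Rightarrow \theta \le Y_1+\frac12$ and $Y_n \le \theta+\frac12 \Rightarrow \theta \ge Y_n-\frac12$. Hence

$$
t(X_1,\ldots,X_n) = \frac{\int_{Y_n-\frac12}^{Y_1+\frac12} \theta\,d\theta}{\int_{Y_n-\frac12}^{Y_1+\frac12} d\theta}
= \frac{\left[\frac{\theta^2}{2}\right]_{Y_n-\frac12}^{Y_1+\frac12}}{\left[\theta\right]_{Y_n-\frac12}^{Y_1+\frac12}}
= \frac{\frac12\left[\left(Y_1+\frac12\right)^2-\left(Y_n-\frac12\right)^2\right]}{\left(Y_1+\frac12\right)-\left(Y_n-\frac12\right)}
= \frac12\,(Y_1+Y_n).
$$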
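A small simulation can illustrate the optimality claim for this uniform family by comparing the midrange $(Y_1+Y_n)/2$ with the sample mean, which is also location invariant. This is a sketch under assumed settings (θ = 3, n = 10, 5000 replications, fixed seed), all of which are hypothetical choices.

```python
import random

def midrange(xs):
    return 0.5 * (min(xs) + max(xs))

def mean(xs):
    return sum(xs) / len(xs)

def simulated_mse(estimator, theta, n, reps, rng):
    """Monte Carlo estimate of E[(T - theta)^2] for Uniform(theta - 1/2, theta + 1/2) samples."""
    total = 0.0
    for _ in range(reps):
        xs = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        total += (estimator(xs) - theta) ** 2
    return total / reps

rng = random.Random(0)  # fixed seed so the run is reproducible
theta, n, reps = 3.0, 10, 5000
mse_midrange = simulated_mse(midrange, theta, n, reps, rng)
mse_mean = simulated_mse(mean, theta, n, reps, rng)
print(mse_midrange, mse_mean)
```

The simulated values can be compared with the exact mean-squared errors $1/\{2(n+1)(n+2)\}$ for the midrange and $1/(12n)$ for the sample mean.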

Theorem
A Pitman estimator for location is a function of sufficient statistics.

Proof:
We know that if $S_1 = s_1(X_1, X_2, \ldots, X_n), \ldots, S_k = s_k(X_1, X_2, \ldots, X_n)$ is a set of sufficient statistics, then by the factorization criterion

$$\prod_{i=1}^{n} f(x_i;\theta) = g(s_1, s_2, \ldots, s_k;\theta)\,h(x_1, x_2, \ldots, x_n).$$

So, the Pitman estimator can be written as

$$
t(X_1,\ldots,X_n) = \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int \theta\, g(S_1,\ldots,S_k;\theta)\,h(X_1,\ldots,X_n)\,d\theta}{\int g(S_1,\ldots,S_k;\theta)\,h(X_1,\ldots,X_n)\,d\theta}
= \frac{\int \theta\, g(S_1,\ldots,S_k;\theta)\,d\theta}{\int g(S_1,\ldots,S_k;\theta)\,d\theta}.
$$

The above is a function of the sufficient statistics. So, we can say that a Pitman estimator is a function of the sufficient statistics.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean θ and variance 9, where θ is a location parameter. Find the Pitman estimator of θ when $\sum_{i=1}^{15} x_i = 225$.

Solution:
We know that the Pitman estimator for θ is given by

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
 = \frac{\int \theta \left(\frac{1}{3\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^2\right] d\theta}{\int \left(\frac{1}{3\sqrt{2\pi}}\right)^{n} \exp\left[-\frac12\sum_{i=1}^{n}\left(\frac{X_i-\theta}{3}\right)^2\right] d\theta} \\[4pt]
&= \frac{\int \theta \exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}{\int \exp\left[-\frac{1}{2\times 9}\left(\sum_{i=1}^{n}X_i^2 - 2\theta\sum_{i=1}^{n}X_i + n\theta^2\right)\right] d\theta}
 = \frac{\int \theta \exp\left[-\frac{n}{2\times 9}\left(-2\bar X_n\theta + \theta^2\right)\right] d\theta}{\int \exp\left[-\frac{n}{2\times 9}\left(-2\bar X_n\theta + \theta^2\right)\right] d\theta} \\[4pt]
&= \frac{\int \theta \exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right] d\theta}{\int \exp\left[-\frac{n}{2\times 9}\left(\theta^2 - 2\bar X_n\theta + \bar X_n^2\right)\right] d\theta}
 = \frac{\int \theta\, \frac{1}{\sqrt{2\pi}\,\frac{3}{\sqrt n}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{3/\sqrt n}\right)^2\right] d\theta}{\int \frac{1}{\sqrt{2\pi}\,\frac{3}{\sqrt n}} \exp\left[-\frac12\left(\frac{\theta-\bar X_n}{3/\sqrt n}\right)^2\right] d\theta} \\[4pt]
&= E(\theta) = \bar X_n = \frac1n\sum_{i=1}^{n} X_i.
\end{aligned}
$$

So, we can say that $\bar X_n$ is the Pitman estimator. Now, when $\sum_{i=1}^{15} x_i = 225$, the Pitman estimate of θ is given by

$$\bar x_n = \frac{1}{15}\sum_{i=1}^{15} x_i = \frac{225}{15} = 15.$$

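Exercise: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density

$$f(x;\theta) = e^{-(x-\theta)}\, I_{(\theta,\infty)}(x) \qquad \text{for } -\infty<\theta<\infty.$$

Then, find the Pitman estimator for the location parameter θ.

Solution:
We know that the Pitman estimator for θ is given by

$$
t(X_1,\ldots,X_n) = \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int \theta \prod_{i=1}^{n} \exp\left[-(X_i-\theta)\right] I_{(\theta,\infty)}(X_i)\,d\theta}{\int \prod_{i=1}^{n} \exp\left[-(X_i-\theta)\right] I_{(\theta,\infty)}(X_i)\,d\theta}.
$$

The product of indicators is 1 only when θ is smaller than every $X_i$, that is, when $\theta < Y_1$, where $Y_1$ is the smallest order statistic. Hence

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int_{-\infty}^{Y_1} \theta \exp\left[-\sum_{i=1}^{n}(X_i-\theta)\right] d\theta}{\int_{-\infty}^{Y_1} \exp\left[-\sum_{i=1}^{n}(X_i-\theta)\right] d\theta}
 = \frac{\int_{-\infty}^{Y_1} \theta \exp\left[-\sum_{i=1}^{n}X_i + n\theta\right] d\theta}{\int_{-\infty}^{Y_1} \exp\left[-\sum_{i=1}^{n}X_i + n\theta\right] d\theta}
 = \frac{\int_{-\infty}^{Y_1} \theta e^{n\theta}\,d\theta}{\int_{-\infty}^{Y_1} e^{n\theta}\,d\theta} \\[4pt]
&= \frac{\left[\theta\,\dfrac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1} - \int_{-\infty}^{Y_1} \dfrac{e^{n\theta}}{n}\,d\theta}{\left[\dfrac{e^{n\theta}}{n}\right]_{-\infty}^{Y_1}}
 = \frac{\dfrac{Y_1 e^{nY_1}}{n} - \dfrac{1}{n^2}e^{nY_1}}{\dfrac{e^{nY_1}}{n}}
 = Y_1 - \frac1n.
\end{aligned}
$$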
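The closed form $Y_1 - 1/n$ can be checked by evaluating the two integrals numerically. The sketch below assumes that truncating the lower limit at $Y_1 - 40$ loses nothing of practical size; the function name and the data are hypothetical.

```python
import math

def pitman_shifted_exponential(xs, lower_span=40.0, steps=200000):
    """Midpoint-rule approximation of the Pitman estimate for f(x; theta) = exp(-(x - theta)), x > theta."""
    n = len(xs)
    y1 = min(xs)  # the likelihood vanishes for theta >= y1
    width = lower_span / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        theta = y1 - (k + 0.5) * width
        w = math.exp(n * (theta - y1))  # likelihood up to a factor free of theta
        num += theta * w
        den += w
    return num / den

sample = [2.3, 1.7, 4.1, 2.9]  # hypothetical data
numeric = pitman_shifted_exponential(sample)
closed_form = min(sample) - 1.0 / len(sample)
print(numeric, closed_form)
```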

Exercise: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density

$$f(x;\theta) = \theta^{x}(1-\theta)^{1-x} \qquad \text{for } x = 0, 1 \text{ and } 0<\theta<1.$$

Then, find the estimator of θ given by the Pitman formula. (Here θ is not a location parameter, so the formula is applied only formally; the result coincides with the posterior mean of θ under a uniform prior.)

Solution:
Applying the Pitman formula to θ gives

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
 = \frac{\int_0^1 \theta \prod_{i=1}^{n} \theta^{X_i}(1-\theta)^{1-X_i}\,d\theta}{\int_0^1 \prod_{i=1}^{n} \theta^{X_i}(1-\theta)^{1-X_i}\,d\theta}
 = \frac{\int_0^1 \theta\,\theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta}{\int_0^1 \theta^{\sum X_i}(1-\theta)^{n-\sum X_i}\,d\theta} \\[4pt]
&= \frac{\int_0^1 \theta^{\sum X_i+2-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}{\int_0^1 \theta^{\sum X_i+1-1}(1-\theta)^{n-\sum X_i+1-1}\,d\theta}
 = \frac{\beta\left(\sum X_i+2,\ n-\sum X_i+1\right)}{\beta\left(\sum X_i+1,\ n-\sum X_i+1\right)} \\[4pt]
&= \frac{\Gamma\left(\sum X_i+2\right)\Gamma\left(n-\sum X_i+1\right)}{\Gamma(n+3)} \times \frac{\Gamma(n+2)}{\Gamma\left(\sum X_i+1\right)\Gamma\left(n-\sum X_i+1\right)}
 = \frac{\sum_{i=1}^{n} X_i+1}{n+2},
\end{aligned}
$$

where every sum runs over $i = 1, \ldots, n$.

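Exercise: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density $f(x;\theta) = \theta e^{-\theta}$ for $x>0$ and $\theta>0$. Then, find the estimator of θ given by the Pitman formula.

Solution:
Applying the Pitman formula to θ gives

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int \theta \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
 = \frac{\int_0^\infty \theta \prod_{i=1}^{n} \theta e^{-\theta}\,d\theta}{\int_0^\infty \prod_{i=1}^{n} \theta e^{-\theta}\,d\theta}
 = \frac{\int_0^\infty \theta\,\theta^{n}e^{-n\theta}\,d\theta}{\int_0^\infty \theta^{n}e^{-n\theta}\,d\theta} \\[4pt]
&= \frac{\int_0^\infty \theta^{n+1}e^{-n\theta}\,d\theta}{\int_0^\infty \theta^{n}e^{-n\theta}\,d\theta}
 = \frac{\int_0^\infty e^{-n\theta}\theta^{n+2-1}\,d\theta}{\int_0^\infty e^{-n\theta}\theta^{n+1-1}\,d\theta}
 = \frac{\Gamma(n+2)/n^{n+2}}{\Gamma(n+1)/n^{n+1}}
 = \frac{n+1}{n} = 1+\frac1n.
\end{aligned}
$$

Note: the density as written does not involve x, which is why the result does not depend on the sample. If the intended density is $f(x;\theta) = \theta e^{-\theta x}$, the same steps with $e^{-n\theta}$ replaced by $e^{-\theta\sum X_i}$ give $(n+1)/\sum_{i=1}^{n} X_i$. In either case θ is not a location parameter, so the formula is applied only formally.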

Pitman Estimator for Scale Parameter

Scale invariant
An estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be scale invariant if and only if

$$t(cx_1, cx_2, \ldots, cx_n) = c\,t(x_1, x_2, \ldots, x_n)$$

for all values $x_1, x_2, \ldots, x_n$ and all $c>0$.

Example: Show that $\bar X_n = \dfrac{\sum_{i=1}^{n} X_i}{n}$ is scale invariant.

Solution:
Let

$$t(x_1, x_2, \ldots, x_n) = \bar x_n = \frac{\sum_{i=1}^{n} x_i}{n}.$$

Then we have that

$$t(cx_1, cx_2, \ldots, cx_n) = \frac{\sum_{i=1}^{n} cx_i}{n} = c\,\frac{\sum_{i=1}^{n} x_i}{n} = c\,t(x_1, x_2, \ldots, x_n).$$

So, we can say that $\bar X_n$ is scale invariant.

Example: Show that $\dfrac{Y_1+Y_n}{2}$ is scale invariant, where $Y_1$ is the smallest order statistic and $Y_n$ is the largest order statistic.

Solution:
Let

$$t(x_1, x_2, \ldots, x_n) = \frac{y_1+y_n}{2} = \frac{\min(x_1, \ldots, x_n) + \max(x_1, \ldots, x_n)}{2}.$$

Then, for $c>0$, we have that

$$t(cx_1, \ldots, cx_n) = \frac{\min(cx_1, \ldots, cx_n) + \max(cx_1, \ldots, cx_n)}{2} = c\,\frac{\min(x_1, \ldots, x_n) + \max(x_1, \ldots, x_n)}{2} = c\,t(x_1, \ldots, x_n).$$

So, we can say that $\dfrac{Y_1+Y_n}{2}$ is scale invariant.

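Example: Examine whether $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i-\bar X_n\right)^2$ is scale invariant.

Solution:
Let

$$t(x_1, x_2, \ldots, x_n) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^2.$$

Then we have that

$$t(cx_1, \ldots, cx_n) = \frac{1}{n-1}\sum_{i=1}^{n}\left(cx_i - \frac1n\sum_{j=1}^{n} cx_j\right)^2 = c^2\,\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x_n)^2 = c^2\,t(x_1, \ldots, x_n).$$

Since $c^2\,t \ne c\,t$ whenever $c \ne 1$ and $t>0$, the sample variance $s^2$ does not satisfy the definition and is not scale invariant. The sample standard deviation $s = \sqrt{s^2}$, however, satisfies $s(cx_1, \ldots, cx_n) = c\,s(x_1, \ldots, x_n)$ for all $c>0$ and is scale invariant.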

Example: Show that $Y_n - Y_1$ is scale invariant.

Solution:
Let

$$t(x_1, x_2, \ldots, x_n) = Y_n - Y_1 = \max(x_1, \ldots, x_n) - \min(x_1, \ldots, x_n).$$

Then, for $c>0$, we have that

$$t(cx_1, \ldots, cx_n) = \max(cx_1, \ldots, cx_n) - \min(cx_1, \ldots, cx_n) = c\left[\max(x_1, \ldots, x_n) - \min(x_1, \ldots, x_n)\right] = c\,t(x_1, \ldots, x_n).$$

So, we can say that $Y_n - Y_1$ is scale invariant.
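The defining identity $t(cx_1, \ldots, cx_n) = c\,t(x_1, \ldots, x_n)$ is easy to test numerically on a fixed data set. The following sketch uses a hypothetical helper `is_scale_invariant` and hypothetical data; passing the check for a few values of c does not prove invariance, but a failure does disprove it.

```python
import math
import statistics

def is_scale_invariant(stat, xs, scales=(0.5, 2.0, 7.0), rel_tol=1e-9):
    """Check t(c*x) == c*t(x) on one data set for a few positive scale factors."""
    base = stat(xs)
    return all(
        math.isclose(stat([c * x for x in xs]), c * base, rel_tol=rel_tol)
        for c in scales
    )

def midrange(xs):
    return 0.5 * (min(xs) + max(xs))

def sample_range(xs):
    return max(xs) - min(xs)

data = [1.5, 0.2, 3.7, 2.4, 5.1]  # hypothetical positive observations
checks = {
    "mean": is_scale_invariant(statistics.mean, data),
    "midrange": is_scale_invariant(midrange, data),
    "range": is_scale_invariant(sample_range, data),
    "variance": is_scale_invariant(statistics.variance, data),  # scales by c**2
    "stdev": is_scale_invariant(statistics.stdev, data),
}
print(checks)
```

The sample variance fails the check because it is multiplied by $c^2$ rather than $c$, while its square root passes.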

Scale Parameter
Let $\{f(\cdot\,;\theta);\ \theta>0\}$ be a family of densities indexed by a real parameter θ. The parameter θ is defined to be a scale parameter if and only if the density $f(x;\theta)$ can be written as

$$f(x;\theta) = \frac1\theta\,h\!\left(\frac x\theta\right)$$

for some density $h(\cdot)$. Equivalently, θ is a scale parameter for the density $f_X(x;\theta)$ of a random variable X if and only if the distribution of $\dfrac X\theta$ does not depend on θ.

We note that if θ is a scale parameter for the family of densities $\{f(\cdot\,;\theta);\ \theta>0\}$, then the function $h(\cdot)$ of the definition is a density function given by $h(x) = f(x;1)$.

Example: If $f(x;\theta) = \varphi_{0,\theta^2}(x)$, then show that θ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \varphi_{0,\theta^2}(x) = \frac{1}{\theta\sqrt{2\pi}}\,e^{-\frac12\left(\frac x\theta\right)^2} = \frac1\theta\,\varphi_{0,1}\!\left(\frac x\theta\right) = \frac1\theta\,h\!\left(\frac x\theta\right).$$

Or, we can say that if X is distributed normally with mean 0 and variance $\theta^2$, then $\dfrac X\theta$ has a standard normal distribution. Hence, the distribution of $\dfrac X\theta$ is independent of θ.

So, we can say that θ is a scale parameter.

Example: If $f(x;\theta) = \dfrac1\theta\,I_{(\theta,\,2\theta)}(x)$, then show that θ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac1\theta\,I_{(\theta,\,2\theta)}(x) = \frac1\theta\,I_{(1,\,2)}\!\left(\frac x\theta\right) = \frac1\theta\,h\!\left(\frac x\theta\right).$$

Hence, the distribution of $\dfrac X\theta$ is independent of θ. So, we can say that θ is a scale parameter.

Example: If $f(x;\theta) = \dfrac1\theta\,e^{-x/\theta}\,I_{(0,\infty)}(x)$, then show that θ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac1\theta\,e^{-x/\theta}\,I_{(0,\infty)}(x) = \frac1\theta\,h\!\left(\frac x\theta\right), \qquad h(u) = e^{-u}\,I_{(0,\infty)}(u).$$

Hence, the distribution of $\dfrac X\theta$ is independent of θ. So, we can say that θ is a scale parameter.

Example: If $f(x;\theta) = \dfrac1\theta\,I_{(0,\,\theta)}(x)$, then show that θ is a scale parameter.

Solution:
Here, we have that

$$f(x;\theta) = \frac1\theta\,I_{(0,\,\theta)}(x) = \frac1\theta\,I_{(0,\,1)}\!\left(\frac x\theta\right) = \frac1\theta\,h\!\left(\frac x\theta\right).$$

Hence, the distribution of $\dfrac X\theta$ is independent of θ. So, we can say that θ is a scale parameter.

Pitman estimator for Scale
Let $X_1, X_2, \ldots, X_n$ denote a random sample from the density $f(\cdot\,;\theta)$, where $\theta>0$ is a scale parameter. Assume that $f(x;\theta) = 0$ for $x \le 0$. That is, the random variables $X_i$ assume only positive values. Then, the estimator

$$t(X_1, X_2, \ldots, X_n) = \frac{\int_0^\infty \left(\frac{1}{\theta^2}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int_0^\infty \left(\frac{1}{\theta^3}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}$$

is the estimator of θ which has uniformly smallest risk within the class of scale-invariant estimators for the loss function $l(t;\theta) = \dfrac{(t-\theta)^2}{\theta^2}$.

The estimator given in the above equation is defined to be the Pitman estimator for scale.

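Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density function

$$f(x;\theta) = \frac1\theta\,I_{(0,\,\theta)}(x).$$

Find the Pitman estimator of the scale parameter θ.

Solution:
We know that the Pitman estimator for θ is given by

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int \left(\frac{1}{\theta^2}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \left(\frac{1}{\theta^3}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
 = \frac{\int \left(\frac{1}{\theta^2}\right) \prod_{i=1}^{n} \frac1\theta\,I_{(0,\theta)}(X_i)\,d\theta}{\int \left(\frac{1}{\theta^3}\right) \prod_{i=1}^{n} \frac1\theta\,I_{(0,\theta)}(X_i)\,d\theta}
 = \frac{\int_{Y_n}^{\infty} \frac{1}{\theta^2}\left(\frac1\theta\right)^{n} d\theta}{\int_{Y_n}^{\infty} \frac{1}{\theta^3}\left(\frac1\theta\right)^{n} d\theta} \\[4pt]
&= \frac{\int_{Y_n}^{\infty} \theta^{-(n+2)}\,d\theta}{\int_{Y_n}^{\infty} \theta^{-(n+3)}\,d\theta}
 = \frac{\left[\dfrac{\theta^{-(n+2)+1}}{-(n+2)+1}\right]_{Y_n}^{\infty}}{\left[\dfrac{\theta^{-(n+3)+1}}{-(n+3)+1}\right]_{Y_n}^{\infty}}
 = \frac{\dfrac{1}{-(n+1)}\left(0 - Y_n^{-n-1}\right)}{\dfrac{1}{-(n+2)}\left(0 - Y_n^{-n-2}\right)}
 = \frac{n+2}{n+1}\,Y_n,
\end{aligned}
$$

where $Y_n$ is the largest order statistic; the indicators restrict the integration to $\theta > Y_n$.

Note: We know that $Y_n$ is a complete sufficient statistic and $E(Y_n) = \dfrac{n}{n+1}\,\theta$. So, by the Lehmann-Scheffé theorem, $\dfrac{n+1}{n}\,Y_n$ is the UMVUE of θ.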
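The Pitman estimator and the UMVUE are both multiples of $Y_n$, so their risks under the loss $(t-\theta)^2/\theta^2$ can be compared exactly. The sketch below relies on the facts that $Y_n/\theta$ has density $n y^{n-1}$ on $(0,1)$, so $E(Y_n/\theta) = n/(n+1)$ and $E(Y_n^2/\theta^2) = n/(n+2)$; the function name and the choice n = 5 are hypothetical.

```python
from fractions import Fraction

def scaled_risk(c, n):
    """Exact risk E[(c*Y_n - theta)^2] / theta^2 for a Uniform(0, theta) sample of size n."""
    first_moment = Fraction(n, n + 1)    # E[Y_n / theta]
    second_moment = Fraction(n, n + 2)   # E[(Y_n / theta)^2]
    return c * c * second_moment - 2 * c * first_moment + 1

n = 5
risk_pitman = scaled_risk(Fraction(n + 2, n + 1), n)  # multiplier (n+2)/(n+1)
risk_umvue = scaled_risk(Fraction(n + 1, n), n)       # multiplier (n+1)/n
print(risk_pitman, risk_umvue)
```

Because the risk is a quadratic in the multiplier c, it is minimized at $c = E(Y_n/\theta)/E(Y_n^2/\theta^2) = (n+2)/(n+1)$, which matches the Pitman estimator.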

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample from the density function

$$f(x;\theta) = \frac1\theta\,e^{-x/\theta}\,I_{(0,\infty)}(x).$$

Find the Pitman estimator of the scale parameter θ.

Solution:
We know that the Pitman estimator of θ for a scale parameter is given by

$$
t(X_1,\ldots,X_n)
= \frac{\int \left(\frac{1}{\theta^2}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}{\int \left(\frac{1}{\theta^3}\right) \prod_{i=1}^{n} f(X_i;\theta)\,d\theta}
= \frac{\int_0^\infty \left(\frac{1}{\theta^2}\right) \prod_{i=1}^{n} \frac1\theta\,e^{-X_i/\theta}\,d\theta}{\int_0^\infty \left(\frac{1}{\theta^3}\right) \prod_{i=1}^{n} \frac1\theta\,e^{-X_i/\theta}\,d\theta}
= \frac{\int_0^\infty \theta^{-(n+2)} \exp\left(-\frac{\sum X_i}{\theta}\right) d\theta}{\int_0^\infty \theta^{-(n+3)} \exp\left(-\frac{\sum X_i}{\theta}\right) d\theta}.
$$

Let $\dfrac{\sum_{i=1}^{n} X_i}{\theta} = Z$, so that $\theta = \dfrac{\sum X_i}{Z}$ and $d\theta = -\dfrac{\sum X_i}{Z^2}\,dZ$. As θ runs from 0 to ∞, Z runs from ∞ to 0, and the minus sign reverses the limits. Then

$$
\begin{aligned}
t(X_1,\ldots,X_n)
&= \frac{\int_0^\infty \left(\dfrac{\sum X_i}{Z}\right)^{-(n+2)} e^{-Z}\,\dfrac{\sum X_i}{Z^2}\,dZ}{\int_0^\infty \left(\dfrac{\sum X_i}{Z}\right)^{-(n+3)} e^{-Z}\,\dfrac{\sum X_i}{Z^2}\,dZ}
 = \left(\sum_{i=1}^{n} X_i\right) \frac{\int_0^\infty Z^{n}e^{-Z}\,dZ}{\int_0^\infty Z^{n+1}e^{-Z}\,dZ} \\[4pt]
&= \left(\sum_{i=1}^{n} X_i\right) \frac{\int_0^\infty e^{-Z}Z^{n+1-1}\,dZ}{\int_0^\infty e^{-Z}Z^{n+2-1}\,dZ}
 = \left(\sum_{i=1}^{n} X_i\right) \frac{\Gamma(n+1)}{\Gamma(n+2)}
 = \frac{\sum_{i=1}^{n} X_i}{n+1}.
\end{aligned}
$$

Note: It can be shown that the UMVUE of θ is $\dfrac{\sum_{i=1}^{n} X_i}{n}$. Note also that $\dfrac{\sum X_i}{n}$ is a scale-invariant estimator. Since $\dfrac{\sum X_i}{n+1}$ is the scale-invariant estimator having uniformly smallest risk for the loss function $l(t;\theta) = \dfrac{(t-\theta)^2}{\theta^2}$, the risk of $\dfrac{\sum X_i}{n+1}$ is uniformly smaller than the risk of $\dfrac{\sum X_i}{n}$. Also, since here the risk equals $\dfrac{1}{\theta^2}$ times the MSE, the MSE of $\dfrac{\sum X_i}{n+1}$ is uniformly smaller than the MSE of $\dfrac{\sum X_i}{n}$.
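The MSE comparison between $\sum X_i/(n+1)$ and $\sum X_i/n$ for the exponential density with mean θ can be illustrated by simulation. This is a sketch under assumed settings (θ = 2, n = 5, 20000 replications); both estimators are evaluated on the same simulated samples by reusing the seed.

```python
import random

def simulate_mse(divisor_offset, theta, n, reps, seed):
    """Monte Carlo MSE of sum(X) / (n + divisor_offset) for exponential data with mean theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # expovariate takes the rate, which is 1 / mean
        s = sum(rng.expovariate(1.0 / theta) for _ in range(n))
        total += (s / (n + divisor_offset) - theta) ** 2
    return total / reps

theta, n, reps = 2.0, 5, 20000
mse_pitman = simulate_mse(1, theta, n, reps, seed=1)  # sum / (n + 1)
mse_umvue = simulate_mse(0, theta, n, reps, seed=1)   # sum / n
print(mse_pitman, mse_umvue)
```

The exact values are $\theta^2/(n+1)$ and $\theta^2/n$ respectively, so the simulated figures should fall near 0.667 and 0.8 for these settings.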

Bayes and Minimax Estimation

Decision Function
A decision function $\delta(x)$ is a statistic that takes values in D, that is, δ is a Borel measurable function that maps $R^n$ into D.

Elements of Decision Function
The elements of a decision are
• Choices available, or alternatives, or options.
• States of nature.
• Payoffs.

Prior Distribution
Let $f(\theta)$ be the probability distribution of the parameter θ, which summarizes the objective information about θ prior to obtaining sample observations. We will choose f (θ ) with sampler variance, so that $f(\theta)$ is the prior distribution of θ.

Posterior Distribution
Consider a random variable X whose distribution is denoted by $f(x\mid\theta)$. This distribution depends on θ, where θ is an unknown parameter.

Let $x_1, x_2, \ldots, x_n$ be a random sample; then the joint distribution can be written as

$$f(x_1, x_2, \ldots, x_n\mid\theta) = f(x_1\mid\theta)\cdots f(x_n\mid\theta).$$

The posterior distribution of θ is the conditional distribution of θ given the sample values. So,

$$f(\theta\mid x_1, x_2, \ldots, x_n) = \frac{f(x_1, x_2, \ldots, x_n, \theta)}{f(x_1, x_2, \ldots, x_n)} = \frac{f(x_1, x_2, \ldots, x_n\mid\theta)\,f(\theta)}{f(x_1, x_2, \ldots, x_n)},$$

where $f(x_1, x_2, \ldots, x_n, \theta) = f(\theta)\,f(x_1, x_2, \ldots, x_n\mid\theta)$ is the joint distribution of the sample and θ, and $f(x_1, x_2, \ldots, x_n) = \int f(x_1, x_2, \ldots, x_n\mid\theta)\,f(\theta)\,d\theta$ is the marginal distribution of the sample.

Thus $f(\theta\mid x_1, x_2, \ldots, x_n)$ is known as the posterior distribution of θ.

Example: The time to failure of a transistor is known to be exponentially distributed with parameter θ, having the density function

$$f(x\mid\theta) = \theta e^{-\theta x}; \qquad x>0.$$

Assume that the prior distribution of Θ is given by

$$g_\Theta(\theta) = k e^{-k\theta}; \qquad \theta>0.$$

That is, Θ is also exponentially distributed over the interval $(0, \infty)$. Find the posterior distribution of Θ.

Solution:
We know that the posterior distribution of Θ is given by

$$
\begin{aligned}
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
&= \frac{f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}
 = \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta} \\[4pt]
&= \frac{\theta^{n}e^{-\theta\sum x_i}\,k e^{-k\theta}}{\int_0^\infty \theta^{n}e^{-\theta\sum x_i}\,k e^{-k\theta}\,d\theta}
 = \frac{\theta^{n}e^{-\theta\left(\sum x_i+k\right)}}{\int_0^\infty e^{-\left(\sum x_i+k\right)\theta}\,\theta^{n+1-1}\,d\theta} \\[4pt]
&= \frac{\theta^{n}e^{-\theta\left(\sum x_i+k\right)}}{\Gamma(n+1)\big/\left(\sum x_i+k\right)^{n+1}}
 = \frac{\left(\sum x_i+k\right)^{n+1}}{\Gamma(n+1)}\,e^{-\left(\sum x_i+k\right)\theta}\,\theta^{n+1-1},
\end{aligned}
$$

where every sum runs over $i = 1, \ldots, n$. Hence

$$f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Gamma}\left(n+1,\ \sum_{i=1}^{n} x_i+k\right); \qquad \theta \ge 0.$$
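The posterior parameters in this example follow directly from the data, and the normalizing constant $\Gamma(n+1)/(\sum x_i+k)^{n+1}$ can be checked by numerical integration. The sketch below uses hypothetical failure times and a hypothetical prior rate k = 1; the function names are illustrative only.

```python
import math

def gamma_posterior_exponential_prior(xs, k):
    """(shape, rate) of the Gamma posterior for exponential data with an exponential(k) prior."""
    return len(xs) + 1, sum(xs) + k

def numeric_normalizer(n, rate, upper=30.0, steps=30000):
    """Midpoint-rule value of the integral of theta**n * exp(-rate*theta) over (0, upper)."""
    width = upper / steps
    total = 0.0
    for j in range(steps):
        theta = (j + 0.5) * width
        total += theta ** n * math.exp(-rate * theta)
    return total * width

lifetimes = [0.8, 1.5, 0.3, 2.2]  # hypothetical failure times
shape, rate = gamma_posterior_exponential_prior(lifetimes, k=1.0)
posterior_mean = shape / rate      # mean of a Gamma(shape, rate) distribution
closed_normalizer = math.gamma(shape) / rate ** shape
approx_normalizer = numeric_normalizer(len(lifetimes), rate)
print(shape, rate, posterior_mean)
```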

Example: Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution with the density

$$f(x\mid\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac12(x-\theta)^2\right]; \qquad -\infty \le x \le \infty.$$

Assume that the prior distribution of Θ is given by

$$g_\Theta(\theta) = \frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]; \qquad -\infty \le \theta \le \infty.$$

That is, Θ is standard normal. Find the posterior distribution of Θ.

Solution:
We know that the posterior distribution of Θ is given by

$$
\begin{aligned}
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
&= \frac{f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}
 = \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta} \\[4pt]
&= \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]}{\int_{-\infty}^{\infty}\left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left[-\frac12\sum_{i=1}^{n}(x_i-\theta)^2\right]\frac{1}{\sqrt{2\pi}}\exp\left[-\frac12\theta^2\right]d\theta} \\[4pt]
&= \frac{\exp\left[-\frac12\left(\sum x_i^2 - 2\theta\sum x_i + n\theta^2 + \theta^2\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac12\left(\sum x_i^2 - 2\theta\sum x_i + n\theta^2 + \theta^2\right)\right]d\theta}
 = \frac{\exp\left[-\frac12\left((n+1)\theta^2 - 2\theta n\bar x\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac12\left((n+1)\theta^2 - 2\theta n\bar x\right)\right]d\theta} \\[4pt]
&= \frac{\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]}{\int_{-\infty}^{\infty}\exp\left[-\frac{n+1}{2}\left(\theta^2 - 2\theta\frac{n}{n+1}\bar x + \left(\frac{n}{n+1}\bar x\right)^2\right)\right]d\theta} \\[4pt]
&= \frac{\frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}}\exp\left[-\frac12\left(\frac{\theta-\frac{n}{n+1}\bar x}{\sqrt{\frac{1}{n+1}}}\right)^2\right]}{\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}}\exp\left[-\frac12\left(\frac{\theta-\frac{n}{n+1}\bar x}{\sqrt{\frac{1}{n+1}}}\right)^2\right]d\theta}.
\end{aligned}
$$

The denominator of the last ratio integrates to 1, so

$$f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = N\left(\theta;\ \frac{n}{n+1}\bar x,\ \frac{1}{n+1}\right); \qquad -\infty \le \theta \le \infty.$$
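The posterior mean $n\bar x/(n+1)$ and variance $1/(n+1)$ can be checked against a brute-force grid approximation of the posterior. The following is a minimal sketch with hypothetical data; the grid limits and step count are assumptions chosen so that the truncated tails are negligible.

```python
import math

def normal_posterior_unit_variance(xs):
    """Posterior (mean, variance) for N(theta, 1) data with a N(0, 1) prior on theta."""
    n = len(xs)
    xbar = sum(xs) / n
    return n * xbar / (n + 1), 1.0 / (n + 1)

def grid_posterior_moments(xs, lo=-10.0, hi=10.0, steps=20000):
    """Posterior mean and variance from unnormalized likelihood * prior on a midpoint grid."""
    width = (hi - lo) / steps
    thetas = [lo + (k + 0.5) * width for k in range(steps)]
    weights = [
        math.exp(-0.5 * sum((x - th) ** 2 for x in xs) - 0.5 * th * th)
        for th in thetas
    ]
    total = sum(weights)
    mean = sum(th * w for th, w in zip(thetas, weights)) / total
    var = sum((th - mean) ** 2 * w for th, w in zip(thetas, weights)) / total
    return mean, var

data = [0.9, 1.4, 0.2, 1.1]  # hypothetical observations
closed_mean, closed_var = normal_posterior_unit_variance(data)
grid_mean, grid_var = grid_posterior_moments(data)
print(closed_mean, closed_var, grid_mean, grid_var)
```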

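Example:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution with the density

$$f(x\mid\theta) = \frac{e^{-\theta}\theta^{x}}{x!}; \qquad x = 0, 1, \ldots, \infty.$$

Assume that the prior distribution of Θ is given by

$$g_\Theta(\theta) = \frac{\left(\frac1\beta\right)^{\alpha}}{\Gamma(\alpha)}\,e^{-\frac1\beta\theta}\,\theta^{\alpha-1}; \qquad \theta>0.$$

That is, Θ has a gamma distribution. Find the posterior distribution of Θ.

Solution:
We know that the posterior distribution of Θ is given by

$$
\begin{aligned}
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
&= \frac{f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}
 = \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta} \\[4pt]
&= \frac{\dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)} \times \dfrac{\left(\frac1\beta\right)^{\alpha}}{\Gamma(\alpha)}\,e^{-\frac1\beta\theta}\,\theta^{\alpha-1}}{\int_0^\infty \dfrac{e^{-n\theta}\theta^{\sum x_i}}{\prod_{i=1}^{n}(x_i!)} \times \dfrac{\left(\frac1\beta\right)^{\alpha}}{\Gamma(\alpha)}\,e^{-\frac1\beta\theta}\,\theta^{\alpha-1}\,d\theta}
 = \frac{e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}}{\int_0^\infty e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}\,d\theta} \\[4pt]
&= \frac{e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}}{\Gamma\left(\sum x_i+\alpha\right)\big/\left(n+\frac1\beta\right)^{\sum x_i+\alpha}}
 = \frac{\left(n+\frac1\beta\right)^{\sum x_i+\alpha}}{\Gamma\left(\sum x_i+\alpha\right)}\,e^{-\left(n+\frac1\beta\right)\theta}\,\theta^{\sum x_i+\alpha-1}; \qquad \theta>0,
\end{aligned}
$$

where every sum runs over $i = 1, \ldots, n$. Hence

$$f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Gamma}\left(\sum_{i=1}^{n} x_i+\alpha,\ n+\frac1\beta\right).$$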
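The gamma posterior for Poisson data reduces to a two-line update of the prior parameters. The sketch below assumes the prior is parameterized by shape α and scale β as in the example, and uses hypothetical counts; it also shows the posterior mean as a weighted average of the sample mean and the prior mean αβ.

```python
def poisson_gamma_posterior(counts, alpha, beta):
    """(shape, rate) of the Gamma posterior; the prior has shape alpha and scale beta."""
    return sum(counts) + alpha, len(counts) + 1.0 / beta

counts = [3, 0, 2, 4, 1]  # hypothetical Poisson counts
alpha, beta = 2.0, 0.5    # hypothetical prior parameters
shape, rate = poisson_gamma_posterior(counts, alpha, beta)
posterior_mean = shape / rate  # mean of a Gamma(shape, rate) distribution

# Same posterior mean written as a weighted average of the MLE and the prior mean.
n = len(counts)
weight = n / (n + 1.0 / beta)
weighted_average = weight * (sum(counts) / n) + (1 - weight) * (alpha * beta)
print(shape, rate, posterior_mean, weighted_average)
```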

Posterior Bayes estimator
Let $X_1, X_2, \ldots, X_n$ be a random sample from a density $f(x\mid\theta)$, where θ is the value of a random variable Θ with known density $g_\Theta(\cdot)$. The posterior Bayes estimator of $\tau(\theta)$ with respect to the prior density $g_\Theta(\cdot)$ is defined to be

$$E\left[\tau(\theta)\mid X_1, X_2, \ldots, X_n\right].$$

Here, it is given that

$$
E\left[\tau(\theta)\mid X_1=x_1, \ldots, X_n=x_n\right]
= \int \tau(\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
= \frac{\int \tau(\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta}.
$$

One might note the similarity between the posterior Bayes estimator of $\tau(\theta) = \theta$ and the Pitman estimator of a location parameter.

Example:
Let $X_1, X_2, \ldots, X_n$ denote a random sample from the Bernoulli density

$$f(x\mid\theta) = \theta^{x}(1-\theta)^{1-x}\,I_{\{0,1\}}(x) \qquad \text{for } 0 \le \theta \le 1.$$

Assume that the prior distribution of Θ is given by

$$g_\Theta(\theta) = I_{(0,1)}(\theta).$$

That is, Θ is uniformly distributed over the interval $(0, 1)$. Find the posterior distribution of Θ and find the Bayes estimators of θ and $\theta(1-\theta)$.

Solution:
We know that the posterior distribution of Θ is given by

$$
\begin{aligned}
f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)
&= \frac{f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)}{f_{X_1,\ldots,X_n}(x_1,\ldots,x_n)}
 = \frac{\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta} \\[4pt]
&= \frac{\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)}{\int \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta}
 = \frac{\theta^{\sum x_i+1-1}(1-\theta)^{n-\sum x_i+1-1}}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)},
\end{aligned}
$$

where every sum runs over $i = 1, \ldots, n$. Hence

$$f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n) = \text{Beta}\left(\theta;\ \sum_{i=1}^{n} x_i+1,\ n-\sum_{i=1}^{n} x_i+1\right); \qquad 0 \le \theta \le 1,$$

a beta distribution of the first kind.

Again, we have that the posterior Bayes estimator of θ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by

$$
\begin{aligned}
E\left[\tau(\theta)\mid X_1=x_1, \ldots, X_n=x_n\right]
&= \int \tau(\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
 = \int \theta\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta \\[4pt]
&= \frac{\int \theta \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta}
 = \frac{\int \theta\,\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta}{\int \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta} \\[4pt]
&= \frac{\int_0^1 \theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i}\,d\theta}{\int_0^1 \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta}
 = \frac{\beta\left(\sum x_i+2,\ n-\sum x_i+1\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)} \\[4pt]
&= \frac{\left(\sum x_i+1\right)\Gamma(n+2)}{\Gamma(n+3)}
 = \frac{\sum_{i=1}^{n} x_i+1}{n+2}.
\end{aligned}
$$

Hence, the posterior Bayes estimator of θ with respect to the uniform prior distribution is given by $\dfrac{\sum_{i=1}^{n} x_i+1}{n+2}$.

Contrast this to the maximum likelihood estimator of θ, which is $\dfrac{\sum_{i=1}^{n} x_i}{n}$. We know that $\dfrac{\sum_{i=1}^{n} x_i}{n}$ is unbiased and UMVUE, whereas the posterior Bayes estimator is not unbiased.

Again, we have that the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the prior distribution $g_\Theta(\theta) = I_{(0,1)}(\theta)$ is given by

$$
\begin{aligned}
E\left[\tau(\theta)\mid X_1=x_1, \ldots, X_n=x_n\right]
&= \int \tau(\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta
 = \int \theta(1-\theta)\,f_{\Theta\mid X_1=x_1,\ldots,X_n=x_n}(\theta\mid x_1,\ldots,x_n)\,d\theta \\[4pt]
&= \frac{\int \theta(1-\theta)\,f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)\,d\theta}{\int f_{X_1,\ldots,X_n,\Theta}(x_1,\ldots,x_n,\theta)\,d\theta}
 = \frac{\int \theta(1-\theta)\prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta}{\int \prod_{i=1}^{n}\left[f(x_i\mid\theta)\right] g_\Theta(\theta)\,d\theta} \\[4pt]
&= \frac{\int \theta(1-\theta)\,\theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta}{\int \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,I_{(0,1)}(\theta)\,d\theta}
 = \frac{\int_0^1 \theta^{\sum x_i+1}(1-\theta)^{n-\sum x_i+1}\,d\theta}{\int_0^1 \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}\,d\theta} \\[4pt]
&= \frac{\beta\left(\sum x_i+2,\ n-\sum x_i+2\right)}{\beta\left(\sum x_i+1,\ n-\sum x_i+1\right)}
 = \frac{\left(\sum x_i+1\right)\left(n-\sum x_i+1\right)\Gamma(n+2)}{\Gamma(n+4)}
 = \frac{\left(\sum_{i=1}^{n} x_i+1\right)\left(n-\sum_{i=1}^{n} x_i+1\right)}{(n+3)(n+2)}.
\end{aligned}
$$

Hence, the posterior Bayes estimator of $\theta(1-\theta)$ with respect to the uniform prior distribution is given by $\dfrac{\left(\sum_{i=1}^{n} x_i+1\right)\left(n-\sum_{i=1}^{n} x_i+1\right)}{(n+3)(n+2)}$. We noted in the above example that the posterior Bayes estimator that we obtained was not unbiased.
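Both posterior Bayes estimators in this example are ratios of beta functions with integer arguments, so they can be evaluated exactly. The sketch below uses a hypothetical helper `beta_int` and a hypothetical 0/1 sample; it is a check on the closed forms $(\sum x_i+1)/(n+2)$ and $(\sum x_i+1)(n-\sum x_i+1)/\{(n+3)(n+2)\}$.

```python
from fractions import Fraction
from math import factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def bayes_estimates_uniform_prior(xs):
    """Posterior means of theta and theta*(1 - theta) for Bernoulli data, uniform prior."""
    n, s = len(xs), sum(xs)
    norm = beta_int(s + 1, n - s + 1)
    est_theta = beta_int(s + 2, n - s + 1) / norm
    est_theta_one_minus_theta = beta_int(s + 2, n - s + 2) / norm
    return est_theta, est_theta_one_minus_theta

obs = [1, 0, 1, 1, 0, 1, 0, 1]  # hypothetical sample: n = 8, sum = 5
est_theta, est_product = bayes_estimates_uniform_prior(obs)
print(est_theta, est_product)
```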

The following remark states that in general a posterior Bayes estimator is not unbiased.

Remark: Let $T_G^* = t_G^*(X_1, X_2, \ldots, X_n)$ denote the posterior Bayes estimator of $\tau(\theta)$ with respect to a prior distribution $G(\cdot)$. If both $T_G^*$ and $\tau(\Theta)$ have finite variance, then either

$$\text{var}\left[T_G^*\mid\theta\right] = 0$$

or $T_G^*$ is not an unbiased estimator of $\tau(\theta)$. That is, either $T_G^*$ estimates $\tau(\theta)$ correctly with probability 1 or $T_G^*$ is not an unbiased estimator.

Proof:
Let us suppose that $T_G^*$ is an unbiased estimator of $\tau(\theta)$. That is,

$$E\left(T_G^*\mid\theta\right) = \tau(\theta).$$

By the definition, we have that

$$T_G^* = t_G^*(X_1, X_2, \ldots, X_n) = E\left[\tau(\theta)\mid X_1, X_2, \ldots, X_n\right].$$

Now, we have that

$$
\text{Var}\left(T_G^*\right) = E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right] + \text{Var}\left[E\left(T_G^*\mid\Theta\right)\right]
= E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right] + \text{Var}\left[\tau(\Theta)\right] \qquad (1)
$$

and

$$
\text{Var}\left[\tau(\Theta)\right] = E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right] + \text{Var}\left[E\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right]
= E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right] + \text{Var}\left[T_G^*\right],
$$

so that

$$\text{Var}\left[T_G^*\right] = \text{Var}\left[\tau(\Theta)\right] - E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right]. \qquad (2)$$

Now, from equations (1) and (2) we have that

$$E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right] + \text{Var}\left[\tau(\Theta)\right] = \text{Var}\left[\tau(\Theta)\right] - E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right],$$

that is,

$$E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right] + E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right] = 0.$$

Now, since both $E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right]$ and $E\left[\text{Var}\left[\tau(\Theta)\mid X_1, \ldots, X_n\right]\right]$ are non-negative and their sum is zero, both are zero.

In particular, $E\left[\text{Var}\left(T_G^*\mid\Theta\right)\right] = 0$, and since $\text{Var}\left(T_G^*\mid\Theta\right)$ is non-negative and has zero expectation, $\text{Var}\left(T_G^*\mid\theta\right) = 0$.

Loss Function
Consider estimating $g(\theta)$, and let $t = t(x_1, x_2, \ldots, x_n)$ denote an estimate of $g(\theta)$. The loss function, denoted by $l(t;\theta)$, is defined to be a real-valued function satisfying
1) $l(t;\theta) \ge 0$ for all possible estimates t and all θ in Θ.
2) $l(t;\theta) = 0$ for $t = g(\theta)$.

$l(t;\theta)$ equals the loss incurred if one estimates $g(\theta)$ to be t when θ is the true parameter value.
The word 'loss' is used in place of 'error', and the loss function is used as the measure of the 'error'.

Several Possible Loss Functions
1) $l_1(t;\theta) = \left[t-g(\theta)\right]^2$. It is called the squared error loss function.
2) $l_2(t;\theta) = \left|t-g(\theta)\right|$. It is called the absolute error loss function.
3) $l_3(t;\theta) = \begin{cases} A & \text{if } \left|t-g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t-g(\theta)\right| \le \varepsilon \end{cases}$, where $A>0$.
4) $l_4(t;\theta) = \rho(\theta)\left|t-g(\theta)\right|^{r}$ for $\rho(\theta) \ge 0$ and $r>0$.

Note that both $l_1$ and $l_2$ increase as the error $t-g(\theta)$ increases in magnitude. $l_3$ says that we lose nothing if the estimate t is within ε units of $g(\theta)$ and otherwise we lose the amount A. $l_4$ is a general loss function that includes both $l_1$ and $l_2$ as special cases.
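The four loss functions listed above translate directly into code. This is a minimal sketch in which `target` stands for $g(\theta)$; the function and argument names are hypothetical.

```python
def squared_error_loss(t, target):
    """l1: squared error between the estimate t and the target g(theta)."""
    return (t - target) ** 2

def absolute_error_loss(t, target):
    """l2: absolute error."""
    return abs(t - target)

def threshold_loss(t, target, eps, amount):
    """l3: lose `amount` when the estimate is more than eps away, else nothing."""
    return amount if abs(t - target) > eps else 0.0

def weighted_power_loss(t, target, rho, r):
    """l4: rho * |t - target|**r, with rho >= 0 and r > 0."""
    return rho * abs(t - target) ** r

# l1 and l2 are the special cases (rho, r) = (1, 2) and (1, 1) of l4.
print(weighted_power_loss(2.5, 1.0, rho=1.0, r=2), squared_error_loss(2.5, 1.0))
print(weighted_power_loss(2.5, 1.0, rho=1.0, r=1), absolute_error_loss(2.5, 1.0))
```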

Risk Function
For a given loss function $l(\cdot\,;\cdot)$, the risk function, denoted by $R_t(\theta)$, of an estimator $T = t(X_1, X_2, \ldots, X_n)$ is defined to be

$$R_t(\theta) = E_\theta\left[l(T;\theta)\right].$$

The risk function is the average loss. The expectation in the above equation can be taken in two ways. For example, if the density $f(x;\theta)$ from which we sampled is a probability density function, then

$$R_t(\theta) = E_\theta\left[l(T;\theta)\right] = E_\theta\left[l\left(t(X_1, \ldots, X_n);\theta\right)\right] = \int\cdots\int l\left(t(x_1, \ldots, x_n);\theta\right)\prod_{i=1}^{n} f(x_i;\theta)\,dx_i.$$

Or, we can consider the random variable T with density $f_T(t)$; then

$$R_t(\theta) = E_\theta\left[l(T;\theta)\right] = \int l(t;\theta)\,f_T(t)\,dt,$$

where $f_T(t)$ is the density of the estimator T. In either case, the expectation averages out the values of $x_1, x_2, \ldots, x_n$. Since θ is considered to be random, the risk is itself a random variable.

Possible Risk Functions
1) Corresponding to the loss function $l_1(t;\theta) = \left[t-g(\theta)\right]^2$, the risk function is given by $R_t(\theta) = E_\theta\left[l_1(T;\theta)\right] = E_\theta\left[T-g(\theta)\right]^2$. It is called the mean squared error.
2) Corresponding to the loss function $l_2(t;\theta) = \left|t-g(\theta)\right|$, the risk function is given by $R_t(\theta) = E_\theta\left[l_2(T;\theta)\right] = E_\theta\left|T-g(\theta)\right|$. It is called the mean absolute error.
3) Corresponding to the loss function $l_3(t;\theta) = \begin{cases} A & \text{if } \left|t-g(\theta)\right| > \varepsilon \\ 0 & \text{if } \left|t-g(\theta)\right| \le \varepsilon \end{cases}$, where $A>0$, the risk function is given by $R_t(\theta) = E_\theta\left[l_3(T;\theta)\right] = A\,P_\theta\left[\left|T-g(\theta)\right| > \varepsilon\right]$.
4) Corresponding to the loss function $l_4(t;\theta) = \rho(\theta)\left|t-g(\theta)\right|^{r}$ for $\rho(\theta) \ge 0$ and $r>0$, the risk function is given by $R_t(\theta) = E_\theta\left[l_4(T;\theta)\right] = \rho(\theta)\,E_\theta\left[\left|T-g(\theta)\right|^{r}\right]$.
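For Bernoulli samples the risk under squared error loss can be computed exactly, because an estimator that depends on the data only through $S = \sum X_i$ has a binomial sampling distribution. The sketch below compares the maximum likelihood estimator $S/n$ with the uniform-prior posterior Bayes estimator $(S+1)/(n+2)$ obtained earlier; the function names and the choices n = 10, θ = 0.5 and θ = 0.05 are hypothetical.

```python
from math import comb

def squared_error_risk(estimator, theta, n):
    """Exact E_theta[(T - theta)^2] when T = estimator(S, n) and S ~ Binomial(n, theta)."""
    return sum(
        comb(n, s) * theta ** s * (1 - theta) ** (n - s) * (estimator(s, n) - theta) ** 2
        for s in range(n + 1)
    )

def mle(s, n):
    return s / n

def uniform_prior_bayes(s, n):
    return (s + 1) / (n + 2)

n = 10
risk_mle_half = squared_error_risk(mle, 0.5, n)
risk_bayes_half = squared_error_risk(uniform_prior_bayes, 0.5, n)
risk_mle_edge = squared_error_risk(mle, 0.05, n)
risk_bayes_edge = squared_error_risk(uniform_prior_bayes, 0.05, n)
print(risk_mle_half, risk_bayes_half, risk_mle_edge, risk_bayes_edge)
```

With these settings the Bayes estimator has the smaller risk near θ = 0.5 and the larger risk near θ = 0.05, so neither risk function lies below the other for every θ.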

When is a loss function said to be Convex and Strictly Convex?
A real-valued function $L(t)$ defined over an open interval $I = (a, b)$ with $-\infty \le a < b \le \infty$ is said to be convex if, for any $a < t < t^* < b$ and any $0 < \gamma < 1$,

$$L\left[\gamma t + (1-\gamma)t^*\right] \le \gamma L(t) + (1-\gamma)L\left(t^*\right). \qquad (1)$$

The function is said to be strictly convex if strict inequality holds in (1) for all indicated values of $t$, $t^*$ and γ.

Convexity is a very strong condition which implies, for example, that L is continuous in $(a, b)$ and has a left and right derivative at every point of $(a, b)$.

Determination of Convexity
Determining whether or not a loss function is convex is often easy with the help of the following two criteria.
a) If L is defined and differentiable on $(a, b)$, then a necessary and sufficient condition for L to be convex is that

$$L'(t) \le L'\left(t^*\right) \quad \text{for all } a < t < t^* < b. \qquad (1)$$

The function is strictly convex iff the inequality (1) is strict for all $t < t^*$.
b) If L is twice differentiable, then the necessary and sufficient condition (1) is equivalent to

$$L''(t) \ge 0 \quad \text{for all } a < t < b,$$

with strict inequality sufficient for strict convexity.
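The defining inequality for convexity can be probed numerically on a grid of points. The sketch below uses a hypothetical helper `violates_convexity`; a violation found on the grid disproves convexity, whereas finding none is only supporting evidence, not a proof.

```python
def violates_convexity(loss, points, gammas=(0.25, 0.5, 0.75), tol=1e-12):
    """Return True if L(g*t + (1-g)*t_star) > g*L(t) + (1-g)*L(t_star) for some grid pair."""
    for i, t in enumerate(points):
        for t_star in points[i + 1:]:
            for g in gammas:
                mid = g * t + (1 - g) * t_star
                if loss(mid) > g * loss(t) + (1 - g) * loss(t_star) + tol:
                    return True
    return False

grid = [k / 10 for k in range(-30, 31)]  # errors t - g(theta) from -3 to 3

def squared(u):
    return u * u

def absolute(u):
    return abs(u)

def threshold(u, eps=1.0, amount=1.0):
    # the loss l3 written as a function of the error u = t - g(theta)
    return amount if abs(u) > eps else 0.0

results = {
    "squared": violates_convexity(squared, grid),
    "absolute": violates_convexity(absolute, grid),
    "threshold": violates_convexity(threshold, grid),
}
print(results)
```

The squared and absolute error losses show no violation, while the threshold loss $l_3$ does, so $l_3$ is not convex.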

Bayes Estimator With Respect to Loss Function
The Bayes estimator of the parameter θ is the function d of the sample observations $x_1, x_2, \ldots, x_n$ that minimizes the expected risk, where the expected risk is defined as

$$
B(d) = E\left[R(d, \theta)\right] = \int R(d, \theta)\,f(\theta)\,d\theta
= \int\left[\int\cdots\int l\{d(x_1, \ldots, x_n);\theta\}\,f(x_1, \ldots, x_n\mid\theta)\,dx_1\cdots dx_n\right] f(\theta)\,d\theta. \qquad (*)
$$

Now, interchanging the order of integration, we can write (*) as

$$B(d) = \int\cdots\int\left[\int l\{d(x_1, \ldots, x_n);\theta\}\,f(x_1, \ldots, x_n\mid\theta)\,f(\theta)\,d\theta\right] dx_1\cdots dx_n. \qquad (**)$$

The function $B(d)$ will be minimized if we can find the function d that minimizes the quantity within the square brackets of equation (**) for every set of x values. That is, the Bayes estimator of θ is the function d of $x_1, x_2, \ldots, x_n$ that minimizes

$$
\int l\{d(x_1, \ldots, x_n);\theta\}\,f(x_1, \ldots, x_n\mid\theta)\,f(\theta)\,d\theta
= \int l\{d(x);\theta\}\,f(x_1, \ldots, x_n, \theta)\,d\theta
= f(x_1, \ldots, x_n)\int l\{d(x);\theta\}\,f(\theta\mid x_1, \ldots, x_n)\,d\theta,
$$

since

$$f(x_1, \ldots, x_n, \theta) = f(x_1, \ldots, x_n\mid\theta)\,f(\theta) \ \Rightarrow\ f(\theta\mid x_1, \ldots, x_n) = \frac{f(x_1, \ldots, x_n\mid\theta)\,f(\theta)}{f(x_1, \ldots, x_n)}.$$

Thus the Bayes estimator of θ is the value $\hat\theta = d(x)$ that minimizes

$$\int l\{d(x);\theta\}\,f(\theta\mid x_1, \ldots, x_n)\,d\theta = Y \quad (\text{say}).$$

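If the loss function is the squared error, i.e. $\left[d(x)-\theta\right]^2$, then

$$
Y = \int\left[d(x)-\theta\right]^2 f(\theta\mid x_1, \ldots, x_n)\,d\theta
= \left[d(x)\right]^2\int f(\theta\mid x_1, \ldots, x_n)\,d\theta - 2\,d(x)\int\theta\,f(\theta\mid x_1, \ldots, x_n)\,d\theta + \int\theta^2 f(\theta\mid x_1, \ldots, x_n)\,d\theta.
$$

Thus, minimizing Y with respect to $d(x)$ gives

$$
\frac{\partial Y}{\partial\left[d(x)\right]} = 0
\ \Rightarrow\ 2\,d(x)\int f(\theta\mid x_1, \ldots, x_n)\,d\theta - 2\int\theta\,f(\theta\mid x_1, \ldots, x_n)\,d\theta = 0
\ \Rightarrow\ d(x) = \frac{\int\theta\,f(\theta\mid x_1, \ldots, x_n)\,d\theta}{\int f(\theta\mid x_1, \ldots, x_n)\,d\theta},
$$

which is the posterior expectation (posterior mean) of θ.

Hence, the posterior mean $d(x)$ is the Bayes estimate for θ if the loss function is the squared error.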
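The claim that the posterior mean minimizes the posterior expected squared error can be checked by a direct search over candidate values of $d(x)$. The sketch below assumes a posterior proportional to $\theta^2(1-\theta)$ on $(0,1)$, that is, a Beta(3, 2) shape chosen purely for illustration, discretized on a midpoint grid.

```python
def posterior_expected_loss(d, thetas, weights):
    """Posterior expected squared error of the estimate d on a weighted grid."""
    total = sum(weights)
    return sum(w * (d - th) ** 2 for th, w in zip(thetas, weights)) / total

steps = 2000
thetas = [(k + 0.5) / steps for k in range(steps)]   # midpoints of (0, 1)
weights = [th ** 2 * (1 - th) for th in thetas]      # unnormalized Beta(3, 2) density

posterior_mean = sum(th * w for th, w in zip(thetas, weights)) / sum(weights)

candidates = [k / 200 for k in range(201)]           # d = 0, 0.005, ..., 1
best = min(candidates, key=lambda d: posterior_expected_loss(d, thetas, weights))
print(posterior_mean, best)
```

The minimizing candidate should coincide with the posterior mean (0.6 for this posterior) up to the spacing of the candidate grid.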

Bayesian approach has the following advantages over classical approach. a) We make inferences about the unknown parameters given the data whereas in the classical approach we look at the long run behavior e.g. in 95% of experiments p will lie between p ′ and p ′′ . b) The posterior distribution tells the whole story and if a point estimate or confidence interval be desired they can immediately be obtained from posterior distribution. c) Bayesian approach provides solutions for problems which do not have solutions from the classical point of view.

Note:
a) A decision rule δ is said to be uniformly better than a decision rule δ′ if R(δ, θ) ≤ R(δ′, θ) for all θ ∈ Θ, with strict inequality holding for some θ.
b) A decision rule δ* is said to be uniformly best in a class of decision rules D if δ* is uniformly better than any other decision rule δ ∈ D.
c) A decision rule δ is said to be admissible in a class D if there exists no other decision rule in D which is uniformly better than δ.

Example: Let $X_1, X_2, \dots, X_n$ be independent $N(\mu, \sigma^2)$ variables where $\mu$ is unknown but $\sigma^2$ is known. Let the prior distribution of $\mu$ be $N(\theta, \sigma_0^2)$. Find the Bayes estimate of $\mu$.

Solution:
The joint conditional distribution of the sample given $\mu$ is

$$f(x_1,\dots,x_n\mid\mu) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}\sum (x_i-\mu)^2\right] = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2 - \frac{1}{2\sigma^2}\sum (x_i-\bar x)^2\right]$$

so that

$$f(x_1,\dots,x_n\mid\mu) \propto \exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2\right].$$

The posterior distribution of $\mu$ given $x$ is

$$g(\mu\mid x_1,\dots,x_n) = \frac{f(\mu\mid\theta)\, f(x_1,\dots,x_n\mid\mu)}{f(x_1,\dots,x_n)} \propto f(\mu\mid\theta)\, f(x_1,\dots,x_n\mid\mu)$$

$$\propto \exp\left[-\frac{n}{2\sigma^2}(\bar x-\mu)^2 - \frac{1}{2\sigma_0^2}(\mu-\theta)^2\right] \propto \exp\left[-\frac{1}{2}\left(\frac{n\sigma_0^2+\sigma^2}{\sigma_0^2\sigma^2}\right)\left(\mu - \frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}\right)^2\right].$$

That is,

$$\mu\mid x \sim N\left[\frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2},\; \frac{\sigma_0^2\sigma^2}{n\sigma_0^2+\sigma^2}\right].$$

If the loss function is squared error, the Bayes estimator of $\mu$ is the posterior mean

$$\hat\mu = \frac{n\bar x\sigma_0^2+\theta\sigma^2}{n\sigma_0^2+\sigma^2}.$$
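The posterior mean in this example can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names, the sample values and the grid settings are illustrative assumptions. It compares the closed-form posterior mean with a brute-force average over a grid of µ values weighted by likelihood times prior.

```python
import math

def closed_form_posterior(xs, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of mu for N(mu, sigma2) data and a N(prior_mean, prior_var) prior."""
    n = len(xs)
    xbar = sum(xs) / n
    denom = n * prior_var + sigma2
    post_mean = (n * xbar * prior_var + prior_mean * sigma2) / denom
    post_var = prior_var * sigma2 / denom
    return post_mean, post_var

def grid_posterior_mean(xs, sigma2, prior_mean, prior_var, half_width=20.0, steps=40001):
    """Approximate the posterior mean by a Riemann sum over an evenly spaced grid of mu values."""
    lo = prior_mean - half_width
    h = 2 * half_width / (steps - 1)
    num = 0.0
    den = 0.0
    for k in range(steps):
        mu = lo + k * h
        # unnormalized log posterior = log likelihood + log prior (constants dropped)
        log_w = -sum((x - mu) ** 2 for x in xs) / (2 * sigma2) - (mu - prior_mean) ** 2 / (2 * prior_var)
        w = math.exp(log_w)
        num += mu * w
        den += w
    return num / den

# Hypothetical data: sigma^2 = 1, prior N(0, 4)
SAMPLE = [1.2, 0.7, 2.1, 1.5]
```

With these assumed values the closed form gives 22/17 for the posterior mean, and the grid average agrees to many decimal places.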

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x\mid\theta)$ and let $g_\Theta(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the loss function for estimating $\tau(\theta)$. The Bayes estimator of $\tau(\theta)$ is that estimator $t^*(\cdot,\dots,\cdot)$ which minimizes

$$\int_\Theta l\big(t(x_1,x_2,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,x_2,\dots,x_n)\, d\theta$$

as a function of $t(\cdot,\dots,\cdot)$.

Proof
For a general loss function $l(t;\theta)$, we seek that estimator, say $t^*(\cdot,\dots,\cdot)$, which minimizes the expression

$$\int_\Theta R_t(\theta)\, g_\Theta(\theta)\, d\theta = \int_\Theta E_\theta\big[l(t;\theta)\big]\, g_\Theta(\theta)\, d\theta = \int_\Theta E_\theta\big[l(t(X_1,\dots,X_n);\theta)\big]\, g_\Theta(\theta)\, d\theta$$

$$= \int_\Theta \left[\int_{R^n} l\big(t(x_1,\dots,x_n);\theta\big)\, f_{X_1,\dots,X_n\mid\Theta=\theta}(x_1,\dots,x_n\mid\theta) \prod_{i=1}^n dx_i\right] g_\Theta(\theta)\, d\theta$$

$$= \int_{R^n} \left[\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, \frac{f_{X_1,\dots,X_n\mid\Theta=\theta}(x_1,\dots,x_n\mid\theta)\, g_\Theta(\theta)}{f_{X_1,\dots,X_n}(x_1,\dots,x_n)}\, d\theta\right] f_{X_1,\dots,X_n}(x_1,\dots,x_n) \prod_{i=1}^n dx_i$$

$$= \int_{R^n} \left[\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta\right] f_{X_1,\dots,X_n}(x_1,\dots,x_n) \prod_{i=1}^n dx_i .$$

Since the integrand is non-negative, the double integral is minimized if the expression within the square brackets, which is sometimes called the posterior risk, is minimized for each $x_1, x_2, \dots, x_n$.

So, in general, the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

Hence, the theorem is proved.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x\mid\theta)$ and let $g_\Theta(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the squared-error loss function for estimating $\tau(\theta)$. That is,

$$l(t;\theta) = \big[t(x_1,\dots,x_n) - \tau(\theta)\big]^2 .$$

Then the Bayes estimator of $\tau(\theta)$ is given by

$$E\big[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\big] = \frac{\int \tau(\theta) \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta} .$$

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

Here, the loss function is the squared-error loss function. So the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_\Theta \big[t(x_1,\dots,x_n) - \tau(\theta)\big]^2 f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta = \int_\Theta \big[\tau(\theta) - t(x_1,\dots,x_n)\big]^2 f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

But this expression is the conditional expectation of $\big[\tau(\Theta) - t(x_1,\dots,x_n)\big]^2$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$, which is minimized as a function of $t(x_1,\dots,x_n)$ for $t^*(x_1,\dots,x_n)$ equal to the conditional expectation of $\tau(\Theta)$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$.

[Recall that $E(Z-a)^2$ is minimized as a function of $a$ for $a^* = E(Z)$.]

Hence, the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by

$$E\big[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\big] = \frac{\int \tau(\theta) \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta} .$$

Hence, the theorem is proved.

Theorem
Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(x\mid\theta)$ and let $g_\Theta(\theta)$ be the density of $\Theta$. Further, let $l(t;\theta)$ be the absolute-error loss function for estimating $\tau(\theta)$. That is,

$$l(t;\theta) = \big|t(x_1,\dots,x_n) - \tau(\theta)\big| .$$

Then the Bayes estimator of $\tau(\theta)$ is given by the median of the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$.

Proof
We know that the Bayes estimator of $\tau(\theta)$ with respect to the loss function $l(\cdot\,;\cdot)$ and prior density $g(\cdot)$ is that estimator which minimizes the posterior risk, which is the expected loss with respect to the posterior distribution of $\Theta$ given the observations $x_1, x_2, \dots, x_n$.

That is, the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

Here, the loss function is the absolute-error loss function. So the Bayes estimator of $\tau(\theta)$ is that estimator which minimizes

$$\int_\Theta \big|t(x_1,\dots,x_n) - \tau(\theta)\big|\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta = \int_\Theta \big|\tau(\theta) - t(x_1,\dots,x_n)\big|\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

But this expression is the conditional expectation of $\big|t(x_1,\dots,x_n) - \tau(\Theta)\big|$ with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$, which is minimized as a function of $t(x_1,\dots,x_n)$ for $t^*(x_1,\dots,x_n)$ equal to the conditional median with respect to the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$.

[Recall that $E|Z-a|$ is minimized as a function of $a$ for $a^* =$ median of $Z$.]

Hence, the Bayes estimator of $\tau(\theta)$ with respect to the absolute-error loss function is given by the median of the posterior distribution of $\Theta$ given $X_1=x_1,\dots,X_n=x_n$. (Proved)
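The two facts recalled in the proofs above (the mean minimizes expected squared error, the median minimizes expected absolute error) can be illustrated on a small discrete distribution. This is a sketch under assumed values: the support points, probabilities and candidate grid below are hypothetical and chosen only for illustration.

```python
# A hypothetical discrete "posterior" for Z: support points and their probabilities.
VALUES = [0, 1, 2, 3, 10]
PROBS = [0.1, 0.2, 0.3, 0.3, 0.1]

def expected_abs_loss(a):
    """E|Z - a| under the discrete distribution above."""
    return sum(p * abs(z - a) for z, p in zip(VALUES, PROBS))

def expected_sq_loss(a):
    """E(Z - a)^2 under the discrete distribution above."""
    return sum(p * (z - a) ** 2 for z, p in zip(VALUES, PROBS))

# Candidate estimates a = 0.0, 0.1, ..., 10.0
CANDIDATES = [k / 10 for k in range(101)]

best_abs = min(CANDIDATES, key=expected_abs_loss)  # minimizer of absolute-error risk
best_sq = min(CANDIDATES, key=expected_sq_loss)    # minimizer of squared-error risk
```

For this distribution the cumulative probabilities are 0.1, 0.3, 0.6, so the median is 2, while the mean is 2.7. The grid search picks out exactly these two values, which differ because the distribution is skewed by the point at 10.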

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from the normal distribution with density

$$f(x\mid\theta) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(x-\theta)^2\right]; \qquad -\infty < x < \infty .$$

Assume that the prior distribution of $\Theta$ is given by

$$g_\Theta(\theta) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(\theta-\mu_0)^2\right]; \qquad -\infty < \theta < \infty .$$

That is, $\Theta$ is normal with mean $\mu_0$ and variance 1. Write $\mu_0 = x_0$ when convenient. Find the Bayes estimator of $\tau(\theta) = \theta$ with respect to the squared-error loss function.

Solution
We know that the Bayes estimator of $\tau(\theta)$ with respect to the squared-error loss function is given by

$$E\big[\tau(\theta)\mid X_1=x_1,\dots,X_n=x_n\big] = \frac{\int \tau(\theta) \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta}{\int \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta} .$$

The posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n) = \frac{\prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)}{\int \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta} = \frac{\left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2\right] \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(\theta-\mu_0)^2\right]}{\int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2}\sum_{i=1}^n (x_i-\theta)^2\right] \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}(\theta-\mu_0)^2\right] d\theta}$$

which, on completing the square in $\theta$ (with $x_0 = \mu_0$), reduces to

$$f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n) = \frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}} \exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{\sum_{i=0}^n x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}\right)^2\right],$$

i.e. the posterior distribution is normal with mean $\sum_{i=0}^n x_i/(n+1)$ and variance $1/(n+1)$.

So, the Bayes estimator of $\theta$ with respect to the squared-error loss function is given by

$$E\big[\theta\mid X_1=x_1,\dots,X_n=x_n\big] = \int_{-\infty}^{\infty} \theta\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta = \int_{-\infty}^{\infty} \theta\, \frac{1}{\sqrt{2\pi}\sqrt{\frac{1}{n+1}}} \exp\left[-\frac{1}{2}\left(\frac{\theta - \frac{\sum_{i=0}^n x_i}{n+1}}{\sqrt{\frac{1}{n+1}}}\right)^2\right] d\theta .$$

Now, let

$$\frac{\theta - \frac{\sum_{i=0}^n x_i}{n+1}}{\sqrt{\frac{1}{n+1}}} = z, \qquad \theta = \frac{\sum_{i=0}^n x_i}{n+1} + \frac{z}{\sqrt{n+1}}, \qquad d\theta = \frac{dz}{\sqrt{n+1}} .$$

Then

$$E\big[\theta\mid X_1=x_1,\dots,X_n=x_n\big] = \int_{-\infty}^{\infty} \left(\frac{\sum_{i=0}^n x_i}{n+1} + \frac{z}{\sqrt{n+1}}\right) \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz$$

$$= \frac{\sum_{i=0}^n x_i}{n+1} + \frac{1}{\sqrt{n+1}} \int_{-\infty}^{\infty} z\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz$$

$$= \frac{\sum_{i=0}^n x_i}{n+1} + \frac{1}{\sqrt{n+1}} \left[\int_{-\infty}^{0} z\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz + \int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz\right]$$

$$= \frac{\sum_{i=0}^n x_i}{n+1} + \frac{1}{\sqrt{n+1}} \left[-\int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz + \int_{0}^{\infty} z\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}z^2\right] dz\right] = \frac{\sum_{i=0}^n x_i}{n+1} .$$

So the Bayes estimator of $\theta$ with respect to the squared-error loss is

$$E\big[\theta\mid X_1=x_1,\dots,X_n=x_n\big] = \frac{\sum_{i=0}^n x_i}{n+1} = \frac{x_0 + \sum_{i=1}^n x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^n x_i}{n+1} .$$

Since the posterior distribution of $\Theta$ is normal, its mean and median are the same. Hence

$$\frac{\sum_{i=0}^n x_i}{n+1} = \frac{\mu_0 + \sum_{i=1}^n x_i}{n+1}$$

is also the Bayes estimator with respect to the absolute-error loss function.

Example: Let $X_1, X_2, \dots, X_n$ denote a random sample from the uniform distribution with density

$$f(x\mid\theta) = \frac{1}{\theta}\, I_{(0,\theta)}(x) .$$

Assume that the prior distribution of $\Theta$ is given by $g_\Theta(\theta) = I_{(0,1)}(\theta)$. That is, $\Theta$ is standard uniform. Find the Bayes estimator of $\tau(\theta) = \theta$ with respect to the weighted squared-error loss function

$$l(t;\theta) = \frac{(t-\theta)^2}{\theta^2} .$$

Solution:
We know that the Bayes estimator of $\tau(\theta)$ with respect to any general loss function, such as $l(t;\theta) = (t-\theta)^2/\theta^2$, is that estimator which minimizes

$$\int_\Theta l\big(t(x_1,\dots,x_n);\theta\big)\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta .$$

The posterior distribution of $\Theta$ is given by

$$f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n) = \frac{\prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)}{\int \prod_{i=1}^n f(x_i\mid\theta)\, g_\Theta(\theta)\, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n \prod_{i=1}^n I_{(0,\theta)}(x_i)\, I_{(0,1)}(\theta)}{\int_0^1 \left(\frac{1}{\theta}\right)^n \prod_{i=1}^n I_{(0,\theta)}(x_i)\, d\theta} = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\int_{y_n}^1 \left(\frac{1}{\theta}\right)^n d\theta}$$

where $y_n = \max(x_1,\dots,x_n)$. For $n > 1$,

$$\int_{y_n}^1 \theta^{-n}\, d\theta = \left[\frac{\theta^{-(n-1)}}{-(n-1)}\right]_{y_n}^{1} = \frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right],$$

so that

$$f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n) = \frac{\left(\frac{1}{\theta}\right)^n I_{(y_n,1)}(\theta)}{\frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right]} .$$

Now the Bayes estimator of $\theta$ with respect to the loss function $l(t;\theta) = (t-\theta)^2/\theta^2$ is that estimator which minimizes

$$\int_\Theta \frac{(t-\theta)^2}{\theta^2}\, f_{\Theta\mid X_1=x_1,\dots,X_n=x_n}(\theta\mid x_1,\dots,x_n)\, d\theta = \int_{y_n}^1 \frac{\big(t(y_n)-\theta\big)^2}{\theta^2}\, \frac{\left(\frac{1}{\theta}\right)^n}{\frac{1}{n-1}\left[\frac{1}{y_n^{\,n-1}} - 1\right]}\, d\theta .$$

Since the denominator does not involve $t$, this is the estimator which minimizes

$$\int_{y_n}^1 \frac{\big(t(y_n)-\theta\big)^2}{\theta^2} \left(\frac{1}{\theta}\right)^n d\theta = \int_{y_n}^1 \frac{\big(t(y_n)-\theta\big)^2}{\theta^{n+2}}\, d\theta = \big[t(y_n)\big]^2 \int_{y_n}^1 \frac{1}{\theta^{n+2}}\, d\theta - 2\, t(y_n) \int_{y_n}^1 \frac{1}{\theta^{n+1}}\, d\theta + \int_{y_n}^1 \frac{1}{\theta^{n}}\, d\theta \qquad (A)$$

Here, (A) is a quadratic expression in $t(\cdot)$. This quadratic assumes its minimum for

$$t^*(y_n) = \frac{\int_{y_n}^1 \frac{1}{\theta^{n+1}}\, d\theta}{\int_{y_n}^1 \frac{1}{\theta^{n+2}}\, d\theta} = \frac{\frac{1}{-n}\left[1 - y_n^{-n}\right]}{\frac{1}{-(n+1)}\left[1 - y_n^{-(n+1)}\right]} = \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n}} \times \frac{y_n^{\,n+1}}{y_n^{\,n+1} - 1}$$

$$t^*(y_n) = \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \times y_n .$$

So, the Bayes estimator of $\theta$ with respect to the weighted squared-error loss function $l(t;\theta) = (t-\theta)^2/\theta^2$ is given by

$$t^*(y_n) = \frac{n+1}{n} \times \frac{y_n^{\,n} - 1}{y_n^{\,n+1} - 1} \times y_n .$$
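As a sanity check on the closed form for t*(y_n), the quantity (A) can be evaluated numerically and compared at nearby values of t. This is only a sketch: the function names, the midpoint-rule integrator and the values y_n = 0.8, n = 5 are assumptions made for illustration.

```python
def t_star(y_n, n):
    """Closed-form minimizer from the example: ((n+1)/n) * (y^n - 1)/(y^(n+1) - 1) * y."""
    return (n + 1) / n * (y_n ** n - 1) / (y_n ** (n + 1) - 1) * y_n

def posterior_risk(t, y_n, n, steps=20000):
    """Midpoint-rule approximation of the integral of (t - theta)^2 / theta^(n+2) over (y_n, 1)."""
    h = (1.0 - y_n) / steps
    total = 0.0
    for k in range(steps):
        theta = y_n + (k + 0.5) * h
        total += (t - theta) ** 2 / theta ** (n + 2)
    return total * h

# Hypothetical observed maximum and sample size
Y_N, N_OBS = 0.8, 5
T_HAT = t_star(Y_N, N_OBS)
```

With these assumed values t* is roughly 0.875, which lies between the sample maximum 0.8 and the upper limit 1 of the prior, and moving t slightly in either direction increases the numerically evaluated risk.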

For two estimators $T_1 = t_1(X_1, X_2, \dots, X_n)$ and $T_2 = t_2(X_1, X_2, \dots, X_n)$, the estimator $T_1$ is defined to be a better estimator than $T_2$ if and only if

$$R_{t_1}(\theta) \le R_{t_2}(\theta) \ \text{ for all } \theta \text{ in } \Theta \qquad \text{and} \qquad R_{t_1}(\theta) < R_{t_2}(\theta) \ \text{ for at least one } \theta \text{ in } \Theta .$$

An estimator $T = t(X_1, X_2, \dots, X_n)$ is defined to be admissible if and only if there is no better estimator.

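Example: Using the squared-error loss function $l(t,\theta) = (t-\theta)^2$, estimators for the location parameter of a normal distribution given a sample of size $n$ are the sample mean $t_1(x) = \bar x$, the sample median $t_2(x) = m$, and the weighted mean $t_3(x) = \sum w_i x_i$ with $\sum w_i = 1$. Their respective risk functions are

$$R_1 = \frac{\sigma^2}{n}, \qquad R_2 = 1.57\,\frac{\sigma^2}{n} \ \text{(approximately, for large } n\text{)}, \qquad R_3 = \sigma^2\left\{\frac{1}{n} + \sum (w_i - \bar w)^2\right\},$$

where $\bar w = 1/n$. Since $R_1 < R_2$ and $R_1 \le R_3$ (with equality only when every $w_i = 1/n$), neither the sample median nor the weighted mean is a better estimator than $\bar x$. In fact, $\bar x$ is an admissible estimator of the location parameter $\mu$ of the normal distribution.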
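The risk of the weighted mean can be verified by exact arithmetic. The sketch below is illustrative only: the function names and the particular weights are assumptions. It checks that the variance factor Σ w_i² of a weighted mean of independent observations equals 1/n + Σ (w_i − 1/n)², and so is never below the factor 1/n of the sample mean.

```python
from fractions import Fraction

def weighted_mean_variance_factor(weights):
    """Var(sum w_i X_i) / sigma^2 = sum w_i^2 for independent X_i with common variance."""
    return sum(w * w for w in weights)

def risk_formula_factor(weights):
    """The form used in the example: 1/n + sum (w_i - 1/n)^2, valid when sum w_i = 1."""
    n = len(weights)
    w_bar = Fraction(1, n)
    return Fraction(1, n) + sum((w - w_bar) ** 2 for w in weights)

# Hypothetical weights summing to 1
WEIGHTS = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
```

For these weights both expressions equal 11/32, compared with 1/4 for the equally weighted sample mean of four observations.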

An estimator $t$ is said to be inadmissible if there exists another estimator $t'$ which dominates it, that is, such that

$$R(\theta, t') \le R(\theta, t) \ \text{ for all } \theta \text{ in } \Theta \qquad \text{and} \qquad R(\theta, t') < R(\theta, t) \ \text{ for some } \theta \text{ in } \Theta .$$

To establish the inadmissibility of an estimator $t$, we may use the following lemma. Let the range of the estimand $\tau(\theta)$ be $[a, b]$, let the loss function satisfy $L(\theta, t) \ge 0$, and for any fixed $\theta$ let $L(\theta, t)$ be increasing as $t$ moves away from $\tau(\theta)$ in either direction. Then any estimator taking on values outside the closed interval $[a, b]$ with positive probability is inadmissible.

The properties of admissible estimators are as follows.
a) If the loss function $L$ is strictly convex, then every admissible estimator must be non-randomized.
b) If $L$ is strictly convex, $t$ is an admissible estimator of $\tau(\theta)$, and $t'$ is another estimator with the same risk function, i.e. $R(\theta, t) = R(\theta, t')$ for all $\theta$, then $t = t'$ with probability 1.
c) Any unique Bayes estimator is admissible.

Minimax Estimator
An estimator $T^*$ is defined to be a minimax estimator if and only if

$$\sup_\theta R(\theta, t^*) \le \sup_\theta R(\theta, t) \quad \text{for every estimator } t .$$

Properties of Minimax Estimator
The properties of the minimax estimator are given below.
a) One appealing feature of the minimax estimator is that it does not depend on the particular parameterization.
b) If $g(\cdot)$ is a prior distribution of $\theta$ such that $\int R(\theta, t_g)\, g(\theta)\, d\theta = \sup_\theta R(\theta, t_g)$, then
   i. $t_g$ is minimax;
   ii. if $t_g$ is the unique Bayes solution with respect to $g(\cdot)$, it is the unique minimax procedure.
c) If a Bayes estimator $t_g$ has constant risk, then it is minimax.
d) If $t'$ dominates a minimax estimator $t$, then $t'$ is also minimax.
e) If an estimator has constant risk and is admissible, it is minimax.
f) The best equivariant estimator is frequently minimax.

Example: Suppose that Θ = {θ1, θ2}, where θ1 corresponds to oil and θ2 to no oil. Let A = {a1, a2, a3}, where ai corresponds to the choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem.

               Drill a1    Sell a2    Partial a3
Oil θ1             0          10           5
No oil θ2         12           1           6

If there is oil and we drill, the loss is zero, while if there is no oil and we drill, the loss is 12, and so on. An experiment is conducted to obtain information about θ, resulting in the random variable X with possible values coded as 0 and 1, with probabilities given by

               x = 0      x = 1
Oil θ1          0.3        0.7
No oil θ2       0.6        0.4

When there is oil, 0 occurs with probability 0.3 and 1 occurs with probability 0.7:

P(x = 0 | θ1) = 0.3 and P(x = 1 | θ1) = 0.7.

Now the possible decision rules δi(x) are:

i        1    2    3    4    5    6    7    8    9
x = 0    a1   a1   a1   a2   a2   a2   a3   a3   a3
x = 1    a1   a2   a3   a1   a2   a3   a1   a2   a3

Here, δ1 = take action a1 regardless of the value of X; δ2 = take action a1 if X = 0 and take action a2 if X = 1; and so on.

Then the risk of δ at θ is

R(θ, δ) = E[l(θ, δ(x))] = l(θ, a1) P(δ(x) = a1) + l(θ, a2) P(δ(x) = a2) + l(θ, a3) P(δ(x) = a3).

Now,

R(θ1, δ2) = 0 × 0.3 + 10 × 0.7 = 7
R(θ2, δ2) = 12 × 0.6 + 1 × 0.4 = 7.6

Thus we get

i                              1     2     3     4     5     6     7     8     9
R(θ1, δi)                      0     7     3.5   3     10    6.5   1.5   8.5   5
R(θ2, δi)                      12    7.6   9.6   5.4   1     3     8.4   4.0   6
Max[R(θ1, δi), R(θ2, δi)]      12    7.6   9.6   5.4   10    6.5   8.4   8.5   6

min over i of Max[R(θ1, δi), R(θ2, δi)] = 5.4.

Thus the minimax solution is δ4(x) = a2 if x = 0 and a1 if x = 1.

Again, R(θ1, δ4) = 3 < R(θ1, δ2) = 7 and R(θ2, δ4) = 5.4 < R(θ2, δ2) = 7.6, so δ2 is inadmissible.
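The risk table and the minimax rule of the oil-drilling example can be reproduced by enumerating all nine decision rules. This is a minimal sketch; the dictionary layout and function names are illustrative choices, while the losses and probabilities are taken from the tables of the example.

```python
from itertools import product

ACTIONS = ["a1", "a2", "a3"]

# Loss l(theta, a) and distribution P(X = x | theta) from the example
LOSS = {"theta1": {"a1": 0, "a2": 10, "a3": 5},
        "theta2": {"a1": 12, "a2": 1, "a3": 6}}
P_X = {"theta1": {0: 0.3, 1: 0.7},
       "theta2": {0: 0.6, 1: 0.4}}

# RULES[i-1] = (action taken if x = 0, action taken if x = 1), in the order delta_1, ..., delta_9
RULES = list(product(ACTIONS, repeat=2))

def risk(theta, rule):
    """R(theta, delta) = sum over x of P(x | theta) * l(theta, delta(x))."""
    return sum(P_X[theta][x] * LOSS[theta][rule[x]] for x in (0, 1))

def minimax_rule():
    """Return (index i, rule, worst-case risk) of the rule with the smallest maximum risk."""
    worst = [max(risk("theta1", r), risk("theta2", r)) for r in RULES]
    best = min(range(len(RULES)), key=lambda i: worst[i])
    return best + 1, RULES[best], worst[best]
```

Calling `minimax_rule()` selects rule 4, which takes a2 when x = 0 and a1 when x = 1, with a maximum risk of about 5.4, matching the table.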

Suppose that in our oil drilling example an expert thinks the chance of finding oil is 0.2. Then we treat the parameter as a random variable θ with possible values θ1, θ2 and frequency function

π(θ1) = 0.2, π(θ2) = 0.8,

so that the Bayes risk is

R(δ) = E[R(θ, δ)] = 0.2 R(θ1, δ) + 0.8 R(θ2, δ)

R(δ1) = 0.2 × 0 + 0.8 × 12 = 9.6
R(δ2) = 0.2 × 7 + 0.8 × 7.6 = 7.48
R(δ3) = 0.2 × 3.5 + 0.8 × 9.6 = 8.38

and so on. So we compute the following table.

i        1     2      3      4      5     6     7      8     9
R(δi)    9.6   7.48   8.38   4.92   2.8   3.7   7.02   4.9   5.8

In the Bayesian framework δ is preferable to δ′ if and only if it has smaller Bayes risk. If there is a rule δ* which attains the minimum Bayes risk, i.e. such that R(δ*) = min over δ of R(δ), then it is called a Bayes rule. In this example the minimum is 2.8, so the rule δ5, with Bayes risk R(δ5) = 2.8, is the unique Bayes rule for our prior distribution.
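The Bayes risks in this example can be recomputed in the same way as the risk table. The sketch below repeats the losses and probabilities of the oil-drilling example so that it runs on its own; the names are illustrative, and the prior (0.2, 0.8) is the expert's prior stated above.

```python
from itertools import product

ACTIONS = ["a1", "a2", "a3"]
LOSS = {"theta1": {"a1": 0, "a2": 10, "a3": 5},
        "theta2": {"a1": 12, "a2": 1, "a3": 6}}
P_X = {"theta1": {0: 0.3, 1: 0.7},
       "theta2": {0: 0.6, 1: 0.4}}
PRIOR = {"theta1": 0.2, "theta2": 0.8}

# RULES[i-1] = (action if x = 0, action if x = 1) for delta_1, ..., delta_9
RULES = list(product(ACTIONS, repeat=2))

def risk(theta, rule):
    """Frequentist risk R(theta, delta)."""
    return sum(P_X[theta][x] * LOSS[theta][rule[x]] for x in (0, 1))

def bayes_risk(rule):
    """Prior-weighted average of the risk over theta."""
    return sum(PRIOR[theta] * risk(theta, rule) for theta in PRIOR)

BAYES_RISKS = [bayes_risk(r) for r in RULES]
BAYES_INDEX = min(range(len(RULES)), key=lambda i: BAYES_RISKS[i]) + 1
```

The smallest entry of `BAYES_RISKS` belongs to rule 5 (always sell), with Bayes risk about 2.8, while rule 2 has Bayes risk about 7.48.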


Example: Let X ~ b(1, p), p ∈ Θ = {1/4, 1/2} and A = {a1, a2}. Let the loss function be defined as follows.

              a1    a2
p1 = 1/4       1     4
p2 = 1/2       3     2

The set of decision rules includes four functions δ1, δ2, δ3, δ4, defined by

δ1(0) = δ1(1) = a1
δ2(0) = a1, δ2(1) = a2
δ3(0) = a2, δ3(1) = a1
δ4(0) = δ4(1) = a2

The risk function takes the following values:

i     R(p1, δi)    R(p2, δi)    Max over p1, p2 of R(p, δi)
1        1            3               3
2       7/4          5/2             5/2
3      13/4          5/2            13/4
4        4            2               4

The minimum over i of the maximum risk is 5/2, attained by δ2. Thus the minimax solution is

δ2(x) = a1 if x = 0 and a2 if x = 1.


The General Statistical Simulation Procedure (SSP) and Bootstrap Methods
Introduction to SSP and Bootstrap Method
If we know the distribution function $F(\cdot)$ of a random variable $X$ and wish to evaluate some function of it, say $\theta(F)$, we can proceed in two ways: (1) evaluate $\theta(F)$ exactly; and (2) simulate to estimate $\theta(F)$.

For example: Suppose that $X$ is a random variable which is $N(0, 1)$, so that $F(x) = \Phi(x)$, and that we wish to know the 8th moment of $X$, so that

$$\theta(F) = \int_{-\infty}^{\infty} x^8\, d\Phi(x) .$$

Then one way we may proceed is to (1) evaluate the integral exactly. If the integral is such that no simple way to evaluate it exists, a second way we may proceed is to (2) simulate to estimate $\theta(F)$. Generating random variables $X_1, X_2, \dots, X_n$ that are $N(0, 1)$ and independent, we can estimate $\theta(F)$ using the principles of estimation. Thus, we might estimate $\theta(F)$ by

$$\frac{X_1^8 + X_2^8 + \dots + X_n^8}{n},$$

which is a consistent estimator.
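The two approaches for the 8th moment can be placed side by side. The following is a sketch only: the function names, the seed and the sample size are assumptions. The exact value uses the standard fact that the even moment of order k of a N(0, 1) variable is (k − 1)!! = 1·3·5⋯(k − 1).

```python
import random

def exact_even_moment(k):
    """E[X^k] for X ~ N(0, 1) and even k: the double factorial (k-1)!!."""
    result = 1
    for odd in range(1, k, 2):
        result *= odd
    return result

def simulated_moment(k, n, seed=0):
    """Approach (2): average of X_i^k over n independent simulated N(0, 1) draws."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) ** k for _ in range(n)) / n

if __name__ == "__main__":
    print("exact:", exact_even_moment(8))
    print("simulated:", simulated_moment(8, 200_000))
```

The exact 8th moment is 105. Because X^8 has a very large variance, the simulated average converges slowly, so a large n is needed before the estimate settles near 105.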

The preceding discussion assumed that we knew the distribution function $F(\cdot)$. However, often we do not have a known distribution function $F(\cdot)$ for which we wish to know $\theta(F)$, but rather a random sample $X_1, X_2, \dots, X_n$ drawn from a distribution function $F(\cdot)$ with $F(\cdot)$ unknown, and wish to estimate $\theta(F)$. Now suppose we are taking approach (2) and wish to specify the variance of our estimator. With $F(\cdot)$ known, we can proceed as follows to solve this problem.

Step 1 : Generate $X_1, X_2, \dots, X_n$ and estimate $\theta(F)$; call the estimate $\hat\theta_1$.
Step 2 : Generate $X_{n+1}, X_{n+2}, \dots, X_{2n}$ and estimate $\theta(F)$ from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_2$.
Step 3 : Generate $X_{2n+1}, X_{2n+2}, \dots, X_{3n}$ and estimate $\theta(F)$ from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_3$.
...
Step N : Generate $X_{(N-1)n+1}, X_{(N-1)n+2}, \dots, X_{Nn}$ and estimate $\theta(F)$ from these new random variables, which are to be independent of all random variables previously generated; call the estimate $\hat\theta_N$.

Then $\hat\theta_1, \hat\theta_2, \hat\theta_3, \dots, \hat\theta_N$ are $N$ independent and identically distributed random variables, each estimating $\theta(F)$. Their variance may be estimated by

$$\hat\sigma^2 = \frac{\sum_{i=1}^N (\hat\theta_i - \bar\theta)^2}{N-1}; \qquad \bar\theta = \frac{\hat\theta_1 + \hat\theta_2 + \dots + \hat\theta_N}{N},$$

and the variance of the estimator $\bar\theta$ is estimated by $\hat\sigma^2 / N$. We call this procedure the Statistical Simulation Procedure $SSP(\theta, F, n, N)$.

Now, if $F(\cdot)$ is unknown, the $SSP(\theta, F, n, N)$ cannot be used. However, one will then have a random sample $X_1, X_2, \dots, X_n$ taken from the unknown distribution function $F(\cdot)$. Using the random sample, the distribution function may be estimated, say by some estimator $\hat F$, and then Step 1, Step 2, ..., Step N followed. We call this the General Statistical Simulation Procedure $SSP(\theta, \hat F, X_1, \dots, X_n, N)$.

There are many ways to choose the estimate $\hat F$ of $F$ in the $SSP(\theta, \hat F, X_1, \dots, X_n, N)$. One of the very simplest is to take $\hat F$ to be the empiric distribution function based on the sample $X_1, X_2, \dots, X_n$. If that is done, then the procedure is called the bootstrap procedure.

Bootstrap Sampling
Bootstrap sampling is a method of selecting a sample of size $n$ with replacement from a data set $X_1, X_2, \dots, X_n$ of $n$ data points. This is equivalent to recording the value of each data point on a ping pong ball and placing the balls into a box. Select a ping pong ball at random, record its value and replace the ball. Repeating this $n$ times maintains the original sample size of $n$. With the bootstrap method the basic sample is treated as the population. Thus the bootstrap estimation procedure consists of the following steps.

Step 1 : Using the original data set, calculate some statistic of interest to estimate the characteristic of the population of interest. Call this $B_0$.
Step 2 : Take a bootstrap sample of size $n$ from the original data set, which produces a new data set $X_1^*, X_2^*, \dots, X_n^*$. Calculate the statistic of interest on it and call this $B_1$.
Step 3 : Repeat Step 2 $N$ times to produce $B_1, B_2, \dots, B_N$.
Step 4 : Sort $B_1, B_2, \dots, B_N$ from smallest to largest.
Step 5 : Estimate the bias of the original estimator by $\bar B - B_0$, where $\bar B$ is the mean of $B_1, B_2, \dots, B_N$.

Uses of Bootstrapping Method:
• This method is used for computation of the sampling distribution of any statistic.
• It is very good for confidence interval and bias estimation, but not for point estimation.
• It is able to estimate measures of variability.
• It is able to calculate power.
• It can be employed in nonparametric and in parametric inference.
• When the data set is very small, bootstrap sampling is used to generate additional resamples from it.

Example: Estimating the Standard Error of $\bar X$.
Let $\theta = E(X)$ and $\sigma^2 = Var(X)$. Then from a random sample $X_1, X_2, \dots, X_n$ with the same distribution function as $X$, we find $\bar X$, and it has mean $\theta$ and $V(\bar X) = \sigma^2 / n$.

The bootstrap method of estimating $V(\bar X)$ is the $SSP(\theta, \hat F, X_1, \dots, X_n, N)$ with $\hat F$ taken to be the empiric distribution function, and proceeds as follows:

Step 1 : Take a sample of size $n$ (with replacement) from $\{X_1, X_2, \dots, X_n\}$, say $\{X_{11}, X_{12}, \dots, X_{1n}\}$, and calculate its sample mean $\bar X_1$.
Step 2 : Repeat Step 1 independently $N - 1$ additional times, finding $\bar X_1, \bar X_2, \dots, \bar X_N$.

The bootstrap estimate of $V(\bar X)$ is

$$\hat\sigma^2 = \frac{\sum_{i=1}^N (\bar X_i - \bar X_\cdot)^2}{N-1}; \qquad \bar X_\cdot = \frac{\bar X_1 + \bar X_2 + \dots + \bar X_N}{N} .$$
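The two steps above can be sketched in a few lines. This is a minimal illustration, not the notes' own code: the data values, the seed and the number of resamples are assumptions, and `statistics.variance` supplies the N − 1 divisor of the formula.

```python
import random
import statistics

def bootstrap_variance_of_mean(data, num_resamples, rng):
    """Bootstrap estimate of Var(X-bar): sample variance of the resample means."""
    n = len(data)
    means = []
    for _ in range(num_resamples):
        # Step 1: draw n values with replacement and record the resample mean
        resample = [rng.choice(data) for _ in range(n)]
        means.append(sum(resample) / n)
    # Step 2 has produced num_resamples means; their sample variance uses divisor N - 1
    return statistics.variance(means)

# Hypothetical data set of size 10
DATA = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
ESTIMATE = bootstrap_variance_of_mean(DATA, 2000, random.Random(1))
```

Under the empiric distribution function the exact variance of a resample mean is the population-style variance of the data divided by n, so `ESTIMATE` should be close to `statistics.pvariance(DATA) / len(DATA)`; the square root of either quantity is the standard error.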

Example: Estimating Bias
Suppose that, based on a random sample $X_1, X_2, \dots, X_n$, some quantity $\theta$ of interest is estimated by $\hat\theta$. The estimator $\hat\theta$ has some bias $b = E(\hat\theta) - \theta$. To estimate the bias, consider use of $SSP(\theta, \hat F, X_1, \dots, X_n, N)$ with $\hat F$ taken to be the empiric distribution function (so we have a bootstrap estimate). Based on $N$ bootstrap samples of size $n$ each, one finds the estimators $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N$ with

$$\bar\theta = \frac{\hat\theta_1 + \hat\theta_2 + \dots + \hat\theta_N}{N}$$

and estimates the bias of $\hat\theta$ by $\hat b = \bar\theta - \hat\theta$.

Example:
Let $X_1, X_2, \dots, X_n$ be a random sample of size $n$ from a Poisson distribution with unknown mean $\lambda$. If the parameter of interest is $\theta = P(X \le 1) = e^{-\lambda}(1+\lambda)$, the MLE is $e^{-\bar X}(1+\bar X)$, which is biased. To reduce the bias, let us investigate the bootstrap method.

Let $X_{ij}$ $(i = 1, \dots, N;\ j = 1, \dots, n)$ be the $N$ bootstrap samples, that is, samples taken at random with replacement from $\{X_1, X_2, \dots, X_n\}$, and for $i = 1, \dots, N$ let

$$\hat\theta_i = e^{-\bar X_i}(1+\bar X_i) - \frac{\text{Number of } X_{i1}, X_{i2}, \dots, X_{in} \text{ that are} \le 1}{n} .$$

Then the bootstrap estimate of the bias of $\hat\theta$ is simply $\hat b = \bar\theta$, the mean of $\hat\theta_1, \dots, \hat\theta_N$.

Then, one might use $e^{-\bar X}(1+\bar X) - \hat b$ to estimate $\theta$.
Remark:
Note that an approximate $100(1-\alpha)\%$ confidence interval for $\theta$ can be constructed using bootstrap methods, as follows. If $\hat\theta$ is the original estimate of $\theta$ and $\hat\sigma^2$ is the sample variance based on $\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N$, then for $N$ large we take the interval

$$\left\{\hat\theta - \Phi^{-1}(1 - \alpha/2)\,\hat\sigma, \;\; \hat\theta + \Phi^{-1}(1 - \alpha/2)\,\hat\sigma\right\} .$$

Note that we use the original estimate of $\theta$ (not the bootstrap estimate), and the bootstrap procedure has been used to provide us with an estimate of variability for $\hat\theta$.

Note that the exact same details apply to the more general statistical simulation procedure $SSP(\theta, \hat F, X_1, \dots, X_n, N)$, in which the only difference is what estimate of $F$ is being sampled from.

Estimation & Confidence Interval

Simultaneous Estimation of Several Parameters
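Suppose we have a vector of parameters $\theta = (\theta_1, \theta_2, \dots, \theta_k)$, i.e. $k$ parameters. Our problem is to estimate the parameters simultaneously. Let $f_\theta(x_1, x_2, \dots, x_n) = L(x;\theta)$ be the joint density (likelihood function).

We have the following regularity conditions:

i) $\Theta$ is a non-degenerate open interval in $R^k$.

ii) For almost all $x$ and all $\theta \in \Theta$,
$$\frac{\partial f_\theta(x_1, x_2, \dots, x_n)}{\partial\theta_i} = \frac{\partial L(x;\theta)}{\partial\theta_i} \quad \text{exists for } i = 1, 2, \dots, k .$$

iii) $$\frac{\partial}{\partial\theta_i} \int_A L(x;\theta)\, dx = \int_A \frac{\partial}{\partial\theta_i} L(x;\theta)\, dx, \qquad i = 1, 2, \dots, k,$$
where $A$ is the domain of positive probability density.

iv) $$\frac{\partial}{\partial\theta_i} \int_A t_j\, L(x;\theta)\, dx = \int_A t_j\, \frac{\partial}{\partial\theta_i} L(x;\theta)\, dx,$$
where $t_j$ is the estimator of $\theta_j$.

v) The elements of the matrix $\Delta_\theta = \big(\delta_{ij}(\theta)\big)$, where
$$\delta_{ij}(\theta) = E_\theta\left[\frac{\partial \ln L(x;\theta)}{\partial\theta_i}\, \frac{\partial \ln L(x;\theta)}{\partial\theta_j}\right],$$
exist and are such that $\Delta_\theta$ is positive definite.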

Theorem
In any regular estimation case, the variances and covariances $\sigma_{ij}(\theta)$ of unbiased estimators $T_i$ $(i = 1, 2, \dots, k)$ for $\theta_i$ $(i = 1, 2, \dots, k)$ respectively satisfy the inequality

$$u' \Sigma_\theta u \ge u' \Delta_\theta^{-1} u$$

where $\Sigma_\theta = \big(\sigma_{ij}(\theta)\big)$ and $u$ is an arbitrary vector of real numbers.

Proof
Let the same symbol $\lambda_i(\theta)$ be used to denote $\frac{\partial \log f_\theta(X_1, \dots, X_n)}{\partial\theta_i}$ as well as $\frac{\partial \log f_\theta(x_1, \dots, x_n)}{\partial\theta_i}$. In this situation, condition (iii) becomes

$$\int_A \lambda_i(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = 0, \qquad (i)$$

and condition (iv) becomes

$$\int_A t_i\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \qquad (ii)$$

since $T_i$ is unbiased for $\theta_i$.

Let $u_1, \dots, u_k$ be real numbers. Since $T_i$ is unbiased for $\theta_i$, $\sum_{i=1}^k u_i T_i$ is unbiased for $\sum_{i=1}^k u_i \theta_i$; i.e.,

$$\int_A \sum_{i=1}^k u_i t_i\, f_\theta(x_1, \dots, x_n)\, dx = \sum_{i=1}^k u_i \theta_i \qquad (\text{for all } \theta \in \Theta) .$$

This gives, on being differentiated with respect to $\theta_j$,

$$\int_A \sum_{i=1}^k u_i t_i\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = u_j \qquad \text{because of } (ii) .$$

Because of $(i)$ again,

$$\int_A \sum_{i=1}^k u_i [t_i - \theta_i]\, \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = u_j .$$

Taking another set of real numbers $c_1, c_2, \dots, c_k$, we have

$$\int_A \sum_{i=1}^k \sum_{j=1}^k u_i [t_i - \theta_i]\, c_j \lambda_j(\theta)\, f_\theta(x_1, \dots, x_n)\, dx = \sum_{j=1}^k c_j u_j ,$$

i.e.,

$$\sum_{i=1}^k \sum_{j=1}^k u_i c_j\, cov_\theta\big(T_i, \lambda_j(\theta)\big) = \sum_{j=1}^k c_j u_j .$$

Noting that the left-hand side is the covariance between $\sum_{i=1}^k u_i T_i$ and $\sum_{i=1}^k c_i \lambda_i(\theta)$, we have, since

$$\left[cov_\theta\left(\sum_{i=1}^k u_i T_i,\ \sum_{i=1}^k c_i \lambda_i(\theta)\right)\right]^2 \le Var_\theta\left(\sum_{i=1}^k u_i T_i\right) Var_\theta\left(\sum_{i=1}^k c_i \lambda_i(\theta)\right),$$

$$var_\theta\left(\sum_{i=1}^k u_i T_i\right) \ge \frac{\left(\sum_{i=1}^k c_i u_i\right)^2}{var_\theta\left(\sum_{i=1}^k c_i \lambda_i(\theta)\right)} .$$

Let us now maximize the right-hand side with respect to the $c$'s. Note that the right-hand side remains unchanged if the $c$'s are multiplied by a common number, and that the maximizing $c$'s must be such that the correlation between $\sum_i u_i T_i$ and $\sum_i c_i \lambda_i(\theta)$ is a maximum (i.e., unity), so that

$$\sum_{i=1}^k u_i [t_i - \theta_i] = \sum_{i=1}^k c_i \lambda_i(\theta) .$$

Then

$$u_j \equiv \sum_{i=1}^k u_i\, cov_\theta\big(T_i, \lambda_j(\theta)\big) = \sum_{i=1}^k c_i\, cov_\theta\big(\lambda_i(\theta), \lambda_j(\theta)\big) \equiv \sum_{i=1}^k c_i\, \delta_{ij}(\theta) .$$

Hence the maximizing $c$'s are such that (in matrix notation)

$$\Delta_\theta c = u \;\Rightarrow\; c = \Delta_\theta^{-1} u \quad (\text{because of } (v)),$$

and

$$c'u = u' \Delta_\theta^{-1} u, \qquad var_\theta\left(\sum_{i=1}^k c_i \lambda_i(\theta)\right) = c' \Delta_\theta c = u' \Delta_\theta^{-1} u .$$

Hence

$$var_\theta\left(\sum_{i=1}^k u_i T_i\right) \ge \frac{\left(u' \Delta_\theta^{-1} u\right)^2}{u' \Delta_\theta^{-1} u} = u' \Delta_\theta^{-1} u, \qquad \text{i.e.} \qquad u' \Sigma_\theta u \ge u' \Delta_\theta^{-1} u .$$

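Problem: Consider the case of a random sample from a normal population whose mean $(\theta_1)$ and variance $(\theta_2)$ are both unknown.

Here,

$$f(x) = \frac{1}{\sqrt{2\pi\theta_2}} \exp\left[-\frac{1}{2}\, \frac{(x-\theta_1)^2}{\theta_2}\right]; \qquad -\infty < x < \infty .$$

Now the likelihood function is as follows:

$$L(x, \theta_1, \theta_2) = \left(\frac{1}{\sqrt{2\pi\theta_2}}\right)^n \exp\left[-\frac{1}{2\theta_2} \sum (x_i - \theta_1)^2\right]$$

$$\ln L = -\frac{n}{2} \ln (2\pi\theta_2) - \frac{1}{2\theta_2} \sum (x_i - \theta_1)^2 \qquad (i)$$

Now we differentiate it with respect to $\theta_1$ and get

$$\frac{\partial \ln L(x)}{\partial\theta_1} = -\frac{2}{2\theta_2} \sum (x_i - \theta_1)(-1) = \frac{1}{\theta_2} \sum (x_i - \theta_1)$$

and

$$E\left[\frac{\partial \ln L(x)}{\partial\theta_1}\right]^2 = \frac{1}{\theta_2^2}\, E\left[\sum (x_i - \theta_1)\right]^2 = \frac{1}{\theta_2^2} \sum E\left[(x_i - \theta_1)^2\right] = \frac{1}{\theta_2^2}\, n\theta_2 = \frac{n}{\theta_2} .$$

Again we differentiate eq. $(i)$ with respect to $\theta_2$ and get

$$\frac{\partial \ln L(x)}{\partial\theta_2} = -\frac{n}{2}\, \frac{1}{\theta_2} + \frac{1}{2\theta_2^2} \sum (x_i - \theta_1)^2 = \frac{1}{2\theta_2} \left[-n + \frac{1}{\theta_2} \sum (x_i - \theta_1)^2\right]$$

$$E\left[\frac{\partial \ln L(x)}{\partial\theta_2}\right]^2 = \frac{n}{2\theta_2^2}, \qquad E\left[\frac{\partial \ln L(x)}{\partial\theta_1} \cdot \frac{\partial \ln L(x)}{\partial\theta_2}\right] = 0 .$$

Hence the lower bounds for the variances of unbiased estimators of $\theta_1$ and $\theta_2$ are the diagonal elements of $\Delta_\theta^{-1}$, namely

$$\delta^{11}(\theta) = \frac{\theta_2}{n} \qquad \text{and} \qquad \delta^{22}(\theta) = \frac{2\theta_2^2}{n}$$

respectively. The traditional unbiased estimators for $\theta_1$ and $\theta_2$ are $\bar X$ and $S^2 = \frac{\sum (X_i - \bar X)^2}{n-1}$. Since $Var_\theta[\bar X] = \frac{\theta_2}{n}$ while $Var_\theta[S^2] = \frac{2\theta_2^2}{n-1}$, the lower bound is attained in the first case but not in the second.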
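The comparison between the lower bounds and the actual variances can be made concrete with exact arithmetic. The sketch below is illustrative: the function names and the values θ2 = 4, n = 10 are assumptions; the formulas are the ones derived in the problem above.

```python
from fractions import Fraction

def cramer_rao_bounds(theta2, n):
    """Lower bounds theta2/n and 2*theta2^2/n for unbiased estimators of the mean and variance."""
    theta2 = Fraction(theta2)
    return theta2 / n, 2 * theta2 ** 2 / n

def usual_estimator_variances(theta2, n):
    """Variances of X-bar and S^2 under normal sampling: theta2/n and 2*theta2^2/(n-1)."""
    theta2 = Fraction(theta2)
    return theta2 / n, 2 * theta2 ** 2 / (n - 1)

def efficiencies(theta2, n):
    """Ratio of lower bound to actual variance for each estimator (1 means the bound is attained)."""
    bounds = cramer_rao_bounds(theta2, n)
    variances = usual_estimator_variances(theta2, n)
    return tuple(b / v for b, v in zip(bounds, variances))
```

For example, `efficiencies(4, 10)` gives 1 for the sample mean and 9/10 for S², that is, the ratio (n − 1)/n, which approaches 1 as n grows.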
Vector of Parameters
Let us assume that a random sample $X_1, X_2, \dots, X_n$ of size $n$ from the density $f(x;\theta_1, \theta_2, \dots, \theta_k)$ is available, where the parameter $\theta = (\theta_1, \theta_2, \dots, \theta_k)$ and the parameter space $\Theta$ are $k$-dimensional. We want to simultaneously estimate $\tau_1(\theta), \tau_2(\theta), \dots, \tau_r(\theta)$, where $\tau_j(\theta)$, $j = 1, \dots, r$, is some function of $\theta = (\theta_1, \dots, \theta_k)$. Often $k = r$, but this need not be the case. An important special case is the estimation of $\theta = (\theta_1, \theta_2, \dots, \theta_k)$ itself; then $r = k$ and $\tau_1(\theta) = \theta_1, \dots, \tau_k(\theta) = \theta_k$. Another important special case is the estimation of a single function $\tau(\theta)$; then $r = 1$. A point estimator of $\big(\tau_1(\theta), \dots, \tau_r(\theta)\big)$ is a vector of statistics, say $(T_1, \dots, T_r)$, where $T_j = t_j(X_1, \dots, X_n)$ and $T_j$ is an estimator of $\tau_j(\theta)$.

Unbiased
An estimator $(T_1, \dots, T_r)$, where $T_j = t_j(X_1, \dots, X_n)$, $j = 1, \dots, r$, is defined to be an unbiased estimator of $\big(\tau_1(\theta), \dots, \tau_r(\theta)\big)$ if and only if $E_\theta[T_j] = \tau_j(\theta)$ for $j = 1, \dots, r$ and for all $\theta \in \Theta$.

For a single estimator, we consider the variance of the estimator as a measure of its closeness to the real-valued function of the population parameter. Here, we seek a generalization of the notion of variance to $r$ dimensions. Several such generalizations have been proposed; we consider here only four of them:
i) Vector of variances.
ii) Linear combination of variances.
iii) Ellipsoid of concentration.
iv) Wilks' generalized variance.

1. Vector of Variances
Let the vector $(\mathrm{var}_\theta[T_1], \ldots, \mathrm{var}_\theta[T_r])$ be a measure of the closeness of the estimator $(T_1, \ldots, T_r)$ to $(\tau_1(\theta), \ldots, \tau_r(\theta))$. Its main advantage is that it is simple and easy to compute. Its disadvantage is that the measure is vector-valued and consequently sometimes difficult to work with.

2. Linear Combination of Variances
One way to overcome the disadvantage of method (1) is to use a linear combination of variances, that is, to measure the closeness of the estimator $(T_1, \ldots, T_r)$ to $(\tau_1(\theta), \ldots, \tau_r(\theta))$ by

\[ \sum_{j=1}^{r} a_j \, \mathrm{var}_\theta[T_j] \]

for suitably chosen $a_j \ge 0$. Both generalizations (1) and (2) embody only the variances of the $T_j$, $j = 1, \ldots, r$. But the $T_j$ are likely to be correlated, so one should also incorporate the covariances of the $T_j$'s when measuring closeness.

3. Ellipsoid of Concentration
Let $(T_1, \ldots, T_r)$ be an unbiased estimator of $(\tau_1(\theta), \ldots, \tau_r(\theta))$. Let $\sigma^{ij}(\theta)$ be the $ij$-th element of the inverse of the covariance matrix of $(T_1, \ldots, T_r)$, where the $ij$-th element of the covariance matrix is $\sigma_{ij}(\theta) = \mathrm{cov}_\theta[T_i, T_j]$. The ellipsoid of concentration of $(T_1, \ldots, T_r)$ is defined as the interior and boundary of the ellipsoid

\[ \sum_{i=1}^{r} \sum_{j=1}^{r} \sigma^{ij}(\theta) \, [t_i - \tau_i(\theta)] \, [t_j - \tau_j(\theta)] = r + 2. \]

The ellipsoid of concentration measures how concentrated the distribution of $(T_1, \ldots, T_r)$ is about $(\tau_1(\theta), \ldots, \tau_r(\theta))$. The distribution of an estimator $(T_1, \ldots, T_r)$ whose ellipsoid of concentration is contained within the ellipsoid of concentration of another estimator $(T_1', \ldots, T_r')$ is more highly concentrated about $(\tau_1(\theta), \ldots, \tau_r(\theta))$ than is the distribution of $(T_1', \ldots, T_r')$.

4. Wilks’ Generalized Variance
Let $(T_1, \ldots, T_r)$ be an unbiased estimator of $(\tau_1(\theta), \ldots, \tau_r(\theta))$. Wilks' generalized variance of $(T_1, \ldots, T_r)$ is defined to be the determinant of the covariance matrix of $(T_1, \ldots, T_r)$.
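The four measures above can be compared on a small numerical case. The following is only an illustrative sketch for $r = 2$: the covariance matrix, the weights and the function names are hypothetical and do not come from the notes.

```python
def generalized_variance_measures(cov, weights):
    """Summaries of a 2x2 covariance matrix cov = [[s11, s12], [s21, s22]]
    of an estimator (T1, T2). The inputs are hypothetical illustration values."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21                      # Wilks' generalized variance
    inverse = [[s22 / det, -s12 / det],
               [-s21 / det, s11 / det]]              # elements sigma^{ij}
    return {
        "vector_of_variances": (s11, s22),
        "linear_combination": weights[0] * s11 + weights[1] * s22,
        "inverse_covariance": inverse,
        "wilks_generalized_variance": det,
    }


def in_ellipsoid_of_concentration(t, tau, inverse, r=2):
    """True if the point t lies in the interior or on the boundary of the
    ellipsoid sum_ij sigma^{ij} (t_i - tau_i)(t_j - tau_j) = r + 2."""
    d = [t[i] - tau[i] for i in range(r)]
    q = sum(inverse[i][j] * d[i] * d[j] for i in range(r) for j in range(r))
    return q <= r + 2


measures = generalized_variance_measures([[2.0, 0.5], [0.5, 1.0]], weights=(1.0, 1.0))
print(measures["wilks_generalized_variance"])
print(in_ellipsoid_of_concentration((1.0, 0.5), (0.0, 0.0), measures["inverse_covariance"]))
```

Only the last two measures use the off-diagonal covariance, which is the point made in the text about measures (1) and (2).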


Risk Function
\[ R_d(\theta) = \text{expected loss} = E\left[ w(\theta, d(x)) \right] \]

A smaller risk is desirable: the smaller the risk, the better the estimator.

Minimax Estimator
If a random variable $X$ has density function $f(x; \theta)$ and $d(x)$ is some estimate of $\theta$, then the risk function is

\[ R_d(\theta) = E\left[ w(\theta, d(x)) \right]. \]

A minimax estimator $d(x)$ is any estimator which minimizes the supremum $\sup_\theta R_d(\theta)$.

Properties of Minimax Estimator
i) If $T^* = t^*(X_1, X_2, \ldots, X_n)$ is a Bayes estimator having constant risk, i.e. $R_{t^*}(\theta) = $ constant, then $T^*$ is a minimax estimator.
ii) If $t_n'$ dominates a minimax estimator $t_n^*$, then $t_n'$ is also minimax.
iii) If an estimator has constant risk and is admissible, it is minimax.

Related results on admissible estimators:
i) If the loss function $L$ is strictly convex, then every admissible estimator must be non-randomized.
ii) If $L$ is strictly convex, $T$ is an admissible estimator and $T'$ is another estimator with the same risk, i.e. $R(\theta, t) = R(\theta, t')$, then $t = t'$ with probability 1.
iii) Any unique Bayes estimator is admissible (here uniqueness means that any two Bayes estimators differ only on a set $N$ with $P_\theta(N) = 0 \ \forall \ \theta \in \Theta$).

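Problem: If $x_1, \ldots, x_n$ are $n$ independent Gaussian random variables with distribution $N(\theta, 1)$ and the loss function is the squared error, find the minimax estimator of the mean $\theta$.

Solution
Consider a sequence of prior distributions with mean 0 and variance $\sigma^2$. If $\theta$ has the prior distribution $N(0, \sigma^2)$, then

\[ p(\theta \mid x_1, \ldots, x_n) = \frac{p(x_1, \ldots, x_n \mid \theta) \, p(\theta)}{p(x_1, \ldots, x_n)}. \]

For $n = 1$, $x = x_1$:

\[ p(\theta \mid x) = \frac{N(x; \theta, 1) \, N(\theta; 0, \sigma^2)}{\int_{-\infty}^{\infty} N(x; \theta, 1) \, N(\theta; 0, \sigma^2) \, d\theta} = \sqrt{\frac{1 + \sigma^2}{2\pi\sigma^2}} \, \exp\left[ -\frac{1}{2} \, \frac{1 + \sigma^2}{\sigma^2} \left( \theta - \frac{\sigma^2}{1 + \sigma^2} x \right)^2 \right] \]

so that

\[ E(\theta \mid x) = d(x) = \frac{x \sigma^2}{1 + \sigma^2}, \qquad V(\theta \mid x) = \frac{\sigma^2}{1 + \sigma^2}. \]

Then

\[ \sup_\sigma V(\theta \mid x) = \sup_\sigma \frac{\sigma^2}{1 + \sigma^2} = \sup_\sigma \left( \frac{1}{1 + \frac{1}{\sigma^2}} \right) = 1 \]

and

\[ \lim_{\sigma \to \infty} d(x) = \lim_{\sigma \to \infty} \frac{\sigma^2 x}{1 + \sigma^2} = \lim_{\sigma \to \infty} \left[ \frac{x}{\frac{1}{\sigma^2} + 1} \right] = x. \]

The estimator $x$ has constant risk $E(x - \theta)^2 = 1$, which equals the limit of the Bayes risks $\sigma^2 / (1 + \sigma^2)$. Hence $x$ is a minimax estimator.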

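Problem: Find the minimax estimator of $\theta$ in sampling from the Bernoulli distribution using a squared error loss function.

Solution
With a beta prior with parameters $a$ and $b$, a Bayes estimator is given by the posterior mean

\[ \frac{\int_0^1 \theta \, \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \frac{1}{B(a,b)} \theta^{a-1} (1-\theta)^{b-1} \, d\theta}{\int_0^1 \theta^{\sum x_i} (1-\theta)^{n - \sum x_i} \frac{1}{B(a,b)} \theta^{a-1} (1-\theta)^{b-1} \, d\theta} = \frac{\int_0^1 \theta^{\sum x_i + a} (1-\theta)^{n - \sum x_i + b - 1} \, d\theta}{\int_0^1 \theta^{\sum x_i + a - 1} (1-\theta)^{n - \sum x_i + b - 1} \, d\theta} \]

\[ = \frac{B\left( \sum x_i + a + 1, \ n - \sum x_i + b \right)}{B\left( \sum x_i + a, \ n - \sum x_i + b \right)} = \frac{\Gamma\left( \sum x_i + a + 1 \right) \Gamma(n + a + b)}{\Gamma\left( \sum x_i + a \right) \Gamma(n + a + b + 1)} = \frac{\sum x_i + a}{n + a + b}. \]

So the Bayes estimator with respect to a beta prior distribution with parameters $a$ and $b$ is given by

\[ t^*_{AB}(x_1, x_2, \ldots, x_n) = \frac{\sum x_i + a}{n + a + b} = \frac{\sum x_i}{n + a + b} + \frac{a}{n + a + b} = A \sum x_i + B \qquad (i) \]

where $A = \dfrac{1}{n + a + b}$ and $B = \dfrac{a}{n + a + b}$.

The risk of the estimator is

\[ R_{t^*_{AB}}(\theta) = E\left[ \left( A \sum x_i + B - \theta \right)^2 \right] = E\left[ \left\{ A \left( \sum x_i - n\theta \right) + B - \theta + nA\theta \right\}^2 \right] \]

\[ = A^2 E\left[ \left( \sum x_i - n\theta \right)^2 \right] + (B - \theta + nA\theta)^2 + 2 (B - \theta + nA\theta) A \left\{ \sum E(x_i) - n\theta \right\} \]

\[ = A^2 n^2 E\left[ (\bar{x}_n - \theta)^2 \right] + (B - \theta + nA\theta)^2 = A^2 n^2 \, \frac{\theta(1 - \theta)}{n} + (B - \theta + nA\theta)^2 \]

\[ = A^2 n \theta (1 - \theta) + \left( B + \theta(nA - 1) \right)^2 \]

\[ = A^2 n\theta - A^2 n\theta^2 + B^2 + \theta^2 (nA - 1)^2 + 2(nA - 1) B \theta \]

\[ = \theta^2 \left[ (nA - 1)^2 - nA^2 \right] + \theta \left[ nA^2 + 2(nA - 1)B \right] + B^2. \]

This will be constant in $\theta$ if

\[ (nA - 1)^2 - nA^2 = 0 \quad \text{and} \quad nA^2 + 2(nA - 1)B = 0. \]

The first condition gives $A^2 (n^2 - n) - 2nA + 1 = 0$, so

\[ A = \frac{2n \pm \sqrt{4n^2 - 4 (n^2 - n)}}{2(n^2 - n)} = \frac{2n \pm 2\sqrt{n}}{2(n^2 - n)} = \frac{n \pm \sqrt{n}}{(n + \sqrt{n})(n - \sqrt{n})} = \frac{1}{n \mp \sqrt{n}} = \frac{1}{\sqrt{n} \left( \sqrt{n} \mp 1 \right)}. \]

Since $A = 1/(n + a + b)$ with $a, b > 0$, we need $A < 1/n$ and therefore take $A = \dfrac{1}{\sqrt{n}(\sqrt{n} + 1)}$.

Again, from $nA^2 + 2(nA - 1)B = 0$, for $A = \dfrac{1}{\sqrt{n}(\sqrt{n} + 1)}$,

\[ B = \frac{nA^2}{2(1 - nA)} = \frac{\left( \dfrac{1}{\sqrt{n} + 1} \right)^2}{2 \left( 1 - \dfrac{\sqrt{n}}{\sqrt{n} + 1} \right)} = \frac{1}{2(\sqrt{n} + 1)}. \]

Now,

\[ A = \frac{1}{n + a + b} \ \Rightarrow \ \frac{1}{\sqrt{n}(\sqrt{n} + 1)} = \frac{1}{n + a + b} \ \Rightarrow \ n + a + b = \sqrt{n}(\sqrt{n} + 1) \ \Rightarrow \ a + b = n + \sqrt{n} - n = \sqrt{n}. \]

Again,

\[ B = \frac{a}{n + a + b} \ \Rightarrow \ \frac{a}{n + a + b} = \frac{1}{2(\sqrt{n} + 1)} \ \Rightarrow \ n + a + b = 2a(\sqrt{n} + 1) \ \Rightarrow \ n + \sqrt{n} = 2a(\sqrt{n} + 1) \]

\[ \therefore \ a = \frac{n + \sqrt{n}}{2(\sqrt{n} + 1)} = \frac{\sqrt{n}(\sqrt{n} + 1)}{2(\sqrt{n} + 1)} = \frac{\sqrt{n}}{2}, \qquad b = \sqrt{n} - a = \frac{\sqrt{n}}{2}. \]

So our estimator is

\[ \frac{\sum x_i + a}{n + a + b} = \frac{\sum x_i + \sqrt{n}/2}{n + \sqrt{n}}. \]

So, this is the Bayes estimator with constant risk. Hence it is the minimax estimator.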
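The constant-risk property can be checked numerically. The sketch below computes the exact squared-error risk by summing over the binomial distribution of $\sum x_i$; the function names and the choice $n = 16$ are illustrative assumptions, not part of the notes. From the derivation above the constant risk is $B^2 = 1 / \{4(\sqrt{n} + 1)^2\}$.

```python
from math import comb, sqrt


def risk(estimator, n, theta):
    """Exact squared-error risk E[(estimator(S, n) - theta)^2], S ~ Binomial(n, theta)."""
    return sum(
        comb(n, s) * theta**s * (1 - theta) ** (n - s) * (estimator(s, n) - theta) ** 2
        for s in range(n + 1)
    )


def minimax_estimate(s, n):
    """Bayes estimate under the Beta(sqrt(n)/2, sqrt(n)/2) prior."""
    return (s + sqrt(n) / 2) / (n + sqrt(n))


def sample_mean(s, n):
    return s / n


n = 16  # hypothetical sample size
for theta in (0.1, 0.3, 0.5, 0.9):
    print(theta, risk(minimax_estimate, n, theta), risk(sample_mean, n, theta))
```

The risk of the sample mean, $\theta(1 - \theta)/n$, is smaller near $\theta = 0$ or $1$ but larger near $\theta = 1/2$, so its supremum exceeds the constant risk of the minimax estimator.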

Bayesian Confidence Interval
Example: Assume each item coming off a production line either is or is not defective, so we can call each item a Bernoulli trial. Assume again that the trials are independent with $P[\text{defective}] = \theta$ for each trial. If we select $n$ items from the production line, let

\[ X_i = \begin{cases} 1 & \text{if item } i \text{ is defective} \\ 0 & \text{if item } i \text{ is not defective.} \end{cases} \]

Then $X_1, X_2, \ldots, X_n$ is a random sample of a Bernoulli random variable $X$ with parameter $\theta$. We know that the conjugate prior of $\theta$ is a beta density with parameters $a$ and $b$.

For example, for our production line suppose our prior information suggests

\[ E[\theta] = 0.01, \qquad \mathrm{Var}[\theta] = 0.0001. \]

The larger we take $\mathrm{Var}(\theta)$, the less sure we are of our prior information. Thus we determine $a$ and $b$ from

\[ \frac{a}{a + b} = 0.01 \ \Rightarrow \ a = 0.0101 \, b, \qquad \frac{ab}{(a + b)^2 (a + b + 1)} = 0.0001 \ \Rightarrow \ \frac{0.0101 \, b \cdot b}{(0.0101 \, b + b)^2 (0.0101 \, b + b + 1)} = 0.0001, \]

which gives $a = 0.98$ and $b = 97.02$.

Now, if we observe $\sum X_i = \sum x_i$ from the sample, the posterior distribution of $\theta$ is again a beta distribution with parameters $a + \sum x_i$ and $b + n - \sum x_i$:

\[ f(\theta \mid x) = \frac{\Gamma(a + b + n)}{\Gamma\left( a + \sum x_i \right) \Gamma\left( b + n - \sum x_i \right)} \, \theta^{a + \sum x_i - 1} (1 - \theta)^{b + n - \sum x_i - 1}. \]

Thus the Bayes estimator of $\theta$ is the mean of this posterior distribution, i.e.

\[ \theta^* = E\left[ \theta \mid \sum x_i \right] = \frac{a + \sum x_i}{a + \sum x_i + b + n - \sum x_i} = \frac{a + \sum x_i}{a + b + n} = \frac{\sum x_i + 0.98}{n + 97.02 + 0.98}. \]
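The prior parameters in this example follow from matching the mean and variance of a beta distribution. A minimal sketch of that calculation is given below; the function names and the observed counts (3 defectives in 100 items) are hypothetical.

```python
def beta_from_moments(mean, var):
    """Solve a/(a+b) = mean and ab/((a+b)^2 (a+b+1)) = var for (a, b)."""
    total = mean * (1 - mean) / var - 1   # this is a + b
    return mean * total, (1 - mean) * total


def posterior_mean(a, b, n, defectives):
    """Mean of the Beta(a + sum x_i, b + n - sum x_i) posterior."""
    return (a + defectives) / (a + b + n)


a, b = beta_from_moments(0.01, 0.0001)
print(a, b)                              # approximately 0.98 and 97.02
print(posterior_mean(a, b, 100, 3))      # hypothetical sample: n = 100, sum x_i = 3
```

The variance equation simplifies because $ab / (a+b)^2 = m(1 - m)$ when $m = a/(a+b)$, so $a + b + 1 = m(1 - m) / \mathrm{Var}[\theta]$.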

Bayesian Interval
Given a random sample of a random variable, a confidence interval can be evaluated, and in a sense we are $100(1 - \alpha)\%$ sure that the observed confidence interval covers the true unknown parameter value. A very similar manipulation can be accomplished with the Bayesian approach. Suppose we are given a random sample of a random variable $X$ whose distribution depends on an unknown parameter $\theta$. The parameter $\theta$ has a prior density $f_\theta(\theta)$. Once the sample values $x_1, x_2, \ldots, x_n$ are known, we can compute the posterior distribution $f_{\theta \mid x}(\theta \mid x)$, which summarizes all the current information about $\theta$. Then, if $c_1 < c_2$ are two constants such that

\[ P\left[ c_1 \le \theta \le c_2 \mid x \right] = 1 - \alpha, \]

we are $100(1 - \alpha)\%$ sure that $(c_1, c_2)$ includes $\theta$ given the sample values. We will call such an interval $(c_1, c_2)$ a $100(1 - \alpha)\%$ Bayesian interval for $\theta$.
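As an illustration, when the posterior is normal an equal-tailed Bayesian interval follows directly from its quantiles. The sketch below assumes a single observation $x$ from $N(\theta, 1)$ with a $N(0, \sigma^2)$ prior, for which the posterior is $N\left( \sigma^2 x / (1 + \sigma^2), \ \sigma^2 / (1 + \sigma^2) \right)$; the function name and the numerical inputs are hypothetical, and equal tails are only one of many possible choices of $(c_1, c_2)$.

```python
from statistics import NormalDist


def normal_posterior_interval(x, prior_var, alpha=0.05):
    """Equal-tailed 100(1 - alpha)% Bayesian interval for theta, assuming
    x ~ N(theta, 1) and the prior theta ~ N(0, prior_var)."""
    shrink = prior_var / (1 + prior_var)
    posterior = NormalDist(mu=shrink * x, sigma=shrink ** 0.5)
    return posterior.inv_cdf(alpha / 2), posterior.inv_cdf(1 - alpha / 2)


c1, c2 = normal_posterior_interval(x=1.0, prior_var=4.0)
print(c1, c2)
```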

Approximation confidence interval in large samples:
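We know that under certain regularity conditions MLEs are asymptotically normal with mean $\theta$ and asymptotic variance

\[ \sigma_n^2(\theta) = \frac{1}{n E_\theta\left[ \dfrac{\partial \ln f(x, \theta)}{\partial \theta} \right]^2} = -\frac{1}{n E_\theta\left[ \dfrac{\partial^2 \ln f(x, \theta)}{\partial \theta^2} \right]}. \]

When such an asymptotically normal estimator $T_n$ exists, then $\dfrac{T_n - \theta}{\sigma_n(\theta)}$ may be taken as a pivotal quantity, and a $100(1 - \alpha)\%$ C.I. of $\theta$ may be taken as approximately

\[ \left[ T_n + z_{\alpha/2} \, \sigma_n(\theta), \quad T_n + z_{1 - \alpha/2} \, \sigma_n(\theta) \right], \]

where $z_p$ denotes the $p$-th quantile of $N(0, 1)$, so that $z_{\alpha/2} = -z_{1 - \alpha/2}$. The above method provides a large sample confidence interval so long as $\dfrac{T_n - \theta}{\sigma_n(\theta)}$ can be inverted.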

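Example: Let $X_1, \ldots, X_n$ be a random sample drawn from $N(0, \sigma^2)$. Here $\theta = \sigma^2$. Find a central C.I. for $\sigma^2$ with an approximate confidence coefficient $1 - \alpha$.

Solution:
The probability density function is

\[ f(x; 0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2} \left( \frac{x}{\sigma} \right)^2}. \]

The likelihood function is given by

\[ L = \left( \frac{1}{\sqrt{2\pi\sigma^2}} \right)^n e^{-\frac{1}{2} \sum x_i^2 / \sigma^2} = \left( \frac{1}{2\pi\theta} \right)^{n/2} e^{-\frac{1}{2\theta} \sum x_i^2}. \]

Now taking ln on both sides,

\[ \ln L = -\frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \theta - \frac{1}{2} \, \frac{\sum x_i^2}{\theta}. \]

Now,

\[ \frac{\partial \ln L}{\partial \theta} = 0 \ \Rightarrow \ -\frac{n}{2} \, \frac{1}{\theta} + \frac{1}{2} \, \frac{\sum x_i^2}{\theta^2} = 0 \ \Rightarrow \ \hat{\theta} = \frac{\sum x_i^2}{n}. \]

Again,

\[ \frac{\partial^2 \ln L}{\partial \theta^2} = \frac{n}{2\theta^2} - \frac{2 \sum x_i^2}{2\theta^3}, \qquad E\left[ \frac{\partial^2 \ln L}{\partial \theta^2} \right] = \frac{n}{2\theta^2} - \frac{2 n\theta}{2\theta^3} = -\frac{n}{2\theta^2} \]

\[ \therefore \ -E\left[ \frac{\partial^2 \ln L}{\partial \theta^2} \right] = \frac{n}{2\theta^2}. \]

We have

\[ \sigma_n^2 = \frac{1}{-E\left[ \dfrac{\partial^2 \ln L}{\partial \theta^2} \right]} = \frac{2\theta^2}{n}. \]

$\therefore$ the $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ is based on

\[ \left[ \frac{\sum x_i^2}{n} + z_{\alpha/2} \sqrt{\frac{2\sigma^4}{n}}, \quad \frac{\sum x_i^2}{n} + z_{1 - \alpha/2} \sqrt{\frac{2\sigma^4}{n}} \right] = \left[ \frac{\sum x_i^2}{n} + z_{\alpha/2} \, \theta \sqrt{\frac{2}{n}}, \quad \frac{\sum x_i^2}{n} + z_{1 - \alpha/2} \, \theta \sqrt{\frac{2}{n}} \right] \qquad [\because \theta = \sigma^2]. \]

Replacing $\theta$ in the standard error by $\hat{\theta} = \sum x_i^2 / n$ gives

\[ \left[ \frac{\sum x_i^2}{n} \left\{ 1 + z_{\alpha/2} \sqrt{\frac{2}{n}} \right\}, \quad \frac{\sum x_i^2}{n} \left\{ 1 + z_{1 - \alpha/2} \sqrt{\frac{2}{n}} \right\} \right], \]

while solving the inequalities $z_{\alpha/2} \le (\hat{\theta} - \theta) / (\theta \sqrt{2/n}) \le z_{1 - \alpha/2}$ for $\theta$ gives

\[ \left[ \frac{\sum x_i^2 / n}{1 + z_{1 - \alpha/2} \sqrt{2/n}}, \quad \frac{\sum x_i^2 / n}{1 + z_{\alpha/2} \sqrt{2/n}} \right]. \]

This C.I. is not invariant under transformation of parameters. Thus, if we take square roots of the limits of this C.I., we do not get the C.I. for $\sigma$ that the same method yields.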
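The two forms of the interval in this example can be computed as follows. This is a sketch under stated assumptions: the function name and the data are hypothetical, $z_p$ is taken as the $p$-th standard normal quantile, and the inverted form needs $1 + z_{\alpha/2}\sqrt{2/n} > 0$, i.e. a sample that is not too small.

```python
from math import sqrt
from statistics import NormalDist


def variance_intervals(xs, alpha=0.05):
    """Large-sample intervals for theta = sigma^2 from a N(0, sigma^2) sample."""
    n = len(xs)
    theta_hat = sum(x * x for x in xs) / n          # MLE of sigma^2
    z_lo = NormalDist().inv_cdf(alpha / 2)          # negative quantile
    z_hi = NormalDist().inv_cdf(1 - alpha / 2)      # positive quantile
    c = sqrt(2 / n)
    if 1 + z_lo * c <= 0:
        raise ValueError("sample too small to invert the pivot")
    plug_in = (theta_hat * (1 + z_lo * c), theta_hat * (1 + z_hi * c))
    inverted = (theta_hat / (1 + z_hi * c), theta_hat / (1 + z_lo * c))
    return theta_hat, plug_in, inverted


# Hypothetical data, used only to exercise the formulas.
data = [(-1) ** i * (0.5 + 0.01 * i) for i in range(50)]
print(variance_intervals(data))
```

The plug-in interval is symmetric about $\hat{\theta}$, whereas the inverted interval is not, reflecting the dependence of the standard error on $\theta$.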

Now we will consider construction of Large Sample Confidence Intervals which are invariant under transformation of parameter.

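Suppose that the p.d.f. $f(x; \theta)$ is such that

\[ E\left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right] = 0 \quad \text{and} \quad V\left[ \frac{\partial \ln f(x; \theta)}{\partial \theta} \right] = -E\left[ \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} \right] = k^2 \ (\text{say}) < \infty. \]

Let $X_1, \ldots, X_n$ be a random sample of size $n$ drawn from $f(x; \theta)$ and $L = \prod_{i=1}^{n} f(x_i; \theta)$. Clearly then, each of the random variables $\dfrac{\partial \ln f(X_i; \theta)}{\partial \theta}$ $(i = 1, \ldots, n)$ has mean zero and variance $k^2$. Therefore, by the central limit theorem, their sample mean satisfies approximately

\[ \frac{1}{n} \, \frac{\partial \ln L}{\partial \theta} \sim N\left( 0, \frac{k^2}{n} \right), \quad \text{i.e.} \quad \frac{\dfrac{\partial \ln L}{\partial \theta}}{\sqrt{-E\left( \dfrac{\partial^2 \ln L}{\partial \theta^2} \right)}} \sim N(0, 1). \qquad (i) \]

Using this property one can get a large sample C.I. for $\theta$. Note that the maximum likelihood estimate of $\theta$ has not been used here. Let $\Phi(\theta)$ be a strictly increasing function of $\theta$. Now,

\[ \frac{\partial \ln L}{\partial \theta} = \frac{\partial \ln L}{\partial \Phi} \, \frac{\partial \Phi}{\partial \theta}, \qquad E\left[ \frac{\partial^2 \ln L}{\partial \theta^2} \right] = E\left[ \frac{\partial^2 \ln L}{\partial \Phi^2} \right] \left( \frac{\partial \Phi}{\partial \theta} \right)^2 + E\left( \frac{\partial \ln L}{\partial \Phi} \right) \frac{\partial^2 \Phi}{\partial \theta^2}. \]

Hence

\[ E\left[ \frac{\partial^2 \ln L}{\partial \theta^2} \right] = \left( \frac{\partial \Phi}{\partial \theta} \right)^2 E\left[ \frac{\partial^2 \ln L}{\partial \Phi^2} \right] \qquad \left[ \text{since } E\left( \frac{\partial \ln L}{\partial \Phi} \right) = 0 \right]. \]

Because $\partial \Phi / \partial \theta > 0$, the statistic in (i) takes the same value whether it is written in terms of $\theta$ or of $\Phi$. Therefore, if $(\theta_1, \theta_2)$ is a C.I. for $\theta$ then $\{\Phi(\theta_1), \Phi(\theta_2)\}$ is a C.I. for $\Phi(\theta)$.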


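Example: Let $X_1, \ldots, X_n$ be a random sample from a $N(0, \sigma^2)$ population. Here

\[ \frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum x_i^2}{\sigma^3} \quad \text{and} \quad E\left( \frac{\partial^2 \ln L}{\partial \sigma^2} \right) = -\frac{2n}{\sigma^2}. \]

Hence

\[ \frac{\dfrac{\sum X_i^2}{\sigma^2} - n}{\sqrt{2n}} \sim N(0, 1). \]

A central $100(1 - \alpha)\%$ confidence interval for $\sigma$ is, therefore,

\[ \left\{ \left( \frac{\sum X_i^2}{n + z_{\alpha/2} \sqrt{2n}} \right)^{1/2}, \quad \left( \frac{\sum X_i^2}{n - z_{\alpha/2} \sqrt{2n}} \right)^{1/2} \right\} \qquad (1) \]

where here $z_{\alpha/2}$ denotes the upper $\alpha/2$ point of $N(0, 1)$. If the variance $\sigma^2$ is treated as the parameter, then the method yields the $100(1 - \alpha)\%$ confidence interval for $\sigma^2$ as

\[ \left\{ \frac{\sum X_i^2 / n}{1 + z_{\alpha/2} \sqrt{2/n}}, \quad \frac{\sum X_i^2 / n}{1 - z_{\alpha/2} \sqrt{2/n}} \right\}, \]

whose limits are exactly the squares of the limits in (1).

It may be noted that the large sample confidence intervals based on maximum likelihood estimators will be shorter on an average than the large sample confidence intervals based on any other estimator.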
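The invariance of the score-based interval in the $N(0, \sigma^2)$ example can be verified numerically: the square roots of the limits for $\sigma^2$ coincide with the limits for $\sigma$. The function names and the inputs ($\sum x_i^2 = 120$, $n = 100$) are hypothetical, and $z_{\alpha/2}$ is taken as the upper $\alpha/2$ point.

```python
from math import isclose, sqrt
from statistics import NormalDist


def score_interval_sigma(sum_sq, n, alpha=0.05):
    """Interval for sigma obtained by inverting the score pivot."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return sqrt(sum_sq / (n + z * sqrt(2 * n))), sqrt(sum_sq / (n - z * sqrt(2 * n)))


def score_interval_sigma_sq(sum_sq, n, alpha=0.05):
    """Interval for sigma^2 when sigma^2 itself is treated as the parameter."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mean_sq = sum_sq / n
    return mean_sq / (1 + z * sqrt(2 / n)), mean_sq / (1 - z * sqrt(2 / n))


lo, hi = score_interval_sigma(120.0, 100)
lo_sq, hi_sq = score_interval_sigma_sq(120.0, 100)
print(isclose(sqrt(lo_sq), lo), isclose(sqrt(hi_sq), hi))
```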

Confidence Belt
Let $T$ be a statistic whose distribution depends on $\theta$, preferably a sufficient statistic for $\theta$. For each $\theta$, let us determine the values $t_1(\theta)$ and $t_2(\theta)$ such that

\[ P_\theta\left[ T < t_1(\theta) \right] = \alpha_1 \quad \text{and} \quad P_\theta\left[ T > t_2(\theta) \right] = \alpha_2, \]

where $\alpha_1 + \alpha_2 = \alpha$. Supposing $\Theta$ is a non-degenerate real interval, by varying $\theta$ we shall get two curves from $t_1(\theta)$ and $t_2(\theta)$. The first curve, $C_1$, has the equation $t = t_1(\theta)$ and the second, $C_2$, the equation $t = t_2(\theta)$. Let the two curves be as in the following figure, so that any line drawn perpendicular to the $t$-axis intersects both the curves.

Let us denote the ordinate of the point of intersection of this line with $C_1$ by $\theta_2(t)$ and that of the point of intersection of the line with $C_2$ by $\theta_1(t)$, so that $\theta_1(t) < \theta_2(t)$. Consider now the two random variables $\theta_1(T)$ and $\theta_2(T)$, which are so defined that for $T = t$, $\theta_1(T) = \theta_1(t)$ and $\theta_2(T) = \theta_2(t)$. From the way $\theta_1(T)$ and $\theta_2(T)$ have been obtained, it is obvious that

\[ \theta_1(T) \le \theta \le \theta_2(T) \quad \text{iff} \quad t_1(\theta) \le T \le t_2(\theta). \]

As such,

\[ P_\theta\left[ \theta_1(T) \le \theta \le \theta_2(T) \right] = P_\theta\left[ t_1(\theta) \le T \le t_2(\theta) \right] = 1 - \alpha \qquad \forall \ \theta \in \Theta. \]

Hence, given a set of observations $X$, if $t$ denotes the corresponding value of $T$, then $\theta_1(t)$ and $\theta_2(t)$ are a pair of confidence limits to $\theta$ with confidence coefficient $1 - \alpha$. The region in the $(T, \theta)$-plane which is bounded by the two curves $C_1$ and $C_2$ is called a confidence belt for $\theta$ corresponding to the confidence coefficient $1 - \alpha$.

Example: Suppose $(X_i, Y_i)$, $i = 1, 2, \ldots, 20$, are a random sample drawn from a bivariate normal distribution $BIV(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, where none of the parameters is known. We want to set confidence limits for $\rho$.

Solution
We know the sample correlation coefficient,

\[ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}. \]

The distribution of $r$ depends on $\rho$ only. Let $\alpha = 0.05$. From the tables of the correlation coefficient by F.N. David et al., we may obtain, for each $\rho$, the values $r_1(\rho)$ and $r_2(\rho)$ of $r$ such that

\[ P_\rho\left[ r < r_1(\rho) \right] = P_\rho\left[ r > r_2(\rho) \right] = 0.025. \]

These are shown in the following table for the values of $\rho$ from $-0.9$ to $0.9$ at intervals of $0.1$.

Table: Values of r1(ρ) and r2(ρ) for n = 20

    ρ       r1(ρ)       r2(ρ)
  −0.9    −0.97065    −0.77222
  −0.8    −0.92223    −0.56661
  −0.7    −0.92289    −0.38984
  −0.6    −0.83500    −0.22886
  −0.5    −0.78095    −0.08607
  −0.4    −0.72585     0.04226
  −0.3    −0.71366     0.15830
  −0.2    −0.59586     0.26394
  −0.1    −0.52565     0.35862
   0      −0.44486     0.44486
   0.1    −0.35862     0.52565
   0.2    −0.26394     0.59586
   0.3    −0.15880     0.71366
   0.4    −0.04226     0.72585
   0.5     0.08607     0.78095
   0.6     0.22886     0.83500
   0.7     0.38984     0.92289
   0.8     0.56661     0.92223
   0.9     0.77222     0.97065

Now, given the observed value of $r$ for a particular random sample of size 20 from the bivariate normal distribution, we can obtain the confidence limits to $\rho$ with confidence coefficient $1 - 2 \times 0.025 = 0.95$. Suppose, e.g., the observed value of $r$ is $0.55$. Treating $0.55$ as a value of $r_2(\rho)$, we find, by inverse interpolation, the corresponding value of $\rho$ to be $0.135$. Similarly, treating $0.55$ as a value of $r_1(\rho)$, the corresponding value of $\rho$ is found to be $0.791$. Hence for this value of $r$, the 95% confidence limits to $\rho$ are $0.135$ and $0.791$.
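The inverse interpolation used for $r = 0.55$ can be reproduced with linear interpolation between neighbouring table entries. The sketch below uses only the rows for $\rho \ge 0$, copied as printed in the table; the function name is an illustrative assumption, and linear interpolation is assumed to be what the worked values are based on.

```python
RHO = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
R1 = [-0.44486, -0.35862, -0.26394, -0.15880, -0.04226,
      0.08607, 0.22886, 0.38984, 0.56661, 0.77222]
R2 = [0.44486, 0.52565, 0.59586, 0.71366, 0.72585,
      0.78095, 0.83500, 0.92289, 0.92223, 0.97065]


def inverse_interpolate(r_values, rho_values, r_observed):
    """Find rho with r(rho) = r_observed by linear interpolation in the
    first pair of adjacent table entries that brackets r_observed."""
    for i in range(len(r_values) - 1):
        low, high = r_values[i], r_values[i + 1]
        if low <= r_observed <= high:
            fraction = (r_observed - low) / (high - low)
            return rho_values[i] + fraction * (rho_values[i + 1] - rho_values[i])
    raise ValueError("observed r lies outside the tabulated range")


lower_limit = inverse_interpolate(R2, RHO, 0.55)   # 0.55 treated as r2(rho)
upper_limit = inverse_interpolate(R1, RHO, 0.55)   # 0.55 treated as r1(rho)
print(round(lower_limit, 3), round(upper_limit, 3))
```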

Shortest Confidence Intervals
Suppose that, for a statistic $T$, we have determined $t_1(\theta)$ and $t_2(\theta)$ with

\[ P\left[ T < t_1(\theta) \right] = \alpha_1 \quad \text{and} \quad P\left[ T > t_2(\theta) \right] = \alpha_2 \]

in such a way that $\alpha_1 = \alpha_2 = \alpha / 2$. However, it is clear that $\alpha_1$ and $\alpha_2$ may be chosen in infinitely many ways, each satisfying the conditions $\alpha_i \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$. Let us consider a particular function

\[ \psi(T, \theta) = \frac{\sqrt{n} (\bar{X} - \theta)}{\sigma}. \]

Even when $\alpha$ is fixed, we still get many confidence intervals, one for each admissible split of $\alpha$ into $\alpha_1$ and $\alpha_2$.

So we need some criterion for choosing among this infinite set of confidence intervals. An obvious method of selecting one of the possible confidence intervals is based on the width of the interval.

Let us suppose that $T_1$ and $T_2$ are two statistics such that

\[ P_\theta\left[ T_1 \le \tau(\theta) \le T_2 \right] = 1 - \alpha. \qquad (i) \]

Then the confidence interval given by $T_1$ and $T_2$ will be said to be better than the interval given by $T_1'$ and $T_2'$, which also satisfy (i), if

\[ T_2 - T_1 \le T_2' - T_1' \qquad \forall \ \theta \in \Theta. \qquad (ii) \]

If (ii) holds for every other pair of statistics $T_1'$ and $T_2'$ satisfying (i) for all $\theta \in \Theta$, then the confidence interval given by $T_1$ and $T_2$ will be called the uniformly shortest confidence interval for $\tau(\theta)$ based on the statistic $T$.

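Example: Consider $X \sim N(\theta, \sigma^2)$ where $\sigma^2$ is known. Find the shortest confidence interval for $\theta$.

Solution
We have, with $\tau_p$ denoting the upper $p$ point of $N(0, 1)$,

\[ P\left[ \tau_{1 - \alpha_1} \le \frac{\sqrt{n} (\bar{X} - \theta)}{\sigma} \le \tau_{\alpha_2} \right] = 1 - \alpha \ \Rightarrow \ P\left[ \bar{X} - \tau_{\alpha_2} \frac{\sigma}{\sqrt{n}} \le \theta \le \bar{X} - \tau_{1 - \alpha_1} \frac{\sigma}{\sqrt{n}} \right] = 1 - \alpha. \]

The length of the corresponding confidence interval is

\[ L = \frac{\sigma}{\sqrt{n}} \left[ \tau_{\alpha_2} - \tau_{1 - \alpha_1} \right]. \]

So we have to minimize $L$, i.e. minimize $\tau_{\alpha_2} - \tau_{1 - \alpha_1}$, subject to the conditions $\alpha_1 \ge 0$, $\alpha_2 \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$.

Due to the symmetry of the distribution of $\sqrt{n} (\bar{X} - \theta) / \sigma$ about zero, the difference will be minimum when

\[ \tau_{1 - \alpha_1} = -\tau_{\alpha_2}, \quad \text{i.e.} \quad \alpha_1 = \alpha_2 = \frac{\alpha}{2}. \]

Hence the interval $\left[ \bar{X} - \tau_{\alpha/2} \, \sigma / \sqrt{n}, \ \bar{X} + \tau_{\alpha/2} \, \sigma / \sqrt{n} \right]$ is in fact the shortest confidence interval based on the distribution of $\bar{X}$.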
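The claim that the equal-tails split minimizes the length can be checked by a simple grid search over $\alpha_1$. This is only a numerical sketch: the function name and the grid are assumptions, and the length is reported in units of $\sigma / \sqrt{n}$.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def length_factor(alpha1, alpha):
    """tau_{alpha2} - tau_{1 - alpha1} for the split alpha = alpha1 + alpha2,
    where tau_p is the upper p point of N(0, 1)."""
    alpha2 = alpha - alpha1
    upper = STANDARD_NORMAL.inv_cdf(1 - alpha2)   # tau_{alpha2}
    lower = STANDARD_NORMAL.inv_cdf(alpha1)       # tau_{1 - alpha1}
    return upper - lower


alpha = 0.05
grid = [alpha * k / 100 for k in range(1, 100)]   # interior splits only
best_alpha1 = min(grid, key=lambda a1: length_factor(a1, alpha))
print(best_alpha1, length_factor(best_alpha1, alpha))
```

The endpoints $\alpha_1 = 0$ and $\alpha_1 = \alpha$ are excluded from the grid because they correspond to one-sided intervals of infinite length.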

In some situations the length of the confidence interval may involve some function of the sample observations, e.g. when, under the normal set-up, the confidence interval for $\mu$ is obtained from the $t$-distribution of the statistic $\sqrt{n} (\bar{X} - \mu) / S$, or when, under the same set-up, the confidence interval for $\sigma^2$ is obtained from the $\chi^2$-distribution of $\sum (X_i - \bar{X})^2 / \sigma^2$. Here, in order to choose among all possible confidence intervals with the same confidence coefficient, we may make use of the average or expected length of the confidence interval. For the statistics $T_1$ and $T_2$ the expected length is

\[ E_\theta(T_2 - T_1). \]

The interval for which this expected length is minimum may be called the interval with shortest expected length or shortest average length.

Example: Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. We have to find the shortest confidence interval for $\mu$.

Solution
We have,

\[ P\left[ t_{1 - \alpha_1, n-1} \le \frac{\sqrt{n} (\bar{X} - \mu)}{S} \le t_{\alpha_2, n-1} \right] = 1 - \alpha \ \Rightarrow \ P\left[ \bar{X} - t_{\alpha_2, n-1} \frac{S}{\sqrt{n}} \le \mu \le \bar{X} - t_{1 - \alpha_1, n-1} \frac{S}{\sqrt{n}} \right] = 1 - \alpha. \]

The expected length of the confidence interval is

\[ \left( t_{\alpha_2, n-1} - t_{1 - \alpha_1, n-1} \right) \frac{E_\theta(S)}{\sqrt{n}} = k\sigma \left[ t_{\alpha_2, n-1} - t_{1 - \alpha_1, n-1} \right], \]

where $k$ is a constant that depends on $n$ alone. So we have to minimize $\left[ t_{\alpha_2, n-1} - t_{1 - \alpha_1, n-1} \right]$ subject to the conditions $\alpha_1 \ge 0$, $\alpha_2 \ge 0$ and $\alpha_1 + \alpha_2 = \alpha$.

Due to the symmetry of the $t$-distribution around zero, the difference $t_{\alpha_2, n-1} - t_{1 - \alpha_1, n-1}$ will be minimum if $t_{1 - \alpha_1, n-1} = -t_{\alpha_2, n-1}$, i.e. when $\alpha_1 = \alpha_2 = \alpha / 2$.

Example: Let $X \sim N(\mu, \sigma^2)$ where $\mu$ is known. Find the confidence interval for $\sigma^2$.

Solution
The inequalities

\[ P_\theta\left[ \frac{\sum (x_i - \mu)^2}{\theta} < \chi^2_{1 - \alpha_1, n} \right] = \alpha_1 \quad \text{and} \quad P_\theta\left[ \frac{\sum (x_i - \mu)^2}{\theta} > \chi^2_{\alpha_2, n} \right] = \alpha_2, \qquad \text{here } \theta = \sigma^2, \]

give

\[ P_\theta\left[ \frac{\sum (X_i - \mu)^2}{\chi^2_{\alpha_2, n}} \le \theta \le \frac{\sum (X_i - \mu)^2}{\chi^2_{1 - \alpha_1, n}} \right] = 1 - \alpha. \]

The corresponding confidence interval has the length

\[ \sum (X_i - \mu)^2 \left[ \frac{1}{\chi^2_{1 - \alpha_1, n}} - \frac{1}{\chi^2_{\alpha_2, n}} \right], \]

which has the expected value

\[ n\theta \left[ \frac{1}{\chi^2_{1 - \alpha_1, n}} - \frac{1}{\chi^2_{\alpha_2, n}} \right]. \]

The minimization of this expected length amounts to minimization of $\left[ \dfrac{1}{\chi^2_{1 - \alpha_1, n}} - \dfrac{1}{\chi^2_{\alpha_2, n}} \right]$ subject to the condition

\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2) \, d(\chi^2) = 1 - \alpha, \]

where $\chi_1^2 = \chi^2_{1 - \alpha_1, n}$, $\chi_2^2 = \chi^2_{\alpha_2, n}$ and $f$ is the p.d.f. of the $\chi^2$-distribution with $n$ degrees of freedom.

Using Lagrange's method of undetermined multipliers, which involves the partial differentiation of

\[ \frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} + \lambda \left[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2) \, d(\chi^2) - (1 - \alpha) \right] \]

with respect to $\chi_1^2$ and $\chi_2^2$, we get the minimizing equations as

\[ \frac{1}{\chi_1^4} + \lambda f(\chi_1^2) = 0 \quad \text{and} \quad \frac{1}{\chi_2^4} + \lambda f(\chi_2^2) = 0 \]

\[ \Rightarrow \ 1 + \chi_1^4 \lambda f(\chi_1^2) = 0 \quad (1) \qquad 1 + \chi_2^4 \lambda f(\chi_2^2) = 0 \quad (2) \]

Now from equations (1) and (2) we can write that

\[ \chi_1^4 f(\chi_1^2) = \chi_2^4 f(\chi_2^2) \]

is satisfied, besides the equation

\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2) \, d(\chi^2) = 1 - \alpha. \]

The actual determination of the values $\chi_1^2$ and $\chi_2^2$ will, of course, be pretty difficult. In practice, one takes $\chi_1^2$ and $\chi_2^2$ such that $\alpha_1 = \alpha_2 = \alpha / 2$. But this may make the average length too big.

For example, if $n = 10$, $\alpha = 0.05$, $\alpha_1 = \alpha_2 = 0.025$, then $\chi_1^2 = 3.247$ and $\chi_2^2 = 20.483$. So the average length of the interval is

\[ 10\theta \left[ \frac{1}{3.247} - \frac{1}{20.483} \right] = 2.592 \, \theta. \]

On the other hand, if we take $\alpha_1 = 0.05$, $\alpha_2 = 0$, then $\chi_1^2 = 3.940$ and $\chi_2^2 = \infty$, and the average length of the interval is

\[ 10\theta \left[ \frac{1}{3.940} - 0 \right] = 2.538 \, \theta. \]

Thus this second procedure, where the confidence interval will be of the form $\left[ 0, \ \dfrac{v}{\chi_1^2} \right]$, where $v = \sum_{i=1}^{n} (x_i - \mu)^2$, would seem to be preferable to the first procedure. Since both lengths are fixed multiples of $v$, this interval is, in fact, not only shorter on the average, but shorter in every case.
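The arithmetic of the comparison for $n = 10$ can be reproduced from the tabulated $\chi^2$ points quoted in the example (3.247, 20.483 and 3.940). The function name is hypothetical; the coefficients are the expected lengths in units of $\theta$.

```python
def expected_length_coefficient(n, chi_lower, chi_upper=None):
    """n * (1/chi_lower - 1/chi_upper); chi_upper=None stands for an infinite
    upper point (alpha2 = 0), whose reciprocal is zero."""
    upper_term = 0.0 if chi_upper is None else 1.0 / chi_upper
    return n * (1.0 / chi_lower - upper_term)


equal_tails = expected_length_coefficient(10, 3.247, 20.483)
one_sided = expected_length_coefficient(10, 3.940)
print(round(equal_tails, 3), round(one_sided, 3))
```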

Case of Discrete Random Variable
The case of discrete random variables has to be dealt with separately, for if we want to apply one of the previous procedures, we immediately face a difficulty. In this case we cannot hope to get, for each $\alpha$ $(0 < \alpha < 1)$, a confidence interval that will have confidence coefficient exactly equal to $1 - \alpha$.

One way of avoiding this problem is to require only that the confidence coefficient be at least $1 - \alpha$. Then the statistics $T_1$ and $T_2$ will provide confidence limits to a parametric function $\gamma(\theta)$ if

\[ P_\theta\left[ T_1 \le \gamma(\theta) \le T_2 \right] \ge 1 - \alpha \qquad \text{for all } \theta \in \Theta. \]

The actual determination of the confidence intervals may be carried out by drawing confidence belts.

Example: Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a (point binomial) distribution with p.m.f.

\[ f_\theta(x) = \begin{cases} \theta^x (1 - \theta)^{1 - x} & \text{if } x = 0, 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } 0 \le \theta \le 1. \]

For obtaining confidence limits to $\theta$ with a confidence coefficient at least equal to $0.90$, we may first determine, for a suitable set of values of $\theta$, the values $t_1(\theta)$ and $t_2(\theta)$ of the sufficient statistic $T = \sum_i X_i$ such that

\[ P_\theta\left[ T < t_1(\theta) \right] \le 0.05 \quad \text{and} \quad P_\theta\left[ T > t_2(\theta) \right] \le 0.05, \]

the inequalities for $P_\theta$ being made as near to equalities as possible. For values of $\theta$ from $0.1$ to $0.9$ (taken at intervals of $0.1$), these numbers $t_1(\theta)$ and $t_2(\theta)$ are as shown in the table below:

   θ     t1(θ)   t2(θ)
  0.1      0       3
  0.2      0       4
  0.3      1       5
  0.4      2       7
  0.5      2       8
  0.6      3       8
  0.7      5       9
  0.8      6      10
  0.9      7      10

If we plot $t_1(\theta)$ and $t_2(\theta)$ for different values of $\theta$ we get a confidence belt for $\theta$, with

\[ P_\theta\left[ t_1(\theta) \le T \le t_2(\theta) \right] \ge 0.90. \]

The confidence belt can be improved and the confidence coefficient made closer to $0.90$ if $n$ is made large and if at the same time we tabulate $t_1(\theta)$ and $t_2(\theta)$ at finer intervals of $\theta$.
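The tabulated limits can be recomputed from the binomial distribution of $T = \sum X_i$. The sketch below takes $t_1(\theta)$ as the largest $t$ with $P_\theta[T < t] \le 0.05$ and $t_2(\theta)$ as the smallest $t$ with $P_\theta[T > t] \le 0.05$, which is one way to read "as near to equalities as possible"; the function names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def belt_limits(theta, n=10, tail=0.05):
    """Return (t1, t2): t1 is the largest t with P[T < t] <= tail and
    t2 is the smallest t with P[T > t] <= tail, for T ~ Binomial(n, theta)."""
    pmf = [binomial_pmf(k, n, theta) for k in range(n + 1)]
    t1 = 0
    for t in range(n + 1):
        if sum(pmf[:t]) <= tail:        # P[T < t], nondecreasing in t
            t1 = t
    t2 = n
    for t in range(n, -1, -1):
        if sum(pmf[t + 1:]) <= tail:    # P[T > t], nonincreasing in t
            t2 = t
    return t1, t2


for k in range(1, 10):
    theta = k / 10
    print(theta, belt_limits(theta))
```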

Theory of Confidence Set
In this context, we are interested in a subset of the parameter space $\Theta$, determined in the light of the observations $X$, that may be supposed to cover the true value(s) of the parameter(s); that is why we use the concept of a confidence set rather than a confidence interval. Let $S$ be a subset of the parameter space $\Theta$; then we shall write '$S \ c \ \theta$' to mean that this set covers or includes $\theta$, so that $S \ c \ \theta \Leftrightarrow \theta \in S$.

Definition
A family of sets $S(X)$, for varying $x \in \Im$, of the parameter space $\Theta$ is said to be a family of confidence sets at the level $1 - \alpha$ (or with the confidence coefficient $1 - \alpha$) if

\[ P_\theta\left[ S(X_1, X_2, \ldots, X_n) \ c \ \theta \right] = 1 - \alpha \qquad \text{for all } \theta \in \Theta. \]

Definition
A family of sets $S_0(X)$, for varying $x \in \Im$, of the parameter space $\Theta$ is said to constitute a family of uniformly most accurate (or most selective or smallest) confidence sets if

\[ P_\theta\left[ S_0(X_1, X_2, \ldots, X_n) \ c \ \theta \right] = 1 - \alpha \qquad \text{for all } \theta \in \Theta \qquad (i) \]

and

\[ P_{\theta'}\left[ S_0(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \le P_{\theta'}\left[ S(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \qquad \text{for all } \theta, \theta' \in \Theta \ (\theta \ne \theta') \qquad (ii) \]

whatever the other family of sets $S(X)$ satisfying (i). The implication of (ii) is that $S_0$ has a smaller probability of including a wrong value or set of values of the parameter $\theta$ than any other family of sets at the same level. In this sense $S_0(x)$ is the smallest confidence set of level $1 - \alpha$ corresponding to the set of observations $x$. In most cases a family of UMA sets cannot be obtained. Hence we introduce the concept of unbiasedness.

Definition
A family of sets $S(X)$, for different values of $x \in \Im$, of the parameter space $\Theta$ is said to constitute a family of unbiased confidence sets of level $1 - \alpha$ if

\[ P_{\theta'}\left[ S(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \le 1 - \alpha \qquad \text{for all } \theta, \theta' \in \Theta, \ \theta \ne \theta'. \]

Hence $S(X)$ is a family of unbiased sets iff the probability for $S(X_1, X_2, \ldots, X_n)$ to cover $\theta$ when some alternative value $\theta'$ is true does not exceed the same probability for the case when $\theta$ itself is true. Surely this is a desirable feature of a family of confidence sets.

Uniformly Most Accurate Unbiased Set (UMAU)
A family of sets $S_0(X)$, for varying $x \in \Im$, of the parameter space $\Theta$ is said to constitute a family of uniformly most accurate unbiased (UMAU) confidence sets of level $1 - \alpha$ if

\[ P_\theta\left[ S_0(X_1, X_2, \ldots, X_n) \ c \ \theta \right] = 1 - \alpha \qquad \text{for all } \theta \in \Theta, \]

\[ P_{\theta'}\left[ S_0(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \le 1 - \alpha \qquad \text{for all } \theta, \theta' \in \Theta \ (\theta \ne \theta'), \]

and

\[ P_{\theta'}\left[ S_0(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \le P_{\theta'}\left[ S(X_1, X_2, \ldots, X_n) \ c \ \theta \right] \qquad \text{for all } \theta, \theta' \in \Theta \ (\theta \ne \theta'), \]

whatever the other family $S(X)$ of unbiased confidence sets of level $1 - \alpha$.


Hypothesis-I

Most Powerful Test
The critical region $w$ is the most powerful critical region of size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$ if

\[ P(x \in w \mid H_0) = \int_w L_0 \, dx = \alpha \qquad (1) \]

and

\[ P(x \in w \mid H_1) \ge P(x \in w_1 \mid H_1) \qquad (2) \]

for every critical region $w_1$ satisfying (1). The test based on the most powerful critical region is called the most powerful test of level $\alpha$.

Uniformly Most Powerful (UMP) Test
The region $w$ is called the uniformly most powerful (UMP) critical region of size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$, i.e. against $H_1: \theta = \theta_1 \ne \theta_0$, if

\[ P(x \in w \mid H_0) = \int_w L_0 \, dx = \alpha \qquad (1) \]

and

\[ P(x \in w \mid H_1) \ge P(x \in w_1 \mid H_1) \qquad (2) \]

for all $\theta \ne \theta_0$, whatever the region $w_1$ satisfying (1). The test based on the uniformly most powerful critical region is called the uniformly most powerful test of level $\alpha$.

Unbiased Test and Unbiased Critical Region
Let us consider the testing of $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$. The critical region $w$, and consequently the test based on it, is said to be unbiased if the power of the test is at least the size of the critical region, i.e.

\[ \text{Power of the test} \ge \text{Size of the C.R.} \ \Rightarrow \ 1 - \beta \ge \alpha \ \Rightarrow \ P_{\theta_1}(w) \ge P_{\theta_0}(w) \ \Rightarrow \ P[x : x \in w \mid H_1] \ge P[x : x \in w \mid H_0]. \]

In other words, the critical region $w$ is said to be unbiased if

\[ P_\theta(w) \ge P_{\theta_0}(w) \qquad \forall \ \theta (\ne \theta_0) \in \Omega. \]

Uniformly Most Powerful Unbiased (UMPU) Test
Let $\varphi$ be an unbiased test (or $w$ a critical region) of size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1 \ne \theta_0$; $\theta_1 \in \Omega$, i.e.
i) $E\{\varphi(x) \mid \theta_0\} = P(x \in w \mid \theta_0) = \alpha$
ii) $E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi(x) \mid \theta_0\}$ ; $\forall \ \theta_1 \in \Omega$

Suppose that for every other test $\varphi^*$ satisfying the conditions (i) and (ii) we have

\[ E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi^*(x) \mid \theta_1\} \ ; \qquad \forall \ \theta_1 \in \Omega, \]

then $\varphi$ is a uniformly most powerful unbiased (UMPU) test of size $\alpha$.

UMPU Type A1 Test
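Let $\varphi$ be an unbiased test (or $w$ a critical region) of size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1 \ne \theta_0$; $\theta_1 \in \Omega$, i.e.
i) $E\{\varphi(x) \mid \theta_0\} = P(x \in w \mid \theta_0) = \alpha$
ii) $E\{\varphi(x) \mid \theta_1\} \ge E\{\varphi(x) \mid \theta_0\}$ ; $\forall \ \theta_1 \in \Omega$
iii) $\left. \dfrac{\partial E\{\varphi(x) \mid \theta_1\}}{\partial \theta_1} \right|_{\theta_1 = \theta_0} = 0$

Then $\varphi$ is called a UMPU type $A_1$ test. For a UMPU test it is not required that the power curve should have a regular minimum at $\theta_0$, but this is often the case, and the name UMPU test is often used to imply a type $A_1$ test.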

Show that 1 − β ≥ α .
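Let $w$ be a BCR of size $\alpha$ for testing a simple $H_0$ against a simple $H_1$. Then by definition we have

\[ P(x \in w \mid H_0) = \int_w L(x \mid H_0) \, dx = \alpha. \]

By the Neyman-Pearson lemma, we have

\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} \le K \quad \text{if } x \in w \qquad (i) \]

and

\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} \ge K \quad \text{if } x \in (S - w). \qquad (ii) \]

From (i) we have

\[ K \cdot L(x \mid H_1) \ge L(x \mid H_0) \ \Rightarrow \ K \int_w L(x \mid H_1) \, dx \ge \int_w L(x \mid H_0) \, dx \ \Rightarrow \ K (1 - \beta) \ge \alpha. \qquad (iii) \]

Again from (ii) we have

\[ K \cdot L(x \mid H_1) \le L(x \mid H_0) \ \Rightarrow \ K \int_{S - w} L(x \mid H_1) \, dx \le \int_{S - w} L(x \mid H_0) \, dx \ \Rightarrow \ K \beta \le (1 - \alpha). \qquad (iv) \]

Multiplying (iii) and (iv), with $K > 0$, we have

\[ K (1 - \alpha)(1 - \beta) \ge K \alpha \beta \ \Rightarrow \ 1 - \alpha - \beta + \alpha\beta \ge \alpha\beta \ \Rightarrow \ 1 - \beta \ge \alpha. \qquad (\text{Proved}) \]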
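Example: Let $x_1, x_2, \ldots, x_n$ be a random sample drawn from $N(\mu, 1)$. For testing $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1 \ne \mu_0$, show that for $\alpha_1 = \alpha_2$ a UMPU test exists.

Solution
Since $x_1, x_2, \ldots, x_n$ are drawn from $N(\mu, 1)$, we have

\[ L(x \mid H_0) = \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2} \sum (x_i - \mu_0)^2} \quad \text{and} \quad L(x \mid H_1) = \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2} \sum (x_i - \mu_1)^2}. \]

According to the Neyman-Pearson lemma, the BCR is given by

\[ \frac{L(x \mid H_0)}{L(x \mid H_1)} \le K \]

\[ \Rightarrow \ \exp\left[ -\frac{n}{2} \left\{ (\bar{x} - \mu_0)^2 - (\bar{x} - \mu_1)^2 \right\} \right] \le K \]

\[ \Rightarrow \ \exp\left[ -\frac{n}{2} \left\{ \bar{x}^2 - 2\bar{x}\mu_0 + \mu_0^2 - \bar{x}^2 + 2\bar{x}\mu_1 - \mu_1^2 \right\} \right] \le K \]

\[ \Rightarrow \ \exp\left[ -\frac{n}{2} \left\{ 2\bar{x}(\mu_1 - \mu_0) + \mu_0^2 - \mu_1^2 \right\} \right] \le K \]

\[ \Rightarrow \ -\frac{n}{2} \left\{ 2\bar{x}(\mu_1 - \mu_0) + (\mu_0^2 - \mu_1^2) \right\} \le \ln K \]

\[ \Rightarrow \ 2\bar{x}(\mu_1 - \mu_0) + (\mu_0^2 - \mu_1^2) \ge -\frac{2}{n} \ln K \]

\[ \Rightarrow \ 2\bar{x}(\mu_1 - \mu_0) \ge (\mu_1^2 - \mu_0^2) - \frac{2}{n} \ln K \]

\[ \Rightarrow \ \bar{x}(\mu_1 - \mu_0) \ge \frac{\mu_1^2 - \mu_0^2}{2} - \frac{1}{n} \ln K. \qquad (i) \]

If $\mu_1 > \mu_0$ then

\[ \bar{x} \ge \frac{\mu_1 + \mu_0}{2} + \frac{1}{n(\mu_0 - \mu_1)} \ln K \ \Rightarrow \ \bar{x} \ge \lambda_1 \ (\text{say}). \qquad (ii) \]

We know that $P(\bar{x} \ge \lambda_1 \mid H_0) = \alpha$, so, since $\bar{x} \sim N(\mu, \sigma^2 / n)$ with $\sigma^2 = 1$, under $H_0: \mu = \mu_0$

\[ \int_{\lambda_1}^{\infty} f(\bar{x}) \, d\bar{x} = \alpha \ \Rightarrow \ \sqrt{\frac{n}{2\pi}} \int_{\lambda_1}^{\infty} e^{-\frac{n (\bar{x} - \mu_0)^2}{2}} \, d\bar{x} = \alpha \ \Rightarrow \ \frac{1}{\sqrt{2\pi}} \int_{\frac{\lambda_1 - \mu_0}{1/\sqrt{n}}}^{\infty} e^{-\frac{z^2}{2}} \, dz = \alpha. \]

We have

\[ z_\alpha = \frac{\lambda_1 - \mu_0}{1/\sqrt{n}} \ \Rightarrow \ \lambda_1 - \mu_0 = \frac{1}{\sqrt{n}} z_\alpha \ \Rightarrow \ \lambda_1 = \mu_0 + \frac{1}{\sqrt{n}} z_\alpha. \]

Hence from equation (ii) the BCR is

\[ \bar{x} \ge \mu_0 + \frac{1}{\sqrt{n}} z_\alpha. \]

Again, if $\mu_1 < \mu_0$, then dividing equation (i) by the negative quantity $(\mu_1 - \mu_0)$ the BCR is

\[ \bar{x} \le \frac{\mu_0 + \mu_1}{2} + \frac{1}{n(\mu_0 - \mu_1)} \ln K \ \Rightarrow \ \bar{x} \le \lambda_2 \ (\text{say}). \qquad (iii) \]

Again, we know that $P(\bar{x} \le \lambda_2 \mid H_0) = \alpha$, so under $H_0: \mu = \mu_0$, with the substitution $\dfrac{\bar{x} - \mu_0}{1/\sqrt{n}} = z \Rightarrow d\bar{x} = \dfrac{1}{\sqrt{n}} dz$,

\[ \int_{-\infty}^{\lambda_2} f(\bar{x}) \, d\bar{x} = \alpha \ \Rightarrow \ \sqrt{\frac{n}{2\pi}} \int_{-\infty}^{\lambda_2} e^{-\frac{n (\bar{x} - \mu_0)^2}{2}} \, d\bar{x} = \alpha \ \Rightarrow \ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\lambda_2 - \mu_0}{1/\sqrt{n}}} e^{-\frac{z^2}{2}} \, dz = \alpha \ \Rightarrow \ \frac{1}{\sqrt{2\pi}} \int_{\frac{\lambda_2 - \mu_0}{1/\sqrt{n}}}^{\infty} e^{-\frac{z^2}{2}} \, dz = 1 - \alpha \]

\[ \therefore \ z_{1 - \alpha} = \frac{\lambda_2 - \mu_0}{1/\sqrt{n}} \ \Rightarrow \ \lambda_2 = \mu_0 + \frac{1}{\sqrt{n}} z_{1 - \alpha}. \]

By symmetry of the normal distribution, we have

\[ z_{1 - \alpha} = -z_\alpha \quad \therefore \ \lambda_2 = \mu_0 - \frac{1}{\sqrt{n}} z_\alpha. \]

From equation (iii) the BCR is

\[ \bar{x} \le \mu_0 - \frac{1}{\sqrt{n}} z_\alpha. \]

So we take as the critical region

\[ w: \ \bar{x} \le \bar{x}_{\alpha_2}, \quad \bar{x} \ge \bar{x}_{\alpha_1} \]

where

\[ \bar{x}_{\alpha_1} = \mu_0 + \frac{1}{\sqrt{n}} z_{\alpha_1} \quad \text{and} \quad \bar{x}_{\alpha_2} = \mu_0 - \frac{1}{\sqrt{n}} z_{\alpha_2}, \]

where $z$ is $N(0, 1)$, with $\alpha_1$ the probability cut off on the right and $\alpha_2$ the probability cut off on the left:

\[ P\left[ z \ge z_{\alpha_1} \right] = \alpha_1 \quad \text{and} \quad P\left[ z \le -z_{\alpha_2} \right] = \alpha_2, \qquad \text{where } \alpha_1 + \alpha_2 = \alpha. \]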
Hypothesis-I ~ 4 of 11

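The power of $w$ at an alternative $\mu_1$ is the sum of the probabilities of its two tails. Let $F$ and $f$ denote the distribution function and density of $N(0,1)$; under $\mu = \mu_1$, $\sqrt{n}(\bar{x}-\mu_1) \sim N(0,1)$.

The upper tail, which carries most of the power when $\mu_1 > \mu_0$, gives

$$P(\bar{x} \ge \bar{x}_{\alpha_1} \mid \mu_1) = 1-F\left[\sqrt{n}(\bar{x}_{\alpha_1}-\mu_1)\right] = F\left[\sqrt{n}(\mu_1-\bar{x}_{\alpha_1})\right] = F\left[\sqrt{n}(\mu_1-\mu_0)-z_{\alpha_1}\right] \qquad \left[\because\ \bar{x}_{\alpha_1} = \mu_0+\frac{1}{\sqrt{n}}z_{\alpha_1}\right]$$

The lower tail, which carries most of the power when $\mu_1 < \mu_0$, gives

$$P(\bar{x} \le \bar{x}_{\alpha_2} \mid \mu_1) = F\left[\sqrt{n}(\bar{x}_{\alpha_2}-\mu_1)\right] = F\left[-\sqrt{n}(\mu_1-\mu_0)-z_{\alpha_2}\right] \qquad \left[\because\ \bar{x}_{\alpha_2} = \mu_0-\frac{1}{\sqrt{n}}z_{\alpha_2}\right]$$

So the power is

$$P = F\left[\sqrt{n}\,\Delta-z_{\alpha_1}\right]+F\left[-\sqrt{n}\,\Delta-z_{\alpha_2}\right] \qquad [\Delta = \mu_1-\mu_0]$$

Differentiating with respect to $\Delta$ and using $f(-u) = f(u)$,

$$\frac{dP}{d\Delta} = \sqrt{n}\left\{f\left(\sqrt{n}\,\Delta-z_{\alpha_1}\right)-f\left(\sqrt{n}\,\Delta+z_{\alpha_2}\right)\right\} = \sqrt{\frac{n}{2\pi}}\left\{e^{-\frac{1}{2}\left(\sqrt{n}\,\Delta-z_{\alpha_1}\right)^2}-e^{-\frac{1}{2}\left(\sqrt{n}\,\Delta+z_{\alpha_2}\right)^2}\right\}$$

The power curve is stationary at $\Delta = 0$ when this derivative vanishes there:

$$\begin{aligned}
&\Rightarrow\ e^{-\frac{1}{2}z_{\alpha_1}^2} = e^{-\frac{1}{2}z_{\alpha_2}^2} \\
&\Rightarrow\ z_{\alpha_1}^2 = z_{\alpha_2}^2 \\
&\Rightarrow\ z_{\alpha_1} = z_{\alpha_2} \qquad [\text{the root } z_{\alpha_1} = -z_{\alpha_2} \text{ would require } \alpha_1+\alpha_2 = 1] \\
&\therefore\ \alpha_1 = \alpha_2
\end{aligned}$$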

Thus the power curve has its minimum at $\mu_1 = \mu_0$ if and only if $\alpha_1 = \alpha_2$. Otherwise the minimum occurs at some $\mu_1 \ne \mu_0$, implying that the probability of rejecting $H_0$ is actually smaller when $H_0$ is false than when it is true; such a test is biased. Evidently the two curves (b) and (c), representing the one-sided UMP tests, are biased against two-sided alternatives. Power curve (a) represents a most powerful test among all unbiased tests, but not a most powerful test among all tests.
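The bias of an unequal-tails region can be seen numerically. This is a small sketch under the same assumption $\sigma = 1$; the function name `power` and the values $n = 25$, $(\alpha_1, \alpha_2) = (0.04, 0.01)$ are illustrative assumptions. It evaluates the power of the two-tailed region at a few alternatives $\Delta = \mu_1-\mu_0$.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def power(delta, n, alpha1, alpha2):
    """Power of the region xbar >= mu0 + z_a1/sqrt(n) or xbar <= mu0 - z_a2/sqrt(n)
    at the alternative mu1 = mu0 + delta, with sigma = 1."""
    z1 = STD.inv_cdf(1 - alpha1)  # upper alpha1 point
    z2 = STD.inv_cdf(1 - alpha2)  # upper alpha2 point
    upper_tail = STD.cdf(sqrt(n) * delta - z1)
    lower_tail = STD.cdf(-sqrt(n) * delta - z2)
    return upper_tail + lower_tail


n = 25
deltas = (-0.1, 0.0, 0.1)
equal_tails = [power(d, n, 0.025, 0.025) for d in deltas]
unequal_tails = [power(d, n, 0.04, 0.01) for d in deltas]
print([round(p, 4) for p in equal_tails])
print([round(p, 4) for p in unequal_tails])
```

With equal tails the power at $\Delta = 0$ is the smallest of the three values, while with unequal tails the power at $\Delta = -0.1$ falls below the size 0.05, which is exactly the bias described above.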

Locally Uniformly Most Powerful Unbiased (LUMPU) Test
An unbiased test which is most powerful in the neighborhood of $\theta_0$ is called a locally uniformly most powerful unbiased test. This test is also called a uniformly most powerful unbiased test of type A. The critical region associated with this test is called an unbiased critical region of type A.

The region $w$ is said to be a type A critical region of size $\alpha$ for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1 \ne \theta_0$ if

i) $P(x \in w \mid H_0) = \alpha$

ii) $P(x \in w \mid H_1) \ge \alpha$

iii) $P(x \in w \mid H_1) \ge P(x \in w^* \mid H_1)$

iv) $\dfrac{\partial}{\partial\theta_1}\left[P(x \in w \mid H_1)\right]_{\theta_1=\theta_0} = 0$

v) $\dfrac{\partial^2}{\partial\theta_1^2}\left[P(x \in w \mid H_1)\right]_{\theta_1=\theta_0} \ge \dfrac{\partial^2}{\partial\theta_1^2}\left[P(x \in w^* \mid H_1)\right]_{\theta_1=\theta_0}$

where $w^*$ is any other region satisfying conditions (i) to (iv). We must choose a critical region for which the power is largest in the neighborhood of $H_0: \theta = \theta_0$; this requirement is imposed by condition (v). Conditions (i), (ii) and (iii) control the first type of error and unbiasedness, and condition (iv) makes the region locally unbiased. This test is recommended only when $H_0$ and $H_1$ are close to each other. Condition (v) also states that, in the neighborhood of $\theta_0$, the power curve of $w$ rises at least as fast as that of $w^*$.

Construction of Type A regions
Let us consider the problem of constructing a UMP unbiased region for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$ when no UMP region exists. The construction rests on the following theorem.

Theorem
If $w$ is an MP region for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$, then it is necessarily unbiased. Similarly, if $w$ is UMP for testing $H_0: \theta = \theta_0$ against $H_1: \theta \in \Omega$, it is necessarily unbiased.

Proof
If $w$ is an MP region of size $\alpha$ for testing $H_0$ against $H_1$, then for a non-negative constant $k$,

$$w = \{x \mid L_1(x) > k L_0(x)\}, \qquad \int_w L_0(x)\,dx = \alpha$$

where $L_0(x)$ is the likelihood function under $H_0$, and the power of the region is

$$\int_w L_1(x)\,dx = 1-\beta$$

where $L_1(x)$ is the likelihood function under $H_1$. So that,

$$\int_w L_1(x)\,dx > k\int_w L_0(x)\,dx = k\alpha \qquad (i)$$

while, on the complement of $w$,

$$1-\int_w L_1(x)\,dx = \int_{\{x \mid L_1(x) \le kL_0(x)\}} L_1(x)\,dx \le k\int_{\{x \mid L_1(x) \le kL_0(x)\}} L_0(x)\,dx = k(1-\alpha) \qquad (ii)$$

If $k \ge 1$, then from (i) we have

$$\int_w L_1(x)\,dx > \alpha$$

If $k < 1$, then from (ii) we have

$$1-\int_w L_1(x)\,dx < 1-\alpha$$

which implies

$$\int_w L_1(x)\,dx > \alpha \quad\text{i.e.}\quad 1-\beta > \alpha$$

Hence $w$ is unbiased.

In case $w$ is a UMP region of size $\alpha$, the above argument still holds if for $\theta_1$ we read $\theta$ such that $\theta \in \Omega$. So we have

$$P_\theta(w) > \alpha \quad\text{for all } \theta \in \Omega$$

So here also $w$ is unbiased.

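Example: Consider a random sample from $N(\theta, \sigma^2)$, where $\theta$ is unknown $(-\infty < \theta < \infty)$ and $\sigma^2$ is known. Find the type A region for testing $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$.

Solution

$$L(x) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\frac{\sum (x_i-\theta)^2}{2\sigma^2}\right]$$

Hence

$$\ln L(x) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right)-\frac{\sum (x_i-\theta)^2}{2\sigma^2}$$

$$\Rightarrow\ \frac{\partial \ln L(x)}{\partial\theta_0} = -\frac{2}{2\sigma^2}\sum (x_i-\theta_0)(-1) = \frac{n(\bar{x}-\theta_0)}{\sigma^2} = \phi$$

$$\therefore\ \frac{\partial^2 \ln L(x)}{\partial\theta_0^2} = \phi' = -\frac{n}{\sigma^2} = a+b\phi \ (\text{say}), \quad\text{where } a = -\frac{n}{\sigma^2},\ b = 0$$

As such the type-A region for testing $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1 \ne \theta_0$ is $w$, given by

$$w = \{x \mid \phi < c_1 \ \cup\ \phi > c_2\} = \{x \mid \bar{x} < d_1 \ \cup\ \bar{x} > d_2\} \ (\text{say})$$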

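where $c_1$ and $c_2$ (or $d_1$ and $d_2$) are constants such that

$$\int_w L_{\theta_0}(x)\,dx = \alpha \quad\text{and}\quad \int_w \phi\,L_{\theta_0}(x)\,dx = 0$$

Now, these conditions are equivalent to the conditions

$$\int_{-\infty}^{d_1} g_{\theta_0}(\bar{x})\,d\bar{x}+\int_{d_2}^{\infty} g_{\theta_0}(\bar{x})\,d\bar{x} = \alpha$$

and

$$\int_{-\infty}^{d_1} \phi\,g_{\theta_0}(\bar{x})\,d\bar{x}+\int_{d_2}^{\infty} \phi\,g_{\theta_0}(\bar{x})\,d\bar{x} = 0$$

where $g_\theta$ is the marginal p.d.f. of $\bar{x}$, or equivalently to

$$\int_{d_1}^{d_2} g_{\theta_0}(\bar{x})\,d\bar{x} = 1-\alpha \qquad (*)$$

and

$$\int_{d_1}^{d_2} \phi\,g_{\theta_0}(\bar{x})\,d\bar{x} = 0 \qquad \left[\text{since } \int_{-\infty}^{\infty} \phi\,g_{\theta_0}(\bar{x})\,d\bar{x} = 0\right] \qquad (**)$$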

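Now from (*) we can write, with $y = \sqrt{n}(\bar{x}-\theta_0)/\sigma$,

$$\frac{1}{\sqrt{2\pi}}\int_{\frac{\sqrt{n}(d_1-\theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}} e^{-\frac{y^2}{2}}\,dy = 1-\alpha$$

and from (**) we have

$$\int_{\frac{\sqrt{n}(d_1-\theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}} y\,e^{-\frac{y^2}{2}}\,dy = 0 \qquad \left[\text{since } \phi = \frac{n(\bar{x}-\theta_0)}{\sigma^2} = \frac{\sqrt{n}\,y}{\sigma}\right]$$

$$\Rightarrow\ \left[-e^{-\frac{y^2}{2}}\right]_{\frac{\sqrt{n}(d_1-\theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}} = 0 \qquad (***)$$

$$\Rightarrow\ -\exp\left[-\frac{1}{2}\left\{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}\right\}^2\right]+\exp\left[-\frac{1}{2}\left\{\frac{\sqrt{n}(d_1-\theta_0)}{\sigma}\right\}^2\right] = 0$$

$$\Rightarrow\ \left\{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}\right\}^2 = \left\{\frac{\sqrt{n}(d_1-\theta_0)}{\sigma}\right\}^2$$

Solving (***) with $d_1 < d_2$ we have

$$\frac{\sqrt{n}(d_1-\theta_0)}{\sigma} = -\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}$$

and since

$$\frac{1}{\sqrt{2\pi}}\int_{-\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}}^{\frac{\sqrt{n}(d_2-\theta_0)}{\sigma}} e^{-\frac{y^2}{2}}\,dy = 1-\alpha$$

we have

$$\frac{\sqrt{n}(d_2-\theta_0)}{\sigma} = \tau_{\alpha/2} \ \Rightarrow\ d_2 = \theta_0+\tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$

where $\tau_{\alpha/2}$ is the upper $\alpha/2$ point of $N(0,1)$. Hence also

$$-\frac{\sqrt{n}(d_1-\theta_0)}{\sigma} = \tau_{\alpha/2} \ \Rightarrow\ d_1 = \theta_0-\tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$

As such, the type-A region of size $\alpha$ is

$$w = \left\{x \ \Big|\ \bar{x} < \theta_0-\tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \ \cup\ \bar{x} > \theta_0+\tau_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right\} = \left\{x \ \Big|\ \frac{\sqrt{n}\,|\bar{x}-\theta_0|}{\sigma} > \tau_{\alpha/2}\right\}$$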
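The boundaries $d_1$ and $d_2$ of the type-A region are easy to evaluate. The sketch below is illustrative only: the function names and the numbers $\theta_0 = 10$, $\sigma = 2$, $n = 16$, $\alpha = 0.05$ are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist


def type_a_region(theta0, sigma, n, alpha):
    """Return (d1, d2) with d = theta0 -/+ tau_{alpha/2} * sigma / sqrt(n)."""
    tau = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    half_width = tau * sigma / sqrt(n)
    return theta0 - half_width, theta0 + half_width


def rejects(xbar, theta0, sigma, n, alpha):
    """True when the sample mean falls in the type-A critical region."""
    d1, d2 = type_a_region(theta0, sigma, n, alpha)
    return xbar < d1 or xbar > d2


# Illustrative (assumed) values.
d1, d2 = type_a_region(theta0=10.0, sigma=2.0, n=16, alpha=0.05)
print(round(d1, 4), round(d2, 4))
print(rejects(11.2, 10.0, 2.0, 16, 0.05), rejects(10.5, 10.0, 2.0, 16, 0.05))
```

The region is symmetric about $\theta_0$, which is what condition (**) forces in the normal case.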

Similar Region (Testing Composite Hypothesis)
Let $X$ be a random variable distributed as $f(x;\ \theta_1, \ldots, \theta_k)$. Consider a hypothesis of the form

$$H_0:\ \theta_1 = \theta_{10},\ \ldots,\ \theta_r = \theta_{r0} \quad (r < k) \qquad (1)$$

Here $(k-r)$ parameters are unspecified, so $H_0$ is a composite hypothesis with $(k-r)$ d.f. We want to determine a critical region $\omega$ of size $\alpha$ such that

$$\int_\omega L(x \mid H_0)\,dx = \alpha \qquad (2)$$

and

$$\int_\omega L(x \mid H_1)\,dx \ \text{ is maximum} \qquad (3)$$

where $H_1$ is some simple hypothesis about the parameters. Since the parameters $\theta_{r+1}, \ldots, \theta_k$ are unspecified by $H_0$, the $\alpha$ given in (2) is in general a function of these parameters and hence cannot be uniquely determined. If $\alpha$ does not depend on the unspecified parameters, the region $\omega$ for which equation (2) is true is called a region similar to the sample space with respect to the parameters $\theta_{r+1}, \ldots, \theta_k$, or simply a similar region.

A test based on a similar region of size $\alpha$ is called a similar size-$\alpha$ test.

Concept of Similar Region
In the case of a composite hypothesis, the selection of a suitable test involves three important stages:
i) finding all similar regions;
ii) finding those similar regions which are of size $\alpha$;
iii) finding, among these, a similar region of size $\alpha$ that is best from the point of view of power; this gives the UMP critical region.

Construction of Similar Region
When a statistic sufficient for each of the unspecified parameters exists, or when a jointly sufficient statistic exists for the unknown parameters, regions similar to the sample space can be constructed. Let $\omega$ be any critical region of size $\alpha$. We define an indicator function (or variable) $I_\omega$ of the critical region $\omega$:

$$I_\omega = \begin{cases} 1 & \text{if the observation lies in the C.R., i.e. } x \in \omega \\ 0 & \text{if the observation lies outside the C.R., i.e. } x \notin \omega \end{cases}$$

The set of all points $x$ for which $I_\omega = 1$ is the region of rejection. With $S$ denoting the whole sample space,

$$\int_\omega L(x \mid H)\,dx = \int_S I_\omega\,L(x \mid H)\,dx = E(I_\omega \mid H)$$

which is the expected value of $I_\omega$ when $H$ is true, and

$$E(I_\omega \mid H) = \begin{cases} \alpha & \text{if } H = H_0 \\ 1-\beta & \text{if } H = H_1 \end{cases}$$

If the parameter $\theta$ admits a sufficient statistic $t$, the likelihood function factorizes into

$$L(x \mid \theta) = g(t, \theta)\,h(x, t)$$

where $g(t, \theta)$ is the frequency function of the sufficient statistic $t$, and $h(x, t)$ is a function of the sample values only for a given $t$.

Now,

$$E(I_\omega) = \int_S I_\omega\,L(x \mid \theta)\,dx = \int_S I_\omega\,g(t, \theta)\,h(x, t)\,dx = E\left[E(I_\omega \mid t)\right] \qquad (4)$$

Equation (4) is very important for us. Since $t$ is sufficient for $\theta$, $E(I_\omega \mid t)$ does not depend on $\theta$, and it has the same expectation as $I_\omega$, i.e.

$$E\left[E(I_\omega \mid t)\right] = E(I_\omega)$$

If $t$ is sufficient for $\theta$ whether $H_0$ or $H_1$ is true, equation (4) implies that there is a critical function based on $t$, similar to the sample space, with size and power exactly equal to those of the original critical region $\omega$.

Neyman Structure
A test with critical region $\omega$ is said to be of Neyman structure with respect to $t$ if $E(I_\omega \mid t)$ is the same constant almost everywhere, whatever the value of $\theta$; i.e. a test satisfying $E(I_\omega \mid t) = \alpha$ is said to have Neyman structure with respect to $t$.

Example: Let $x_1, \ldots, x_n$ be a random sample drawn from $N(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Test $H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$.

Solution
The hypothesis $H_0$ has one d.f., the parameter $\sigma^2$ being unspecified. We have

$$L(x \mid H_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}\exp\left[-\frac{1}{2\sigma^2}\sum (x_i-\mu_0)^2\right]$$

Under $H_0$ the statistic $V = \sum_{i=1}^{n}(x_i-\mu_0)^2$ is sufficient for $\sigma^2$, and it is also a complete sufficient statistic. Consider simple $H_0$ and $H_1$ as

$$H_0: \mu = \mu_0,\ \sigma^2 = \sigma_0^2 \qquad H_1: \mu = \mu_1,\ \sigma^2 = \sigma_1^2$$

According to the Neyman-Pearson lemma we have

$$\frac{L(x \mid H_0)}{L(x \mid H_1)} = \left(\frac{\sigma_1}{\sigma_0}\right)^{n}\exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i-\mu_0)^2+\frac{1}{2\sigma_1^2}\sum (x_i-\mu_1)^2\right] \le \text{Constant}$$

With this, the MP critical region $\omega_0$ of size $\alpha$ for testing simple $H_0$ against simple $H_1$ is

$$L\left(x \mid \mu_1, \sigma_1^2\right) > k(v)\,L\left(x \mid \mu_0, \sigma_0^2\right) \qquad (1)$$

where $k(v)$ is such that the conditional size of $\omega_0$ given $V = v$ is $\alpha$, which implies that

$$(\mu_1-\mu_0)(\bar{x}-\mu_0) > k_1(v)$$

where $k_1(v)$ is related to $k(v)$.

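Case I:
If $\mu_1 > \mu_0$, then condition (1) is equivalent to

$$(\bar{x}-\mu_0) > k_2(v), \quad\text{i.e.}\quad \frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}} > k_3(v) \ (\text{say})$$

So we can write

$$\omega_0 = \left\{x \ \Big|\ \frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}} > k_3(v)\right\}$$

where $k_3(v)$ is to be determined such that $P_{\mu_0}[\omega_0 \mid v] = \alpha$.

Here, under $H_0$, $\dfrac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}$ and $V$ are independent, so that the conditional distribution of $\dfrac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}}$ given $V = v$ is the same as its unconditional distribution. So $k_3(v)$ will be independent of $v$. Hence we can write

$$P\left[\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}} > k_3\right] = \alpha$$

We know that

$$\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}} = \frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{\sum (x_i-\bar{X})^2+n(\bar{X}-\mu_0)^2}} = \frac{\sqrt{n}(\bar{X}-\mu_0)\big/\sqrt{\sum (x_i-\bar{X})^2}}{\sqrt{1+\dfrac{n(\bar{X}-\mu_0)^2}{\sum (x_i-\bar{X})^2}}} = \frac{t/\sqrt{n-1}}{\sqrt{1+\dfrac{t^2}{n-1}}} = \frac{t}{\sqrt{t^2+n-1}}$$

where

$$t = \frac{\sqrt{n}(\bar{X}-\mu_0)}{S} \sim t_{n-1}, \qquad S^2 = \frac{1}{n-1}\sum (x_i-\bar{X})^2$$

Since $t/\sqrt{t^2+n-1}$ is an increasing function of $t$,

$$\frac{\sqrt{n}(\bar{X}-\mu_0)}{\sqrt{V}} > k_3 \ \text{ iff } \ t > k_4 \ (\text{say})$$

and we may also write

$$\omega_0 = \left\{x \ \Big|\ \frac{\sqrt{n}(\bar{x}-\mu_0)}{\sqrt{v}} > k_3\right\} = \{x \mid t > k_4\} \qquad \left[\text{where } P_{\mu_0}[t > k_4] = \alpha\right]$$

where $k_4$ is the upper $\alpha$ point of the $t_{n-1}$ distribution, and we can write finally

$$\omega_0 = \left\{x \ \Big|\ \frac{\sqrt{n}(\bar{x}-\mu_0)}{S} > t_{\alpha,\,n-1}\right\}$$

Since this is independent of $\sigma_0^2$ and $\sigma_1^2$, it is the MP similar region of size $\alpha$ for testing $H_0$ against $H_1$.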

Case II:
If $\mu_1 < \mu_0$, in this case we have as before

$$(\mu_1-\mu_0)(\bar{x}-\mu_0) > k_1(v) \ \Rightarrow\ (\bar{x}-\mu_0) < k_2'(v)$$

So, proceeding as before, the MP similar region of size $\alpha$ for testing $H_0$ against $H_1$ is

$$\omega_0' = \left\{x \ \Big|\ \frac{\sqrt{n}(\bar{x}-\mu_0)}{S} < -t_{\alpha,\,n-1}\right\}$$

Since $\omega_0$ is independent of $\mu_1$, i.e. it is the same for all $\mu_1 > \mu_0$, it is in fact the UMP similar region of size $\alpha$ for testing $H_0$ against the composite alternative $H_1: \mu > \mu_0$.

Similarly, $\omega_0'$ is the UMP similar region of size $\alpha$ for testing $H_0: \mu = \mu_0$ against $H_1: \mu < \mu_0$.
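The identity that turns the conditional test on $V$ into the usual $t$-test can be checked numerically. This is a minimal sketch; the function names and the six data values are made up for illustration.

```python
from math import sqrt


def t_statistic(xs, mu0):
    """One-sample t = sqrt(n) * (xbar - mu0) / S with S^2 the unbiased variance."""
    n = len(xs)
    xbar = sum(xs) / n
    s_squared = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return sqrt(n) * (xbar - mu0) / sqrt(s_squared)


def v_ratio(xs, mu0):
    """sqrt(n) * (xbar - mu0) / sqrt(V) with V = sum (x_i - mu0)^2."""
    n = len(xs)
    xbar = sum(xs) / n
    v = sum((x - mu0) ** 2 for x in xs)
    return sqrt(n) * (xbar - mu0) / sqrt(v)


# Made-up sample and null value, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 5.3, 5.0]
mu0 = 5.0
t = t_statistic(sample, mu0)
r = v_ratio(sample, mu0)
print(r, t / sqrt(t * t + len(sample) - 1))  # the two numbers should agree
```

Because $r = t/\sqrt{t^2+n-1}$ is increasing in $t$, a cutoff on $r$ and a cutoff on $t$ define the same region.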

Likelihood Ratio Test

Introduction
Neyman and Pearson (1928) developed a simple general method of testing hypotheses called the method of likelihood ratio. Just as the method of maximum likelihood yields an estimate of a parameter, the likelihood ratio method yields a test statistic, usually rather easily.

Definition
Let $\theta \in \Omega$ be a vector of parameters and let $X = (x_1, \ldots, x_n)$ be a random vector with p.d.f. $f_\theta$, $\theta \in \Omega$.

Consider the problem of testing the null hypothesis $H_0: X \sim f_\theta,\ \theta \in \Omega_0$ against the alternative hypothesis $H_1: X \sim f_\theta,\ \theta \in \Omega_1 = \Omega-\Omega_0$. The likelihood ratio for testing $H_0$ against $H_1$ is defined as

$$\lambda = \lambda(X) = \lambda(x_1, \ldots, x_n) = \frac{\sup_{\theta \in \Omega_0} f_\theta(x_1, \ldots, x_n)}{\sup_{\theta \in \Omega} f_\theta(x_1, \ldots, x_n)} = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})}$$

and the test is of the form: reject $H_0$ iff $\lambda(X) < C$, where $C$ is some constant determined from the size $\alpha$ (the level of significance, $0 < \alpha < 1$), i.e.

$$\sup_{\theta \in \Omega_0} P_\theta\{x : \lambda(x) < C\} = \alpha$$

Remarks
The numerator of the likelihood ratio $\lambda$ is the best explanation of $X$ that $H_0$ can provide, and the denominator is the best possible explanation of $X$. $H_0$ is rejected if there is a much better explanation of $X$ than the best one provided by $H_0$. It is clear that $0 \le \lambda \le 1$.

Properties of LRT
The LRT has some desirable properties, especially large-sample properties. The LRT is generally UMP if a UMP test exists. We state below the two asymptotic properties of the LRT.
i) Under certain conditions, $-2\ln\lambda$ has an asymptotic chi-square distribution.
ii) Under certain assumptions, the LRT is consistent.

Properties of LRT Statistic ( λ )
i) The likelihood ratio $\lambda$ is a function of $x$ only, and hence $\lambda$ is a statistic which does not depend on $\theta$.
ii) Since $\lambda$ is the ratio of the conditional maximum of the likelihood function to its unconditional maximum, $0 \le \lambda \le 1$.
iii) The critical region is $0 < \lambda < \lambda_0$, where $\int_0^{\lambda_0} h(\lambda)\,d\lambda = \alpha$, the level of significance, and $h(\lambda)$ is the p.d.f. of $\lambda$ under $H_0$.
iv) $\lambda$ is always a function of a sufficient statistic.
v) If the null hypothesis $H_0$ is composite, the distribution of $\lambda$ may not always be unique.
vi) Under certain conditions, $-2\ln\lambda$ asymptotically follows $\chi^2_{(1)}$ (one degree of freedom when $H_0$ fixes a single parameter).

Likelihood Ratio Test ~ 1 of 9

LRT for Testing the Equality of Means of Two Normal Populations
Let us consider two independent random variables $X_1$ and $X_2$ following the normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively. We want to test the hypothesis

$$H_0: \mu_1 = \mu_2 = \mu \ (\text{say});\quad 0 < \sigma_1^2 < \infty,\ 0 < \sigma_2^2 < \infty$$

against

$$H_1: \mu_1 \ne \mu_2;\quad \sigma_1^2 > 0,\ \sigma_2^2 > 0$$

Case I: Population variances are unequal.

$$\Omega = \left\{(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2);\ -\infty < \mu_i < \infty,\ \sigma_i^2 > 0;\ i = 1, 2\right\}$$

and

$$\Omega_0 = \left\{(\mu, \sigma_1^2, \sigma_2^2);\ -\infty < \mu < \infty,\ \sigma_i^2 > 0;\ i = 1, 2\right\}$$

Let $x_{1i}\ (i = 1, \ldots, m)$ and $x_{2j}\ (j = 1, \ldots, n)$ be two independent random samples of sizes $m$ and $n$ from the populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively. Then the likelihood function is given by

$$L = \left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}(x_{1i}-\mu_1)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^{n}(x_{2j}-\mu_2)^2\right] \qquad (1)$$

The maximum likelihood estimates for $\mu_1, \mu_2, \sigma_1^2, \sigma_2^2$ are given by

$$\hat{\mu}_1 = \frac{1}{m}\sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad\text{and}\quad \hat{\mu}_2 = \frac{1}{n}\sum_{j=1}^{n} x_{2j} = \bar{x}_2$$

$$\hat{\sigma}_1^2 = \frac{1}{m}\sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2 = s_1^2 \ (\text{say}) \quad\text{and}\quad \hat{\sigma}_2^2 = \frac{1}{n}\sum_{j=1}^{n}(x_{2j}-\bar{x}_2)^2 = s_2^2 \ (\text{say})$$

Now

$$L(\hat{\Omega}) = \left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2} e^{-\frac{m+n}{2}}$$

Under $H_0$, the likelihood function is given by

$$L(\Omega_0) = \left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}(x_{1i}-\mu)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^{n}(x_{2j}-\mu)^2\right]$$

To obtain the maximum value of $L(\Omega_0)$ for variation in $\mu, \sigma_1^2, \sigma_2^2$, it will be seen that the estimate of $\mu$ is obtained as the root of a cubic equation,

$$\frac{m^2(\bar{x}_1-\hat{\mu})}{\sum_{i=1}^{m}(x_{1i}-\hat{\mu})^2}+\frac{n^2(\bar{x}_2-\hat{\mu})}{\sum_{j=1}^{n}(x_{2j}-\hat{\mu})^2} = 0$$

and thus $\lambda$ is a complicated function of the sample observations. It is not possible to obtain the exact critical region $0 < \lambda < \lambda_0$ for a given $\alpha$, since the distribution of $\lambda$ depends on the population variances, which are ordinarily unknown. As an approximate test for large samples, $-2\ln\lambda$ can be treated as a $\chi^2$ variate with 1 d.f.

Case II: Population variances are equal, i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (say)

$$\Omega = \left\{(\mu_1, \mu_2, \sigma^2);\ -\infty < \mu_i < \infty,\ \sigma^2 > 0;\ i = 1, 2\right\}$$

and

$$\Omega_0 = \left\{(\mu, \sigma^2);\ -\infty < \mu < \infty,\ \sigma^2 > 0\right\}$$

The maximum likelihood estimates in $\Omega$ are then given by

$$\hat{\mu}_1 = \bar{x}_1 \quad\text{and}\quad \hat{\mu}_2 = \bar{x}_2$$

and

$$\hat{\sigma}^2 = \frac{1}{m+n}\left[\sum_{i=1}^{m}(x_{1i}-\hat{\mu}_1)^2+\sum_{j=1}^{n}(x_{2j}-\hat{\mu}_2)^2\right] = \frac{1}{m+n}\left[\sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2+\sum_{j=1}^{n}(x_{2j}-\bar{x}_2)^2\right] = \frac{1}{m+n}\left[m s_1^2+n s_2^2\right]$$

Now, substituting the values of $\hat{\mu}_1, \hat{\mu}_2, \hat{\sigma}^2$ in (1),

$$L(\hat{\Omega}) = \left\{\frac{m+n}{2\pi\left[m s_1^2+n s_2^2\right]}\right\}^{\frac{m+n}{2}} e^{-\frac{m+n}{2}}$$

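Under $H_0$ the likelihood function is

$$L(\Omega_0) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{m+n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i}-\mu)^2+\sum_{j=1}^{n}(x_{2j}-\mu)^2\right\}\right]$$

$$\Rightarrow\ \ln L(\Omega_0) = C-\frac{m+n}{2}\ln\sigma^2-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i}-\mu)^2+\sum_{j=1}^{n}(x_{2j}-\mu)^2\right\}$$

where $C$ is a constant independent of $\mu$ and $\sigma^2$. The likelihood equation for estimating $\mu$ is

$$\frac{\partial \ln L}{\partial\mu} = 0 \ \Rightarrow\ \frac{1}{\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i}-\mu)+\sum_{j=1}^{n}(x_{2j}-\mu)\right\} = 0 \ \Rightarrow\ (m\bar{x}_1+n\bar{x}_2)-(m+n)\mu = 0 \ \Rightarrow\ \hat{\mu} = \frac{m\bar{x}_1+n\bar{x}_2}{m+n}$$

Also,

$$\frac{\partial \ln L}{\partial\sigma^2} = 0 \ \Rightarrow\ -\frac{m+n}{2\sigma^2}+\frac{1}{2\sigma^4}\left\{\sum_{i=1}^{m}(x_{1i}-\mu)^2+\sum_{j=1}^{n}(x_{2j}-\mu)^2\right\} = 0$$

$$\Rightarrow\ \hat{\sigma}^2 = \frac{1}{m+n}\left\{\sum_{i=1}^{m}(x_{1i}-\hat{\mu})^2+\sum_{j=1}^{n}(x_{2j}-\hat{\mu})^2\right\}$$

But

$$\sum_{i=1}^{m}(x_{1i}-\hat{\mu})^2 = \sum_{i=1}^{m}(x_{1i}-\bar{x}_1+\bar{x}_1-\hat{\mu})^2 = \sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2+m(\bar{x}_1-\hat{\mu})^2$$

$$= m s_1^2+m\left(\bar{x}_1-\frac{m\bar{x}_1+n\bar{x}_2}{m+n}\right)^2 = m s_1^2+\frac{m n^2(\bar{x}_1-\bar{x}_2)^2}{(m+n)^2}$$

Similarly

$$\sum_{j=1}^{n}(x_{2j}-\hat{\mu})^2 = n s_2^2+\frac{n m^2(\bar{x}_2-\bar{x}_1)^2}{(m+n)^2}$$

$$\therefore\ \hat{\sigma}^2 = \frac{1}{m+n}\left\{m s_1^2+\frac{m n^2(\bar{x}_1-\bar{x}_2)^2}{(m+n)^2}+n s_2^2+\frac{n m^2(\bar{x}_2-\bar{x}_1)^2}{(m+n)^2}\right\} = \frac{1}{m+n}\left\{m s_1^2+n s_2^2+\frac{mn(\bar{x}_1-\bar{x}_2)^2}{m+n}\right\}$$

Hence

$$L(\hat{\Omega}_0) = \left\{\frac{m+n}{2\pi\left[m s_1^2+n s_2^2+\dfrac{mn}{m+n}(\bar{x}_1-\bar{x}_2)^2\right]}\right\}^{\frac{m+n}{2}} e^{-\frac{m+n}{2}}$$

$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \left\{\frac{m s_1^2+n s_2^2}{m s_1^2+n s_2^2+\dfrac{mn}{m+n}(\bar{x}_1-\bar{x}_2)^2}\right\}^{\frac{m+n}{2}} = \left\{1+\frac{mn(\bar{x}_1-\bar{x}_2)^2}{(m+n)\left(m s_1^2+n s_2^2\right)}\right\}^{-\frac{m+n}{2}}$$

We know that, under $H_0: \mu_1 = \mu_2$, the test statistic

$$t = \frac{\bar{x}_1-\bar{x}_2}{S\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}}, \quad\text{where } S^2 = \frac{1}{m+n-2}\left(m s_1^2+n s_2^2\right)$$

follows Student's $t$ distribution with $m+n-2$ d.f. Since

$$\frac{mn(\bar{x}_1-\bar{x}_2)^2}{(m+n)\left(m s_1^2+n s_2^2\right)} = \frac{t^2}{m+n-2}$$

we have $\lambda = \left\{1+\dfrac{t^2}{m+n-2}\right\}^{-\frac{m+n}{2}}$, a decreasing function of $t^2$, so that small values of $\lambda$ correspond to large values of $|t|$.

Thus for testing the null hypothesis

$$H_0: \mu_1 = \mu_2 = \mu;\quad \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0$$

against

$$H_1: \mu_1 \ne \mu_2;\quad \sigma_1^2 = \sigma_2^2 = \sigma^2 > 0$$

we have the two-tailed $t$-test defined as follows: if

$$|t| = \frac{|\bar{x}_1-\bar{x}_2|}{S\sqrt{\dfrac{1}{m}+\dfrac{1}{n}}} > t_{m+n-2}\left(\frac{\alpha}{2}\right)$$

reject $H_0$; otherwise $H_0$ may be accepted.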
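The relation between $\lambda$ and the pooled $t$ statistic can be verified on data. The following sketch is illustrative: the function name `two_sample_lrt` and both samples are made up, and $\lambda$ is computed from the closed form derived above rather than by numerical maximization.

```python
from math import sqrt


def two_sample_lrt(x1, x2):
    """Return (lambda, t) for H0: mu1 = mu2 under a common unknown variance."""
    m, n = len(x1), len(x2)
    xbar1, xbar2 = sum(x1) / m, sum(x2) / n
    ms1 = sum((x - xbar1) ** 2 for x in x1)  # m * s1^2
    ns2 = sum((x - xbar2) ** 2 for x in x2)  # n * s2^2
    within = ms1 + ns2
    between = m * n * (xbar1 - xbar2) ** 2 / (m + n)
    lam = (within / (within + between)) ** ((m + n) / 2)
    pooled_s = sqrt(within / (m + n - 2))
    t = (xbar1 - xbar2) / (pooled_s * sqrt(1 / m + 1 / n))
    return lam, t


# Made-up samples, for illustration only.
x1 = [12.1, 11.8, 12.6, 12.3, 11.9]
x2 = [11.2, 11.6, 11.0, 11.7, 11.4, 11.3]
lam, t = two_sample_lrt(x1, x2)
dof = len(x1) + len(x2) - 2
lam_from_t = (1 + t * t / dof) ** (-(len(x1) + len(x2)) / 2)
print(lam, lam_from_t, t)  # the first two numbers should agree
```

Because $\lambda$ depends on the data only through $t^2$, the cutoff for the test can be read from $t$ tables instead of from the distribution of $\lambda$.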

Likelihood Ratio Test for Testing the Equality of Variances of Two Population
Consider two normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, where the means $\mu_1$ and $\mu_2$ and the variances $\sigma_1^2$ and $\sigma_2^2$ are unspecified. We want to test the hypothesis

$$H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2 \ (\text{unspecified}), \text{ with } \mu_1 \text{ and } \mu_2 \text{ unspecified}$$

against

$$H_1: \sigma_1^2 \ne \sigma_2^2;\quad \mu_1 \text{ and } \mu_2 \text{ unspecified}$$

If $x_{1i}\ (i = 1, \ldots, m)$ and $x_{2j}\ (j = 1, \ldots, n)$ are independent random samples of sizes $m$ and $n$ from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively, then

$$L = \left(\frac{1}{2\pi\sigma_1^2}\right)^{m/2}\exp\left[-\frac{1}{2\sigma_1^2}\sum_{i=1}^{m}(x_{1i}-\mu_1)^2\right]\left(\frac{1}{2\pi\sigma_2^2}\right)^{n/2}\exp\left[-\frac{1}{2\sigma_2^2}\sum_{j=1}^{n}(x_{2j}-\mu_2)^2\right] \qquad (1)$$

In this case,

$$\Omega = \left\{(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2);\ -\infty < \mu_i < \infty,\ \sigma_i^2 > 0;\ i = 1, 2\right\}$$

and

$$\Omega_0 = \left\{(\mu_1, \mu_2, \sigma^2);\ -\infty < \mu_i < \infty,\ \sigma^2 > 0;\ i = 1, 2\right\}$$

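The maximum likelihood estimates for $\mu_1, \mu_2, \sigma_1^2$ and $\sigma_2^2$ are given by

$$\hat{\mu}_1 = \frac{1}{m}\sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad\text{and}\quad \hat{\mu}_2 = \frac{1}{n}\sum_{j=1}^{n} x_{2j} = \bar{x}_2$$

$$\hat{\sigma}_1^2 = \frac{1}{m}\sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2 = s_1^2 \ (\text{say}) \quad\text{and}\quad \hat{\sigma}_2^2 = \frac{1}{n}\sum_{j=1}^{n}(x_{2j}-\bar{x}_2)^2 = s_2^2 \ (\text{say})$$

Substituting these values in (1) we get

$$L(\hat{\Omega}) = \left(\frac{1}{2\pi s_1^2}\right)^{m/2}\left(\frac{1}{2\pi s_2^2}\right)^{n/2} e^{-\frac{m+n}{2}}$$

Under $H_0$, the likelihood function is given by

$$L(\Omega_0) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{m+n}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{\sum_{i=1}^{m}(x_{1i}-\mu_1)^2+\sum_{j=1}^{n}(x_{2j}-\mu_2)^2\right\}\right] \qquad (2)$$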

The MLEs for $\mu_1$, $\mu_2$ and $\sigma^2$ are now given by

$$\hat{\mu}_1 = \frac{1}{m}\sum_{i=1}^{m} x_{1i} = \bar{x}_1 \quad\text{and}\quad \hat{\mu}_2 = \frac{1}{n}\sum_{j=1}^{n} x_{2j} = \bar{x}_2$$

and

$$\hat{\sigma}^2 = \frac{1}{m+n}\left[\sum_{i=1}^{m}(x_{1i}-\hat{\mu}_1)^2+\sum_{j=1}^{n}(x_{2j}-\hat{\mu}_2)^2\right] = \frac{1}{m+n}\left[\sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2+\sum_{j=1}^{n}(x_{2j}-\bar{x}_2)^2\right] = \frac{1}{m+n}\left[m s_1^2+n s_2^2\right]$$

Substituting these values in (2) we get

$$L(\hat{\Omega}_0) = \left\{\frac{m+n}{2\pi\left[m s_1^2+n s_2^2\right]}\right\}^{\frac{m+n}{2}} e^{-\frac{m+n}{2}}$$

$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = (m+n)^{\frac{m+n}{2}}\left\{\frac{\left(s_1^2\right)^{m/2}\left(s_2^2\right)^{n/2}}{\left(m s_1^2+n s_2^2\right)^{\frac{m+n}{2}}}\right\} = \frac{(m+n)^{\frac{m+n}{2}}}{m^{m/2}\,n^{n/2}}\left\{\frac{\left(m s_1^2\right)^{m/2}\left(n s_2^2\right)^{n/2}}{\left(m s_1^2+n s_2^2\right)^{\frac{m+n}{2}}}\right\} \qquad (3)$$

We know that under $H_0$ the statistic

$$F = \frac{\sum (x_{1i}-\bar{x}_1)^2/(m-1)}{\sum (x_{2j}-\bar{x}_2)^2/(n-1)} = \frac{S_1^2}{S_2^2}$$

where $S_1^2$ and $S_2^2$ are the unbiased sample variances, follows the $F$-distribution with $(m-1), (n-1)$ d.f. In terms of $s_1^2$ and $s_2^2$ this implies

$$F = \frac{m(n-1)\,s_1^2}{n(m-1)\,s_2^2} \ \Rightarrow\ \frac{m-1}{n-1}F = \frac{m s_1^2}{n s_2^2}$$

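Substituting in (3) and simplifying, we get

$$\lambda = \frac{(m+n)^{\frac{m+n}{2}}}{m^{m/2}\,n^{n/2}}\left\{\frac{\left(\dfrac{m-1}{n-1}F\right)^{m/2}}{\left(1+\dfrac{m-1}{n-1}F\right)^{\frac{m+n}{2}}}\right\}$$

Thus $\lambda$ is a function of $F$ alone, and hence the test can be carried out with $F$ as the test statistic. As $F$ increases from 0, $\lambda$ first rises to its maximum value 1 and then falls, so small values of $\lambda$ correspond to both small and large values of $F$. The critical region $0 < \lambda < \lambda_0$ can therefore be given by a pair of intervals $F \le F_1$ and $F \ge F_2$, where $F_1$ and $F_2$ are determined so that under $H_0$

$$P(F \ge F_2) = \frac{\alpha}{2} \quad\text{and}\quad P(F \ge F_1) = 1-\frac{\alpha}{2}$$

Since, under $H_0$, $F$ follows the $F$-distribution with $m-1$, $n-1$ d.f., we have

$$F_2 = F_{m-1,\,n-1}\left(\frac{\alpha}{2}\right) \quad\text{and}\quad F_1 = F_{m-1,\,n-1}\left(1-\frac{\alpha}{2}\right)$$

where $F_{m,n}(\alpha)$ is the upper $\alpha$ point of the $F$-distribution with $(m, n)$ d.f.

Consequently, for testing $H_0: \sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_1: \sigma_1^2 \ne \sigma_2^2$, we have a two-tailed $F$-test, the critical region being given by

$$F > F_{m-1,\,n-1}\left(\frac{\alpha}{2}\right) \quad\text{or}\quad F < F_{m-1,\,n-1}\left(1-\frac{\alpha}{2}\right)$$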
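The reduction of $\lambda$ to a function of $F$ can be checked numerically. The sketch below is illustrative only; the function names and the two samples are assumptions made for the example.

```python
def variance_lrt(x1, x2):
    """Return (lambda, F) for H0: sigma1^2 = sigma2^2 with unspecified means."""
    m, n = len(x1), len(x2)
    xbar1, xbar2 = sum(x1) / m, sum(x2) / n
    ss1 = sum((x - xbar1) ** 2 for x in x1)  # m * s1^2
    ss2 = sum((x - xbar2) ** 2 for x in x2)  # n * s2^2
    s1_sq, s2_sq = ss1 / m, ss2 / n
    # Equation (3), first form.
    lam = ((m + n) ** ((m + n) / 2) * s1_sq ** (m / 2) * s2_sq ** (n / 2)
           / (ss1 + ss2) ** ((m + n) / 2))
    f_stat = (ss1 / (m - 1)) / (ss2 / (n - 1))
    return lam, f_stat


def lam_from_f(f_stat, m, n):
    """lambda written as a function of F alone."""
    r = (m - 1) / (n - 1) * f_stat  # equals m*s1^2 / (n*s2^2)
    const = (m + n) ** ((m + n) / 2) / (m ** (m / 2) * n ** (n / 2))
    return const * r ** (m / 2) / (1 + r) ** ((m + n) / 2)


# Made-up samples, for illustration only.
x1 = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5]
x2 = [5.0, 5.2, 4.8, 5.1, 4.9]
lam, f_stat = variance_lrt(x1, x2)
print(lam, lam_from_f(f_stat, len(x1), len(x2)), f_stat)
```

Evaluating `lam_from_f` at a few values of $F$ also shows that $\lambda$ equals 1 when $s_1^2 = s_2^2$ and decreases on either side, which is why the critical region is two-tailed.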

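Example: Let $x_1, \ldots, x_n$ be a random sample from $f(x;\theta) = \theta e^{-\theta x} I_{(0,\infty)}(x)$, where $\Theta = \{\theta : \theta > 0\}$. Test $H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$.

Solution

$$\sup_{\theta \in \Theta} L(\theta;\ x_1, \ldots, x_n) = \sup_{\theta > 0}\left[\theta^n e^{-\theta\sum x_i}\right] = \left(\frac{n}{\sum x_i}\right)^{n} e^{-n} \qquad \left[\text{since } \hat{\theta} = \frac{1}{\bar{x}} = \frac{n}{\sum x_i}\right]$$

and

$$\sup_{\theta \in \Theta_0} L(\theta;\ x_1, \ldots, x_n) = \sup_{0 < \theta \le \theta_0}\left[\theta^n e^{-\theta\sum x_i}\right] = \begin{cases} \left(\dfrac{n}{\sum x_i}\right)^{n} e^{-n} & \text{if } \dfrac{n}{\sum x_i} \le \theta_0 \\[2ex] \theta_0^{\,n}\,e^{-\theta_0\sum x_i} & \text{if } \dfrac{n}{\sum x_i} > \theta_0 \end{cases}$$

Hence

$$\lambda = \begin{cases} 1 & \text{if } \dfrac{n}{\sum x_i} \le \theta_0 \\[2ex] \dfrac{\theta_0^{\,n}\,e^{-\theta_0\sum x_i}}{\left(\dfrac{n}{\sum x_i}\right)^{n} e^{-n}} & \text{if } \dfrac{n}{\sum x_i} > \theta_0 \end{cases}$$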

If $0 < \lambda_0 < 1$, then a generalized likelihood ratio test is given by:

Reject $H_0$ if $\lambda \le \lambda_0$,

or,

Reject $H_0$ if $\dfrac{n}{\sum x_i} > \theta_0$ and $\left(\dfrac{\theta_0\sum x_i}{n}\right)^{n}\exp\left[-\theta_0\sum x_i+n\right] \le \lambda_0$,

or,

Reject $H_0$ if $\theta_0\bar{x} < 1$ and $(\theta_0\bar{x})^n\exp\left[-n(\theta_0\bar{x}-1)\right] \le \lambda_0$.

Let $y = \theta_0\bar{x}$, and note that $y^n\exp[-n(y-1)]$ has a maximum at $y = 1$ and is increasing for $0 < y < 1$. Hence $y < 1$ and $y^n\exp[-n(y-1)] \le \lambda_0$ iff $y \le k$, where $k$ is a constant satisfying $0 < k < 1$.

A generalized likelihood ratio test therefore reduces to the following: reject $H_0$ iff $\theta_0\bar{x} < k$, where $0 < k < 1$. That is, reject $H_0$ if $\bar{x}$ is less than some fraction of $1/\theta_0$.

If a generalized likelihood ratio test having size $\alpha$ is desired, $k$ is obtained as the solution to the equation

$$\alpha = P_{\theta_0}\left(\theta_0\bar{X} < k\right) = P_{\theta_0}\left(\theta_0\sum X_i < nk\right) = \int_0^{nk}\frac{1}{\Gamma(n)}\,u^{n-1}e^{-u}\,du$$

Note that $P_\theta\left(\theta_0\bar{X} < k\right) \le P_{\theta_0}\left(\theta_0\bar{X} < k\right)$ for $\theta \le \theta_0$.
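The constant $k$ has no closed form, but the size equation is easy to solve numerically. The following sketch is illustrative: it assumes integer $n$ (so the gamma integral has the finite-sum form used in `erlang_cdf`), assumes $\alpha < 0.5$ so that a root with $0 < k < 1$ exists, and uses plain bisection; all names and the values $n = 10$, $\alpha = 0.05$ are made up.

```python
from math import exp


def erlang_cdf(u, n):
    """P(U <= u) for U ~ Gamma(n, 1), integer n: 1 - e^{-u} * sum_{j<n} u^j / j!."""
    term, total = 1.0, 1.0
    for j in range(1, n):
        term *= u / j
        total += term
    return 1.0 - exp(-u) * total


def solve_k(n, alpha, tol=1e-12):
    """Bisection for k in (0, 1) with P(theta0 * sum X_i < n*k | theta0) = alpha."""
    lo, hi = 0.0, float(n)  # k in (0, 1) corresponds to u = n*k in (0, n)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_cdf(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / n


# Illustrative (assumed) values.
k = solve_k(n=10, alpha=0.05)
print(round(k, 4))  # reject H0 when theta0 * xbar < k
```

The same $k$ works for every $\theta_0$, because under $\theta = \theta_0$ the quantity $\theta_0\sum X_i$ has a Gamma$(n, 1)$ distribution that does not involve $\theta_0$.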

Uses of LRT

Because $\lambda$ is positive and is typically a monotonic function of a familiar test statistic, the rejection region can be determined from $\lambda$. The LRT is used in the following problems.

1) Test for the mean of a normal population: $x_1, \ldots, x_n \sim N(\mu, \sigma^2)$,
$H_0: \mu = \mu_0$ against $H_1: \mu = \mu_1$,
both for $\sigma^2$ known and for $\sigma^2$ unknown.

2) Test for the equality of means of two normal populations:
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$,
both for equal and for unequal variances.

3) Test for the equality of means of several normal populations:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ against $H_1$: the $\mu_i$'s are not all equal.

4) Test for the variance of a normal population:
$H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \ne \sigma_0^2$, where $\sigma_0^2$ is specified.

5) Test for the equality of variances of two normal populations:
$H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \ne \sigma_2^2$.

Situation of Using LRT
The Neyman-Pearson lemma, based on the ratio of two probability density functions, provides a best test for testing a simple hypothesis against a simple alternative. In general the best test depends on the form of the probability distribution and on the alternative hypothesis, and a UMP test need not exist. In this situation a general method of test construction, known as the LRT, is recommended for simple or composite hypotheses.

Consistent Test and LRT
A test of a hypothesis $H_0$ against a class of alternatives $H_1$ is said to be consistent if, when any member of $H_1$ holds, the probability of rejecting $H_0$ tends to 1 as the sample size tends to infinity. If $c^*$ is the CR and $X$ the sample point, we may write

$$\lim_{n \to \infty} P\left[X \in c^* \mid H_1\right] = 1, \quad n \text{ being the sample size.}$$

The LRT is a consistent test. Under very generally satisfied conditions, the MLE $\hat{\theta}$ of a parameter vector $\theta$ is consistent. If we are dealing with a situation in which all the MLEs are consistent, we see from the definition of the LRT statistic that, as the sample size increases,

$$\lambda \to \frac{L(x \mid \theta_{r0}, \theta_s)}{L(x \mid \theta_r, \theta_s)} \qquad (1)$$

where $\theta_r, \theta_s$ are the true values of the parameters and $\theta_{r0}$ is the hypothetical value of $\theta_r$ being tested. Thus, when $H_0$ holds,

$$\lambda \to 1 \text{ in probability}$$

and the critical region

$$\lambda \le c_\alpha$$

will therefore have its boundary $c_\alpha$ approaching 1. When $H_0$ does not hold, the limiting value of $\lambda$ in (1) will be some constant $k$ satisfying

$$0 \le k < 1$$

and thus we have

$$P[\lambda \le c_\alpha] \to 1$$

Therefore, the LRT is consistent.

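Show that under certain regularity conditions $-2\ln\lambda \sim \chi^2$ with 1 d.f., where $\lambda$ is the LR.
Let $x_1, \ldots, x_n$ be a sample from $N(\mu, \sigma^2)$. Let the hypotheses be

$$H_0: \mu = \mu_0 \qquad H_1: \mu \ne \mu_0 \qquad \text{where } \sigma^2 \text{ is known}$$

Then, since the MLE of $\mu$ is $\hat{\mu} = \bar{x}$,

$$\begin{aligned}
\lambda &= \frac{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\dfrac{1}{2\sigma^2}\sum (x_i-\mu_0)^2\right]}{\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n}\exp\left[-\dfrac{1}{2\sigma^2}\sum (x_i-\bar{x})^2\right]} \\
&= \exp\left[-\frac{1}{2\sigma^2}\left\{\sum (x_i-\mu_0)^2-\sum (x_i-\bar{x})^2\right\}\right] \\
&= \exp\left[-\frac{1}{2\sigma^2}\left\{n\bar{x}^2-2n\mu_0\bar{x}+n\mu_0^2\right\}\right] \\
&= \exp\left[-\frac{n}{2\sigma^2}(\bar{x}-\mu_0)^2\right]
\end{aligned}$$

$$\Rightarrow\ \ln\lambda = -\frac{n}{2\sigma^2}(\bar{x}-\mu_0)^2$$

$$\Rightarrow\ -2\ln\lambda = \frac{n}{\sigma^2}(\bar{x}-\mu_0)^2 = \frac{(\bar{x}-\mu_0)^2}{\sigma^2/n} = \left\{\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\right\}^2$$

Under $H_0$, $\dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$, so

$$\frac{(\bar{x}-\mu_0)^2}{\sigma^2/n} \sim \chi^2_{(1)}$$

In this normal example the result is exact for every $n$; for other distributions it holds when $n$ is large.

Thus, $-2\ln\lambda \sim \chi^2_{(1)}$, where $\lambda$ is the LR.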
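The algebra above can be confirmed by computing $-2\ln\lambda$ directly from the two log-likelihoods. This is a small sketch; the function names, the data, and the values $\mu_0 = 3$, $\sigma = 1.5$ are made up for illustration.

```python
from math import log, pi


def normal_loglik(xs, mu, sigma):
    """Log-likelihood of an i.i.d. N(mu, sigma^2) sample."""
    n = len(xs)
    return (-n / 2 * log(2 * pi * sigma ** 2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))


def minus_two_log_lambda(xs, mu0, sigma):
    """-2 ln(lambda) for H0: mu = mu0 with known sigma; the MLE of mu is xbar."""
    xbar = sum(xs) / len(xs)
    return -2 * (normal_loglik(xs, mu0, sigma) - normal_loglik(xs, xbar, sigma))


# Made-up data and parameters, for illustration only.
xs = [2.8, 3.9, 3.1, 4.4, 2.5, 3.6, 3.3, 4.0]
mu0, sigma = 3.0, 1.5
stat = minus_two_log_lambda(xs, mu0, sigma)
xbar = sum(xs) / len(xs)
z_squared = len(xs) * (xbar - mu0) ** 2 / sigma ** 2
print(stat, z_squared)  # the two numbers should agree
```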

BCR or LRT is a function of sufficient statistic
Let us have

$$H_0: \theta = \theta_0 \qquad H_1: \theta = \theta_1$$

By the LRT and the Neyman-Pearson lemma, the BCR is given by

$$\frac{L(\theta_0)}{L(\theta_1)} \le k \qquad (1)$$

where $k$ is a positive number.

Let $t$ be a sufficient statistic for $\theta$; then by the factorization criterion we get

$$L(\theta_0) = h(t \mid \theta_0)\,k(x) \quad\text{and}\quad L(\theta_1) = h(t \mid \theta_1)\,k(x)$$

where $k(x)$ is a function of $x$ and is independent of $\theta$. Thus from (1) we get

$$\frac{L(\theta_0)}{L(\theta_1)} = \frac{h(t \mid \theta_0)\,k(x)}{h(t \mid \theta_1)\,k(x)} \le k \ \Rightarrow\ \frac{h(t \mid \theta_0)}{h(t \mid \theta_1)} \le k$$

Hence, the LRT or BCR is a function of the sufficient statistic.

Monotone Likelihood Ratio (MLR)
A joint p.d.f. $L(x \mid \theta)$ is said to have a monotone likelihood ratio (MLR) in the statistic $T = t(x)$ if, for any two values of the parameter $\theta_1 < \theta_2$, the ratio

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$$

depends on $x$ only through the function $t(x)$, and this ratio is a non-decreasing function of $t(x)$.

Example
Let $X \sim b(m, \theta)$; then for a sample $x_1, \ldots, x_n$ we have

$$L(x \mid \theta) = \left\{\prod_{i=1}^{n}\binom{m}{x_i}\right\}\theta^{\sum x_i}(1-\theta)^{mn-\sum x_i}$$

If $\theta_2 > \theta_1$, then

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\theta_2^{\sum x_i}(1-\theta_2)^{mn-\sum x_i}}{\theta_1^{\sum x_i}(1-\theta_1)^{mn-\sum x_i}} = \left(\frac{\theta_2}{\theta_1}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn-\sum x_i} = \left(\frac{\theta_2(1-\theta_1)}{\theta_1(1-\theta_2)}\right)^{\sum x_i}\left\{\frac{1-\theta_2}{1-\theta_1}\right\}^{mn}$$

is a non-decreasing function of $\sum x_i$, where $\sum x_i = T = t(x)$. Hence $L(x \mid \theta)$ has MLR in $\sum x_i$.

Uses
Distributions having an MLR provide UMP tests for testing a simple $H_0$ against a one-sided $H_1$.

Example
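Let $X \sim \exp(\theta)$; then

$$f(x) = \frac{1}{\theta}e^{-x/\theta};\quad \theta > 0,\ x > 0$$

We have,

$$L(x \mid \theta) = \frac{1}{\theta^n}e^{-\sum x_i/\theta}$$

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\dfrac{1}{\theta_2^{\,n}}\exp\left(-\dfrac{\sum x_i}{\theta_2}\right)}{\dfrac{1}{\theta_1^{\,n}}\exp\left(-\dfrac{\sum x_i}{\theta_1}\right)} = \left(\frac{\theta_1}{\theta_2}\right)^{n}\exp\left(\sum x_i\left\{\frac{\theta_2-\theta_1}{\theta_1\theta_2}\right\}\right)$$

For $\theta_2 > \theta_1$, $\dfrac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$ is a non-decreasing function of $\sum x_i$, so that $L(x \mid \theta)$ has MLR in $\sum x_i$.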
Monotone Likelihood Ration (MLR) ~ 1 of 10

Example
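Let $X \sim N(\theta, 1)$; then we have

$$f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\{x-\theta\}^2\right)$$

We have,

$$L(x \mid \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}\exp\left(-\frac{1}{2}\sum\{x_i-\theta\}^2\right)$$

$$\begin{aligned}
\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} &= \frac{\left(\dfrac{1}{\sqrt{2\pi}}\right)^{n}\exp\left(-\dfrac{1}{2}\sum\{x_i-\theta_2\}^2\right)}{\left(\dfrac{1}{\sqrt{2\pi}}\right)^{n}\exp\left(-\dfrac{1}{2}\sum\{x_i-\theta_1\}^2\right)} \\
&= \exp\left(-\frac{1}{2}\sum x_i^2+\theta_2\sum x_i-\frac{n}{2}\theta_2^2+\frac{1}{2}\sum x_i^2-\theta_1\sum x_i+\frac{n}{2}\theta_1^2\right) \\
&= \exp\left(\sum x_i(\theta_2-\theta_1)-\frac{n}{2}\left(\theta_2^2-\theta_1^2\right)\right)
\end{aligned}$$

which, for $\theta_2 > \theta_1$, is a non-decreasing function of $\sum x_i$. So $L(x \mid \theta)$ has MLR in $\sum x_i$.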

Example
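Let $x_1, \ldots, x_n \sim U(0, \theta)$, $\theta > 0$; then we have

$$f(x) = \frac{1}{\theta};\quad 0 \le x \le \theta$$

The joint p.d.f. of $x_1, \ldots, x_n$ is

$$L(x \mid \theta) = \frac{1}{\theta^n};\quad 0 \le \max x_i \le \theta$$

Let $\theta_2 > \theta_1$; the ratio is

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{\left(\dfrac{1}{\theta_2}\right)^{n} I(\max x_i \le \theta_2)}{\left(\dfrac{1}{\theta_1}\right)^{n} I(\max x_i \le \theta_1)} = \left(\frac{\theta_1}{\theta_2}\right)^{n}\frac{I(\max x_i \le \theta_2)}{I(\max x_i \le \theta_1)}$$

Let

$$R(x) = \frac{I(\max x_i \le \theta_2)}{I(\max x_i \le \theta_1)} = \begin{cases} 1 & ;\ \max x_i \in [0, \theta_1] \\ \infty & ;\ \max x_i \in (\theta_1, \theta_2] \end{cases}$$

Define $R(x) = \infty$ if $\max x_i > \theta_2$. It follows that $\dfrac{L(x \mid \theta_2)}{L(x \mid \theta_1)}$ is a non-decreasing function of $\max\limits_{1 \le i \le n} x_i$, and $L(x \mid \theta)$ has an MLR in $\max\limits_{1 \le i \le n} x_i$.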
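The MLR property in the binomial and exponential examples can be checked numerically by evaluating the likelihood ratio on a grid of values of $\sum x_i$. The sketch below is illustrative; the function names and parameter values are assumptions chosen for the example.

```python
from math import exp


def binomial_lr(total, m, n, theta1, theta2):
    """L(x|theta2) / L(x|theta1) for n draws from b(m, theta), as a function of
    total = sum x_i; the binomial coefficients cancel in the ratio."""
    return ((theta2 / theta1) ** total
            * ((1 - theta2) / (1 - theta1)) ** (m * n - total))


def exponential_lr(total, n, theta1, theta2):
    """Same ratio for n draws from the exponential density (1/theta) e^{-x/theta}."""
    return (theta1 / theta2) ** n * exp(total * (theta2 - theta1) / (theta1 * theta2))


def is_non_decreasing(values):
    return all(a <= b for a, b in zip(values, values[1:]))


# Illustrative (assumed) parameter values with theta2 > theta1.
binom_ratios = [binomial_lr(t, 4, 3, 0.3, 0.6) for t in range(4 * 3 + 1)]
expo_ratios = [exponential_lr(0.5 * t, 5, 1.0, 2.0) for t in range(40)]
print(is_non_decreasing(binom_ratios), is_non_decreasing(expo_ratios))
```

A grid check of this kind does not prove monotonicity, but it is a quick guard against sign errors in a hand-derived ratio.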

Theorem: The one-parameter exponential family

$$L(x \mid \theta) = \exp\{Q(\theta)\,T(x)+S(x)+D(\theta)\}$$

where $Q(\theta)$ is non-decreasing, has an MLR in $T(x)$.

Proof
For $\theta_2 > \theta_1$, $Q(\theta_2) \ge Q(\theta_1)$ and thus

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \exp\left\{T(x)\left[Q(\theta_2)-Q(\theta_1)\right]+\left[D(\theta_2)-D(\theta_1)\right]\right\}$$

which is a non-decreasing function of $T(x)$. Hence the exponential family has an MLR in $T(x)$.

Example
Let $X \sim c(1, \theta)$ (Cauchy); then we have

$$\frac{L(x \mid \theta_2)}{L(x \mid \theta_1)} = \frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2} \to 1 \quad\text{as } x \to \pm\infty$$

Since the ratio is not constant yet tends to the same limit at both ends, it cannot be monotone in $x$. So $c(1, \theta)$ does not have an MLR.

Theorem: If a joint p.d . f . L ( x | θ ) has MLR in the statistic T = t ( x ) then there exists a UMP test for testing H 0 : θ = θ 0
against H1 : θ > θ 0 .

Proof
We know that, for testing a simple H 0 : θ = θ 0 against a simple H1 : θ = θ1

( > θ0 )

there exists a BCR ω0 such that

L ( x | H0 )

L ( x | H1 )

≥ a constant

(1)

Since the ratio of the likelihood function is non-decreasing function of t ( x ) . For θ1 > θ 0 , the BCR determined by (1) is also given by

t ( x ) ≥ k1 if inside ω0
Let the size and power function of this test be α and P (θ ) respectively.

( 2)

P (θ 0 ) = α

The BCR for testing H 0 : θ = θ 0 against one H1 : θ = θ1

( > θ0 )

is given by

L ( x | θ2 ) L ( x | θ0 ) ⇒ t ( x ) ≥ k2

≥ a constant which is inside the C.R.

( 3)

If we take k1 = k2 in ( 3) , the CR obtained is identical with ω0 defined in ( 2 ) and is still most powerful for testing

θ = θ 0 against θ = θ 2 ( > θ 0 ) with size of the region α ′ = P (θ ′ ) . As the test is most powerful P (θ 2 ) > P (θ 0 ) .
Thus the power function P (θ ) is strictly increasing for P (θ ) < 1 . Therefore, for testing θ = θ 0 , the critical region defined by equation ( 2 ) can be used with size less than or equal to

α . The power of the test for nay alternative θ1 > θ0 is maximum and this is so for all alternatives greater than θ 0 .
Hence, the critical region given by ( 2 ) is a UMP for testing θ = θ 0 against θ > θ 0 .

Example
Let X ~ N(µ, 1) and let x1, …, xn be a random sample of size n; then we have

$$L(x \mid \mu) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \exp\left(-\frac{1}{2}\sum (x_i - \mu)^2\right).$$

For µ1 > µ0,

$$\frac{L(x \mid \mu_1)}{L(x \mid \mu_0)} = \exp\left( n\bar{x}\,(\mu_1 - \mu_0) - \frac{n}{2}\left(\mu_1^2 - \mu_0^2\right) \right).$$

This is an increasing function of x̄. So there exists a UMP test for testing H0 : µ = µ0 against H1 : µ > µ0.
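Because the ratio increases with x̄, the UMP test rejects for large x̄. As a minimal sketch, assuming unit variance as in the example, the cutoff follows from x̄ ~ N(µ0, 1/n) under H0. The function names and the values µ0 = 0, n = 25, α = 0.05 are illustrative assumptions, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(mu0, n, alpha):
    """Cutoff c with P(xbar > c | mu = mu0) = alpha when X ~ N(mu, 1)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return mu0 + z_alpha / sqrt(n)


def power(mu, mu0, n, alpha):
    """P(xbar > c | mu), using xbar ~ N(mu, 1/n)."""
    c = critical_value(mu0, n, alpha)
    return 1.0 - NormalDist(mu, 1.0 / sqrt(n)).cdf(c)


mu0, n, alpha = 0.0, 25, 0.05  # illustrative values
for mu in (0.0, 0.2, 0.4, 0.6):
    print(f"mu = {mu:.1f}   power = {power(mu, mu0, n, alpha):.4f}")
```

The power equals α at µ0 and grows with µ, which is the behaviour the theorem above attributes to a test based on an MLR statistic.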

Theorem: Suppose that x1, …, xn has joint p.d.f. of the form

f(x; θ) = C(θ) h(x) exp[q(θ) t(x)]
where q(θ) is an increasing function of θ; then there exists a UMP test for testing H0 : θ = θ0 against H1 : θ > θ0.

Example
Consider a random sample of size n from a Poisson population with parameter µ; then we have

$$L(x \mid \mu) = \frac{e^{-n\mu}\,\mu^{\sum x_i}}{\prod x_i!} = e^{-n\mu}\left(\prod x_i!\right)^{-1}\exp\left[\ln\mu \sum x_i\right], \qquad x_i = 0, 1, \dots$$

where

q(µ) = ln µ,  C(µ) = e^{−nµ},  t(x) = ∑ xi,  h(x) = (∏ xi!)^{−1}.

Since q(µ) = ln µ is increasing in µ, there exists a UMP test of size α for testing H0 : µ = µ0 against H1 : µ > µ0. It rejects H0 when T = ∑ xi ≥ k, where k satisfies

$$P\left[T = \sum x_i \ge k \mid H_0\right] = \alpha .$$
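The cutoff k can be found by direct summation, since T = ∑ xi follows a Poisson distribution with mean nµ0 under H0. Because T is discrete, the size α is usually not attained exactly; the sketch below returns the smallest k whose size does not exceed α. The function names and the values µ0 = 1, n = 5, α = 0.05 are illustrative assumptions.

```python
from math import exp


def poisson_upper_tail(k, lam):
    """P(T >= k) for T ~ Poisson(lam), as 1 minus the sum of P(T = j), j < k."""
    term = exp(-lam)  # P(T = 0)
    below_k = 0.0
    for j in range(k):
        below_k += term
        term *= lam / (j + 1)  # P(T = j + 1) from P(T = j)
    return 1.0 - below_k


def poisson_critical_value(mu0, n, alpha):
    """Smallest k with P(sum x_i >= k | mu = mu0) <= alpha."""
    lam = n * mu0
    k = 0
    while poisson_upper_tail(k, lam) > alpha:
        k += 1
    return k


mu0, n, alpha = 1.0, 5, 0.05  # illustrative values
k = poisson_critical_value(mu0, n, alpha)
print("k =", k, " attained size =", round(poisson_upper_tail(k, n * mu0), 4))
```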

Randomized Test
A test γ of a hypothesis H is defined to be a randomized test if γ is defined by the function

$$\psi_\gamma(x_1, \dots, x_n) = P\left[H \text{ is rejected} \mid (x_1, \dots, x_n) \text{ is observed}\right].$$

The function ψγ(·, …, ·) is called the critical function of the test γ.

For example, let x1, …, xn be a random sample from f(x; θ) = φθ,25(x) (the normal density with mean θ and variance 25). Consider H : θ ≤ 17. If γ is the test "toss a coin and reject H if and only if a head appears", then γ is a randomized test.

Non-Randomized Test
Let a test γ of a statistical hypothesis H be defined as follows: Reject H if and only if (x1, …, xn) ∈ cr, where cr is a subset of the sample space χ; then γ is called a non-randomized test and cr is called the critical region of the test γ.

For example, let x1, …, xn be a random sample from f(x; θ) = φθ,25(x). χ is the Euclidean n-space. Consider H : θ < 17 and the test γ: Reject H if and only if x̄ > 17 + 5/√n; then γ is non-randomized and

$$c_r = \left\{ (x_1, \dots, x_n) : \bar{x} > 17 + \frac{5}{\sqrt{n}} \right\}.$$
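A non-randomized test has a critical function that takes only the values 0 and 1. The sketch below illustrates this for the example, taking the cutoff to be 17 + 5/√n and φθ,25 to be the normal density with mean θ and standard deviation 5; both readings, and the function names, are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def reject(sample):
    """Critical function of the test: True (reject H) iff the sample lies in c_r."""
    n = len(sample)
    return fmean(sample) > 17.0 + 5.0 / sqrt(n)


def rejection_probability(theta, n):
    """P(xbar > 17 + 5/sqrt(n) | theta), using xbar ~ N(theta, 25/n)."""
    cutoff = 17.0 + 5.0 / sqrt(n)
    return 1.0 - NormalDist(theta, 5.0 / sqrt(n)).cdf(cutoff)


print(reject([20.0, 21.0, 22.0, 23.0]))  # sample mean 21.5, cutoff 19.5
print(reject([17.0, 18.0, 16.0, 17.0]))  # sample mean 17.0, cutoff 19.5
print(round(rejection_probability(17.0, 10), 4))
```

At θ = 17 the cutoff lies one standard deviation of x̄ above the mean, so the rejection probability at the boundary does not depend on n.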

Theorem: Let x1, …, xn be a random sample of size n from a p.d.f. f(x; θ) which depends continuously on a single parameter θ belonging to a parameter space Ω, i.e. θ ∈ Ω. Let the likelihood function L(x | θ) have MLR in T(x) = t(x1, …, xn). Then for testing H0 : θ = θ0 against H1 : θ > θ0 there exists a UMP test φ(x1, …, xn) of size α given by

$$\phi(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } T(x_1, \dots, x_n) > k \\ \gamma & \text{if } T(x_1, \dots, x_n) = k \\ 0 & \text{if } T(x_1, \dots, x_n) < k \end{cases}$$

where γ (0 ≤ γ ≤ 1) and k are non-negative constants satisfying E_{θ0}[φ(x1, …, xn)] = α.
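For a discrete statistic the constants k and γ can be found from the null distribution of T by solving P(T > k) + γ P(T = k) = α. The sketch below does this for a Poisson statistic with mean 5 under H0; the helper names, the truncation of the probability table at 60, and the numerical values are illustrative assumptions.

```python
from math import exp


def poisson_pmf_table(lam, upper):
    """Return [P(T = 0), ..., P(T = upper)] for T ~ Poisson(lam)."""
    probs = [exp(-lam)]
    for j in range(1, upper + 1):
        probs.append(probs[-1] * lam / j)
    return probs


def randomized_ump_constants(pmf, alpha):
    """Find (k, gamma) with P(T > k) + gamma * P(T = k) = alpha.

    pmf[t] is P(T = t | theta0); mass beyond the table is assumed negligible.
    """
    tail_above = 0.0  # P(T > k), accumulated from the top of the table
    for k in range(len(pmf) - 1, -1, -1):
        if tail_above + pmf[k] > alpha:
            return k, (alpha - tail_above) / pmf[k]
        tail_above += pmf[k]
    raise ValueError("alpha is at least the total mass of the table")


alpha = 0.05
pmf = poisson_pmf_table(5.0, 60)  # null distribution of T, illustrative
k, gamma = randomized_ump_constants(pmf, alpha)
size = sum(pmf[k + 1:]) + gamma * pmf[k]  # E[phi] under H0
print("k =", k, " gamma =", round(gamma, 4), " size =", round(size, 6))
```

Randomizing on the boundary T = k is what lets the test attain the size α exactly, which a non-randomized rule on a discrete statistic generally cannot.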

Proof
Since L(x | θ) has MLR in T(x1, …, xn), for any θ1 > θ0 and a constant k,

$$\frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} \; \begin{cases} > k \\ = k \\ < k \end{cases} \tag{1}$$

is equivalent to

$$T(x_1, \dots, x_n) \; \begin{cases} > c \\ = c \\ < c \end{cases} \quad \text{for some constant } c .$$

Hence by the Neyman-Pearson lemma there exists a test

$$\phi(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } T(x_1, \dots, x_n) > c \\ \gamma & \text{if } T(x_1, \dots, x_n) = c \\ 0 & \text{if } T(x_1, \dots, x_n) < c \end{cases}$$

which is most powerful of size α for testing H0 : θ = θ0 against any simple alternative θ = θ1, provided θ1 > θ0. (Here c is the constant called k in the statement of the theorem.) Furthermore, for any pair (θ′, θ″) with θ′ ≤ θ″, a test of this form is most powerful for testing a simple H0 : θ = θ′ against a simple H1 : θ = θ″ at its size. Because the form of the test does not depend on the particular alternative θ1, the same test attains the maximum power at every θ1 > θ0. Therefore, there exists a UMP test for testing θ = θ0 against θ > θ0.

Example
Let x1, …, xn be a random sample of size n drawn from N(0, σ²). Find the UMP test of size α for H0 : σ² = σ0² against H1 : σ² > σ0², where σ0² is specified.

Solution
We have,

$$L(x \mid \sigma^2) = \left[\frac{1}{\sqrt{2\pi}\,\sigma}\right]^{n} \exp\left(-\frac{1}{2}\frac{\sum x_i^2}{\sigma^2}\right).$$

Now, for σ2² > σ1²,

$$\frac{L(x \mid \sigma_2^2)}{L(x \mid \sigma_1^2)} = \frac{\left(\dfrac{1}{\sigma_2^2}\right)^{n/2} \exp\left(-\dfrac{1}{2}\dfrac{\sum x_i^2}{\sigma_2^2}\right)}{\left(\dfrac{1}{\sigma_1^2}\right)^{n/2} \exp\left(-\dfrac{1}{2}\dfrac{\sum x_i^2}{\sigma_1^2}\right)} = \left(\frac{\sigma_1^2}{\sigma_2^2}\right)^{n/2} \exp\left(\frac{1}{2}\sum x_i^2 \left[\frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2}\right]\right)$$

which is a non-decreasing function of ∑ xi² for σ2² > σ1². So the likelihood has an MLR in ∑ xi², and the UMP test is

$$\phi(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } \sum x_i^2 > c \\ \gamma & \text{if } \sum x_i^2 = c \\ 0 & \text{if } \sum x_i^2 < c \end{cases}$$

where γ and c are constants. To find the value of c, we have,

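$$P\left[\sum x_i^2 \ge c \mid H_0\right] = \alpha \;\Rightarrow\; P\left[\frac{\sum x_i^2}{\sigma_0^2} \ge \frac{c}{\sigma_0^2} \,\Big|\, H_0\right] = \alpha \;\Rightarrow\; P\left[\chi^2 \ge \frac{c}{\sigma_0^2} \,\Big|\, H_0\right] = \alpha ,$$

where χ² has n degrees of freedom, since ∑ xi² / σ0² follows a χ² distribution with n degrees of freedom under H0.

Thus c / σ0² may be read from the χ² table and c determined. If ∑ xi² ≥ c, then H0 is rejected at the significance level α; otherwise H0 is accepted.

If we take n = 10, α = 0.05, σ0² = 2, then from the χ² table (10 degrees of freedom) we have

c / σ0² = 18.307, so that c = 18.307 × 2 = 36.614.

If ∑ xi² ≥ 36.614 then we reject H0 : σ² = 2; otherwise we accept H0.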
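The tabulated value can be checked without a table. For even degrees of freedom the χ² upper tail has the closed form P(χ² ≥ x) = e^{−x/2} ∑_{j<df/2} (x/2)^j / j!, which the sketch below uses; the function name is an illustrative assumption, and the closed form is a standard identity rather than something stated in the notes.

```python
from math import exp


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom >= x), valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half_x = x / 2.0
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half_x / (j + 1)
    return exp(-half_x) * total


n, sigma0_sq = 10, 2.0
table_value = 18.307            # upper 5% point quoted in the example
c = table_value * sigma0_sq     # cutoff for sum of x_i^2
print("c =", round(c, 3))
print("P(chi2_10 >= 18.307) =", round(chi2_upper_tail_even_df(table_value, n), 4))
```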

Theorem: Let f(x | θ) be a continuous density function of a random variable x. If the likelihood function L(x; θ) of n independent observations is differentiable with respect to θ under the sign of integration, and the derivative L′(x; θ) of L(x; θ) with respect to θ is everywhere continuous in θ and does not vanish identically in the sample space, then for testing a simple H0 : θ = θ0 against the family of alternatives θ ≠ θ0 there does not exist a UMP test for both negative and positive values of (θ − θ0).

Proof
Let H1 : θ = θ1 be a simple alternative. The likelihood functions under H0 and H1 are L(x | H0) and L(x | H1) respectively. Then, expanding L(x | H1) about θ = θ0 by Taylor series,

$$L(x \mid \theta_1) = L(x \mid \theta_0) + (\theta_1 - \theta_0)\, L'(x \mid \theta') \tag{1}$$

where θ′ is some value of θ in the interval between θ0 and θ1.


Let us assume that there exists a UMP test for testing H0 : θ = θ0 against H1 : θ ≠ θ0. According to the Neyman-Pearson lemma, the BCR for testing H0 : θ = θ0 against H1 : θ = θ1 is

$$\frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} \ge k(\theta_1) \quad \text{within the C.R.} \tag{2}$$

In general k depends on α and the sample size; with these held fixed, k depends on θ1 only. Now from (1) we have,

$$\frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} = 1 + (\theta_1 - \theta_0)\,\frac{L'(x \mid \theta')}{L(x \mid \theta_0)}$$

$$\Rightarrow\; 1 + (\theta_1 - \theta_0)\,\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} \ge k(\theta_1) \quad [\text{by } (2)] \tag{3}$$

When θ1 = θ0, we can write k(θ0) = 1. Therefore, we can expand k(θ1) about θ0, again by Taylor series:

$$k(\theta_1) = 1 + (\theta_1 - \theta_0)\, k'(\theta'') \tag{4}$$

where θ″ lies between θ0 and θ1. Using (3) and (4) we have,

$$1 + (\theta_1 - \theta_0)\,\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} \ge 1 + (\theta_1 - \theta_0)\, k'(\theta'')$$

$$\Rightarrow\; (\theta_1 - \theta_0)\left[\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - k'(\theta'')\right] \ge 0 \tag{5}$$

If x denotes a point on the boundary of the BCR defined by equation (2), then

$$\frac{L(x \mid \theta_1)}{L(x \mid \theta_0)} = k(\theta_1),$$

so that, by differentiating with respect to θ1,

$$\frac{L'(x \mid \theta_1)}{L(x \mid \theta_0)} = k'(\theta_1), \qquad \text{and similarly} \qquad \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} = k'(\theta'') .$$

Substituting the value of k′(θ″) in (5) we have,

$$(\theta_1 - \theta_0)\left[\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)}\right] \ge 0 \tag{6}$$

For the C.R. to be UMP, (6) must hold good for all θ1. Therefore, (6) must be true identically for all values of θ1 and for all x within the BCR. Since (θ1 − θ0) can assume both positive and negative values, and (6) must hold good for all these values, the expression

$$\left[\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)}\right]$$

must vanish within the BCR.

The outside of the BCR in equation (2) is defined by

$$\frac{L(x \mid H_1)}{L(x \mid H_0)} < k(\theta_1) \tag{7}$$

With the help of the same arguments which lead from (2) to (6), we see from (7) that (6) with the inequality sign reversed is true for both positive and negative values of (θ1 − θ0) outside the BCR, and hence the expression

$$\left[\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} - \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)}\right]$$

is zero outside the BCR also. Thus it is zero throughout the sample space, that is,

$$\frac{L'(x \mid \theta')}{L(x \mid \theta_0)} = \frac{L'(x \mid \theta'')}{L(x \mid \theta_0)} .$$

Therefore, since L′(x | θ) is continuous in θ, letting θ1 → θ0 (so that θ′ and θ″ both tend to θ0) we get that

$$\frac{L'(x \mid \theta_0)}{L(x \mid \theta_0)} = \left[\frac{\partial \ln L(x ; \theta)}{\partial \theta}\right]_{\theta = \theta_0}$$

is a constant, and this is the essential condition for the existence of a UMP test for the two-sided alternatives. We have,

$$\int_S L(x \mid \theta)\, dx = 1 .$$

Since differentiation under the sign of integration is valid, this leads to

$$\int_S \left[\frac{\partial \ln L(x ; \theta)}{\partial \theta}\right]_{\theta = \theta_0} L(x \mid \theta_0)\, dx = 0 .$$

Because the bracketed term is a constant and the integral of L(x | θ0) over S is 1, that constant must be zero, i.e.

$$\frac{L'(x \mid \theta_0)}{L(x \mid \theta_0)} = 0, \qquad \text{so that} \qquad L'(x \mid \theta_0) = 0$$

identically in S (the sample space). This contradicts the assumption that L′ does not vanish identically. Hence the theorem.

Example
Let us consider f(x | θ) = e^{−(x−θ)}, x ≥ θ, for testing H0 : θ = θ0 against the two-sided alternative.

Here

$$L(x \mid \theta) = e^{-\sum (x_i - \theta)} \;\Rightarrow\; \ln L(x \mid \theta) = -\sum (x_i - \theta) \;\Rightarrow\; \frac{\partial \ln L(x \mid \theta)}{\partial \theta} = n .$$

Here n is a constant, so there exists a UMP test.

Since the lower point of the range of integration depends on the parameter µ (the location parameter, written θ above), the smallest observation x1 in the sample is sufficient for µ. Under H1 : µ = µ1 the probability that x1 < µ1 is zero. Thus

$$\frac{L(x \mid H_0)}{L(x \mid H_1)} = \begin{cases} \infty & ; \; x_1 < \mu_1 \\ e^{\,n(\mu_0 - \mu_1)} & ; \; \text{otherwise} \end{cases}$$

That is,

$$e^{\,n(\mu_0 - \mu_1)} \le k \tag{1}$$

determines the BCR, where k is so chosen as to make its size equal to α. The left-hand side of (1) is a constant and is therefore independent of the observations. Hence (1) will be satisfied by every C.R. of size α with x1 ≥ µ1. Thus every such C.R. is of equal power and is therefore a BCR. If we permit µ1 to take values greater or less than µ0,

$$\frac{L(x \mid H_0)}{L(x \mid H_1)} = \begin{cases} \infty & ; \; \mu_0 \le x_1 < \mu_1 \\ e^{\,n(\mu_0 - \mu_1)} < 1 & ; \; x_1 \ge \mu_1 > \mu_0 \\ e^{\,n(\mu_0 - \mu_1)} > 1 & ; \; x_1 \ge \mu_0 > \mu_1 \\ 0 & ; \; \mu_1 \le x_1 < \mu_0 \end{cases}$$

The BCR is therefore given by

(x1 − µ0) < 0,  (x1 − µ0) > c1.

When H0 holds, the probability that (x1 − µ0) < 0 is zero, so the value of c1 is chosen to satisfy the condition

P{(x1 − µ0) > c1 | H0} = α.
This C.R. is a BCR for all alternatives µ1 ≠ µ0 and is therefore UMP with respect to these alternatives.
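The constant c1 has a closed form. Under H0 each xi − µ0 follows the standard exponential distribution, so the smallest of n observations satisfies P{(x1 − µ0) > c | H0} = e^{−nc}, giving c1 = −ln(α)/n. The sketch below applies this; the function names and sample values are illustrative assumptions.

```python
from math import exp, log


def shifted_exponential_cutoff(n, alpha):
    """c1 solving exp(-n * c1) = alpha, i.e. P(min - mu0 > c1 | H0) = alpha."""
    return -log(alpha) / n


def reject_h0(sample, mu0, alpha):
    """Two-part critical region: smallest observation below mu0, or above mu0 + c1."""
    smallest = min(sample)
    c1 = shifted_exponential_cutoff(len(sample), alpha)
    return smallest - mu0 < 0 or smallest - mu0 > c1


alpha = 0.05
c1 = shifted_exponential_cutoff(3, alpha)
print("c1 =", round(c1, 4), " size check =", round(exp(-3 * c1), 4))
print(reject_h0([1.2, 1.5, 2.0], mu0=1.0, alpha=alpha))
print(reject_h0([0.5, 1.5, 2.5], mu0=1.0, alpha=alpha))
print(reject_h0([3.0, 3.5, 4.0], mu0=1.0, alpha=alpha))
```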

Example
Examine whether a UMP test exists for testing H0 : θ = θ0, σ = σ0 in N(θ, σ²).

Solution
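If H1 : θ = θ1, σ = σ1 is any simple alternative, then the BCR is given by

$$\frac{L(x \mid H_1)}{L(x \mid H_0)} = \left(\frac{\sigma_0}{\sigma_1}\right)^{n} \exp\left[-\frac{1}{2}\left\{\frac{\sum (x_i - \theta_1)^2}{\sigma_1^2} - \frac{\sum (x_i - \theta_0)^2}{\sigma_0^2}\right\}\right] \ge k .$$

Writing S² = (1/n) ∑ (xi − x̄)², so that ∑ (xi − θ)² = n S² + n (x̄ − θ)², this may be written as

$$S^2\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) + \frac{(\bar{x} - \theta_1)^2}{\sigma_1^2} - \frac{(\bar{x} - \theta_0)^2}{\sigma_0^2} \le \frac{2}{n}\ln\left[\frac{1}{k}\left(\frac{\sigma_0}{\sigma_1}\right)^{n}\right]$$

$$\Rightarrow\; S^2\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) + \bar{x}^2\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right) + 2\bar{x}\left(\frac{\theta_0}{\sigma_0^2} - \frac{\theta_1}{\sigma_1^2}\right) \le \text{constant}$$

$$\Rightarrow\; \left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right)\left\{S^2 + (\bar{x} - \delta)^2\right\} \le \text{constant}, \qquad \delta = \frac{\theta_0 \sigma_1^2 - \theta_1 \sigma_0^2}{\sigma_1^2 - \sigma_0^2}$$

$$\Rightarrow\; \left(\sigma_0^2 - \sigma_1^2\right) \sum (x_i - \delta)^2 \le \text{constant} .$$

This means that if σ0 > σ1, the BCR is bounded by a hypersphere centred at (δ, …, δ), where δ itself is dependent on H1. When σ1 > σ0, the BCR lies outside this sphere. In both cases the BCR changes with the alternative.

Therefore, there does not exist any UMP test for any set of alternatives.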
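The dependence of the BCR on the alternative can be seen by computing the centre of the sphere for a few alternatives. The sketch below uses δ = (θ0σ1² − θ1σ0²)/(σ1² − σ0²), obtained by completing the square in x̄ in the BCR inequality; the function name and the numerical alternatives are illustrative assumptions.

```python
def bcr_centre(theta0, sigma0, theta1, sigma1):
    """Coordinate delta of the hypersphere centre (delta, ..., delta)."""
    if sigma0 == sigma1:
        raise ValueError("delta is undefined when sigma1 equals sigma0")
    return (theta0 * sigma1**2 - theta1 * sigma0**2) / (sigma1**2 - sigma0**2)


theta0, sigma0 = 0.0, 1.0  # null hypothesis, illustrative values
alternatives = [(1.0, 2.0), (2.0, 2.0), (1.0, 0.5)]  # (theta1, sigma1) pairs
for theta1, sigma1 in alternatives:
    side = "outside" if sigma1 > sigma0 else "inside"
    delta = bcr_centre(theta0, sigma0, theta1, sigma1)
    print(f"theta1={theta1}, sigma1={sigma1}: delta={delta:.4f}, BCR {side} the sphere")
```

Each alternative moves the centre, so no single region is best against all of them.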

Example
Examine for what values of λ there exists a UMP test for H0 : µ = µ0, λ = λ0 in the distribution

$$f(x ; \mu, \lambda) = \frac{1}{\lambda}\, e^{-\frac{1}{\lambda}(x - \mu)} \; ; \qquad \mu \le x < \infty .$$

Solution
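If H1 : µ = µ1, λ = λ1 is any simple hypothesis and if x̄ = (1/n) ∑ xi, then

$$\frac{L(x \mid H_1)}{L(x \mid H_0)} = \left(\frac{\lambda_0}{\lambda_1}\right)^{n} \exp\left[n\left\{\frac{\bar{x} - \mu_0}{\lambda_0} - \frac{\bar{x} - \mu_1}{\lambda_1}\right\}\right] = \left(\frac{\lambda_0}{\lambda_1}\right)^{n} \exp\left[n\bar{x}\left\{\frac{1}{\lambda_0} - \frac{1}{\lambda_1}\right\} + n\left(\frac{\mu_1}{\lambda_1} - \frac{\mu_0}{\lambda_0}\right)\right].$$

The BCR for the above H0 and H1 is defined by the relation

$$\bar{x}\left\{\frac{1}{\lambda_0} - \frac{1}{\lambda_1}\right\} \ge \text{constant} .$$

Thus, UMP tests exist separately for λ1 > λ0 and λ1 < λ0, irrespective of the value of µ1.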
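The direction of the rejection region can be read off the logarithm of the likelihood ratio, which is linear in x̄ with slope n(1/λ0 − 1/λ1). The sketch below evaluates it, assuming every observation is at least max(µ0, µ1) so that both likelihoods are positive; the function name and numerical values are illustrative assumptions.

```python
from math import log


def log_likelihood_ratio(xbar, n, mu0, lam0, mu1, lam1):
    """ln[L(x | H1) / L(x | H0)] as a function of the sample mean xbar.

    Assumes every observation is at least max(mu0, mu1).
    """
    return (n * log(lam0 / lam1)
            + n * xbar * (1.0 / lam0 - 1.0 / lam1)
            + n * (mu1 / lam1 - mu0 / lam0))


n, mu0, lam0 = 10, 0.0, 1.0  # null hypothesis, illustrative values
for mu1, lam1 in [(0.0, 2.0), (0.0, 0.5)]:
    low = log_likelihood_ratio(1.0, n, mu0, lam0, mu1, lam1)
    high = log_likelihood_ratio(2.0, n, mu0, lam0, mu1, lam1)
    direction = "large" if high > low else "small"
    print(f"lambda1={lam1}: reject H0 for {direction} values of xbar")
```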
