Point estimation:
A statistic Θ̂ = Θ̂(X1, X2, · · ·, Xn) is called an estimator of θ if the observed values of Θ̂, or the
distribution of Θ̂, approximate θ as well as possible (mathematically we say that Θ̂
has central tendency at θ). For a particular observation (x1, x2, · · ·, xn) of the sample, the value of Θ̂,
denoted by θ̂(x1, x2, · · ·, xn), is called an estimate or point estimate of θ.
Example. The value x̄ of the statistic X̄, computed from a sample of size n, is a
point estimate of the population parameter µ. Similarly, p̂ = x/n is a point estimate of the true
proportion p in a binomial experiment.
Caution: An estimator is not expected to estimate the population parameter without error. We
do not expect X̄ to estimate µ exactly, but we certainly hope that it is not far off.
(II) Consistency: A point estimator Θ̂ = Θ̂(n) (which is obviously a function of n, where n is the
sample size) for the parameter θ is consistent if for every ε > 0 the following holds: P (|Θ̂(n) − θ| >
ε) → 0 as n → ∞.
Example. The sample mean X̄ is a consistent estimator of the population mean µ (use Cheby-
shev's inequality to prove it). The sample variance S² is a consistent estimator of the population
variance σ² (use the fact that (n − 1)S²/σ² ∼ χ²(n − 1) and Chebyshev's inequality to prove it).
Theorem. If Θ̂ = Θ̂(n) is an estimator of θ such that E(Θ̂(n)) → θ and Var(Θ̂(n)) → 0 as n → ∞,
then Θ̂ is a consistent estimator of θ.
Application of the theorem: X̄ is a consistent estimator of the population mean µ, since E(X̄) = µ
and Var(X̄) = σ²/n → 0 as n → ∞.
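The definition of consistency can also be seen numerically: P(|X̄(n) − µ| > ε) shrinks as n grows. A minimal Monte Carlo sketch, assuming (for the demonstration only) a Uniform(0, 1) population, so that µ = 0.5:

```python
# Illustrative simulation of consistency: P(|X_bar(n) - mu| > eps) -> 0 as n grows.
import random

random.seed(1)

def exceed_prob(n, eps=0.05, reps=2000):
    """Monte Carlo estimate of P(|X_bar(n) - 0.5| > eps) for a Uniform(0,1) population."""
    hits = 0
    for _ in range(reps):
        x_bar = sum(random.random() for _ in range(n)) / n
        if abs(x_bar - 0.5) > eps:
            hits += 1
    return hits / reps

probs = [exceed_prob(n) for n in (10, 100, 1000)]
print(probs)  # decreases toward 0 as the sample size grows
```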
(III) Efficiency: A point estimator Θ̂1 is called more efficient than a statistic Θ̂2 if σ²_{Θ̂1} ≤ σ²_{Θ̂2}. Note
that if Θ̂1 and Θ̂2 are unbiased, then Θ̂1 is more efficient than Θ̂2 if MSE(Θ̂1) ≤ MSE(Θ̂2).
Remark: According to Fisher, consistency is not enough to say that an unbiased estimator Θ̂
is good. The estimator should also converge to the corresponding parameter as
quickly as possible. The efficiency of an estimator determines how quickly it converges to the parameter:
a more efficient estimator converges more quickly.
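For a concrete illustration of efficiency (an example not in the text): for Normal data both the sample mean and the sample median have central tendency at µ, but the median has asymptotic variance about (π/2)σ²/n, larger than the mean's σ²/n, so the mean is the more efficient of the two. A simulation sketch, with all population parameters assumed for the demo:

```python
# Illustrative comparison of efficiency: sample mean vs. sample median for N(0, 1) data.
# Asymptotically Var(median) ~ (pi/2) * sigma^2 / n > sigma^2 / n = Var(mean).
import random
import statistics

random.seed(7)
mu, sigma, n, reps = 0.0, 1.0, 25, 4000

means, medians = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))

v_mean = statistics.pvariance(means)      # should be near sigma^2 / n = 0.04
v_median = statistics.pvariance(medians)  # should be near (pi/2) * 0.04 ~ 0.063
print(v_mean, v_median)
```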
Corollary. If Θ̂ is a sufficient estimator for θ, then aΘ̂ is also a sufficient estimator for θ, where
a is a nonzero constant.
Example (Contd). 1. X1 + X2 + · · · + Xn and X̄ are sufficient statistics for µ in a N(µ, σ²)
population.
However, the most important and widely used method of estimation is the maximum likelihood
method, and the resulting estimators are known as maximum likelihood estimators (m.l.e.).
We call the function L(θ) = f(x1, θ)f(x2, θ) · · · f(xn, θ) the likelihood function. Our problem is to find the value of θ, say θ̂,
which makes the likelihood function L(θ) a maximum. Obviously θ̂ will be a function of (x1, x2, · · ·, xn)
(i.e., θ̂ = θ̂(x1, x2, · · ·, xn)). Thus the corresponding maximum likelihood estimator will be Θ̂ =
θ̂(X1, X2, · · ·, Xn).
Taking logarithms on both sides of the above equality we get log_e L(θ) = Σ_{i=1}^n log_e f(xi, θ).
So, to maximize L(θ) it is enough to maximize log_e L(θ). Thus if θ̂ is the m.l.e. of θ, then
log_e L(θ̂) = Max log_e L(θ). Hence, θ̂ can be obtained by solving (∂/∂θ) log_e L(θ) = 0, which will yield
θ = θ̂(x1, x2, · · ·, xn) such that (∂²/∂θ²) log_e L(θ)|_{θ=θ̂} < 0.
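The recipe above can be checked on a concrete family. For a Bernoulli(p) sample, log_e L(p) = Σ_i [xi log_e p + (1 − xi) log_e(1 − p)], and the likelihood equation gives p̂ = x̄. The Python sketch below, with made-up 0/1 data, confirms this against a brute-force grid search:

```python
# Sketch of the maximum likelihood recipe for Bernoulli(p) data (illustrative sample).
# Setting d/dp log L(p) = 0 gives the closed form p_hat = x_bar; the grid search
# below numerically confirms that this value maximizes log L.
import math

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical 0/1 observations

def log_likelihood(p):
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

p_hat = sum(data) / len(data)            # closed-form m.l.e.
grid = [k / 1000 for k in range(1, 1000)]
p_grid = max(grid, key=log_likelihood)   # numerical maximizer over a fine grid

print(p_hat, p_grid)  # the two agree up to the grid resolution
```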
Remark:
1. If θ is a vector (θ1, θ2, · · ·, θs), then we need to consider the s equations
(∂/∂θ1) log_e L(θ) = (∂/∂θ2) log_e L(θ) = · · · = (∂/∂θs) log_e L(θ) = 0 (called the likelihood equations).
2. In some cases, the derivative may not exist at θ = θ̂, and then the usual method of differ-
entiation for finding the m.l.e. fails. In such cases we have to adopt some other method, like an
algebraic method using some inequality.
3. m.l.e.s are generally sufficient estimators, or functions of sufficient estimators, if they exist.
4. m.l.e.s are consistent estimators.
5. m.l.e.s are not, in general, unbiased. But they can be made unbiased by some simple
modifications.
6. The variance of the m.l.e. θ̂ is given by var(θ̂) = 1/(−nE[(∂²/∂θ²) log_e f(x, θ)])
(Cramér-Rao's inequality).
7. If m.l.e.s are unbiased, then they can be seen to be most efficient.
8. m.l.e.s possess the invariance property, in the sense that if θ̂ is the m.l.e. of θ, then g(θ̂) will
be the m.l.e. of g(θ).
Interval estimation:
Till now we have discussed the criteria that a good point estimator should satisfy and
the method(s) of obtaining such point estimators. We may, however, note that any single point
estimator, however satisfactory it may be, is seldom likely to give us an idea about the error of
the estimate. It is true that estimation accuracy increases with large samples, but there is still
no reason to expect a point estimate (giving a single value only!) from a given sample
to be exactly equal to the population parameter it is supposed to estimate. It is therefore pro-
posed to find an interval in which the population parameter may be expected to lie (on the basis of
random samples) with a specified degree of confidence. This is the problem of estimation by interval.
Suppose we are given an f(x, θ) population, where θ is a parameter. The problem of
interval estimation of θ consists of the following:
Let 0 ≤ α ≤ 1. Choosing a sample (X1, X2, · · ·, Xn) of size n from the population, we need to
find two statistics Θ̂L and Θ̂U, where Θ̂L < Θ̂U for every observation, such that P(Θ̂L ≤ θ ≤ Θ̂U) = 1 − α.
On the basis of a particular observation (x1, x2, · · ·, xn) of the sample, the computed interval [θ̂L, θ̂U]
is called a 100(1 − α)% confidence interval, the fraction 1 − α is called the confidence coefficient or
the degree of confidence, and the endpoints θ̂L and θ̂U are called the lower and upper confidence
limits, respectively. For a particular observation we say that one may be 100(1 − α)% confident that
the value of the parameter θ lies within the observed interval [θ̂L, θ̂U].
Remarks:
1. The interval [Θ̂L , Θ̂U ] is a random interval.
2. The interpretation of the above equality is: whatever the true value of θ may be, the proba-
bility that θ lies in the random interval is 1 − α. In other words, for repeated observations
(x1, x2, · · ·, xn) of the random sample (X1, X2, · · ·, Xn), if we compute the values of Θ̂L and Θ̂U,
say θ̂L and θ̂U respectively, then in about 100(1 − α)% of the cases the interval [θ̂L, θ̂U] will contain
θ, and in 100α% of the cases it will fail to do so; i.e., if α is very small, then the probability that [θ̂L, θ̂U]
will contain θ is very high.
Note: Observe that δ1 (Θ̂) and δ2 (Θ̂) are random variables and hence [δ1 (Θ̂), δ2 (Θ̂)] is a random
interval and that P (θ1 ∈ [δ1 (Θ̂), δ2 (Θ̂)]) = P (Ψ(Θ̂, θ1 ) ∈ [λ1 , λ2 ]) = 1 − α.
Figure 1: P(Z ∈ [−z_{α/2}, z_{α/2}]) = 1 − α
Standardizing X̄ and using Figure 1 we get that P(X̄ − z_{α/2} σ/√n ≤ µ ≤ X̄ + z_{α/2} σ/√n) = 1 − α.
Thus the 100(1 − α)% confidence interval for the mean µ, for known variance and a large sample,
is [x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n], where x̄ is a realization of X̄.
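A minimal Python sketch of this z-interval, using the standard library's NormalDist for z_{α/2}; all the numbers are illustrative assumptions:

```python
# Sketch: 100(1 - alpha)% z-interval for mu with known sigma (illustrative numbers).
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, alpha = 50.0, 4.0, 36, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
half = z * sigma / sqrt(n)                # half-width of the interval
lo, hi = x_bar - half, x_bar + half
print(f"[{lo:.4f}, {hi:.4f}]")
```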
Here we need to assume that the population approximately follows a Normal distribution with
mean µ. Since the variance is unknown we choose Ψ(X̄, µ) = (X̄ − µ)/(S/√n) which, obviously, is
independent of any parameter other than µ, and whose distribution we know:
Ψ(X̄, µ) = (X̄ − µ)/(S/√n) ∼ t(n − 1), which is free of all parameters, where S² is the sample variance.
Using similar arguments we can show that the 100(1 − α)% confidence interval for µ when the variance is unknown is
[x̄ − t_{α/2} s/√n, x̄ + t_{α/2} s/√n], where s is a realization of S and t_{α/2} is the upper 100(α/2)% point of
the t-distribution with n − 1 degrees of freedom.
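A sketch of the t-interval in Python. The standard library has no t quantiles, so t_{α/2} for 24 degrees of freedom is taken as the table value 2.064 (an assumption baked into this example; x̄, s, and n are also made up):

```python
# Sketch: 95% t-interval for mu when sigma is unknown (illustrative numbers).
from math import sqrt

x_bar, s, n = 10.1, 0.8, 25
t = 2.064                     # t_{0.025} with n - 1 = 24 d.f., taken from a t-table
half = t * s / sqrt(n)
lo, hi = x_bar - half, x_bar + half
print(f"[{lo:.4f}, {hi:.4f}]")
```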
Maximum error in estimating µ by X̄: The error of the estimate is at most half the width of the
confidence interval. We have the following results to measure this error:
Theorem (Variance is known and sample is large). If X̄ is used as an estimator of µ, then for
an observation from a sample of size n, the probability that the error will not exceed z_{α/2} σ/√n is 1 − α;
i.e., if x̄ is used as an estimate of µ, then one can be 100(1 − α)% confident that the error will
not exceed z_{α/2} σ/√n.
Theorem (Variance is unknown and sample is large). If X̄ is used as an estimator of µ, then for
an observation from a sample of size n, the probability that the error will not exceed z_{α/2} s/√n is 1 − α.
Sample size needed for a specified error: The following results are straightforward from the
above theorems.
Corollary (Variance is known and sample is large). If X̄ is used as an estimator of µ, then for
an observation the error will not exceed a specified amount e with probability 1 − α if the sample
has size n = (z_{α/2} σ/e)²; i.e., if x̄ is used as an estimate of µ, then one can be 100(1 − α)%
confident that the error will not exceed a specified amount e if the sample size is n = (z_{α/2} σ/e)².
Corollary (Variance is unknown and sample is large). If X̄ is used as an estimator of µ, then
for an observation the error will not exceed a specified amount e with probability 1 − α if the sample
has size n = (z_{α/2} s/e)².
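Both corollaries reduce to the same computation; a sketch with illustrative σ, e, and α, rounding up so the required accuracy is actually met:

```python
# Sketch: smallest n so the error of x_bar stays below e with confidence 1 - alpha,
# for known sigma: n = (z_{alpha/2} * sigma / e)^2 (illustrative numbers).
from math import ceil
from statistics import NormalDist

sigma, e, alpha = 15.0, 2.0, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
n = ceil((z * sigma / e) ** 2)   # round up to guarantee the error bound
print(n)
```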
One sided confidence intervals for µ: A similar construction as before shows that one can have the
following types of 100(1 − α)% confidence intervals for µ:
(I) When variance is known and sample is large:
1. (−∞, x̄ + z_α σ/√n]; 2. [x̄ − z_α σ/√n, ∞).
(Sketch of the proof: Start with
1. P(−z_α < (X̄ − µ)/(σ/√n) < ∞) = 1 − α; 2. P(−∞ < (X̄ − µ)/(σ/√n) < z_α) = 1 − α.)
(II) When sample is small:
1. (−∞, x̄ + t_α s/√n]; 2. [x̄ − t_α s/√n, ∞).
(III) When variance is unknown and sample is large:
1. (−∞, x̄ + z_α s/√n]; 2. [x̄ − z_α s/√n, ∞).
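A quick sketch of case (I): the one-sided bounds use z_α rather than z_{α/2}, here with illustrative numbers:

```python
# Sketch: one-sided 95% bounds for mu, known sigma, large sample (illustrative numbers).
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, alpha = 50.0, 4.0, 36, 0.05
z = NormalDist().inv_cdf(1 - alpha)      # z_alpha (one-sided), about 1.645
upper = x_bar + z * sigma / sqrt(n)      # (-inf, upper]
lower = x_bar - z * sigma / sqrt(n)      # [lower, inf)
print(f"(-inf, {upper:.4f}]  and  [{lower:.4f}, inf)")
```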
Remark: A similar construction as before shows that one can have the following types of
100(1 − 2α)/2 % confidence intervals for µ:
(I) When variance is known and sample is large:
1. [x̄, x̄ + z_α σ/√n]; 2. [x̄ − z_α σ/√n, x̄].
(II) When sample is small:
1. [x̄, x̄ + t_α s/√n]; 2. [x̄ − t_α s/√n, x̄].
(III) When variance is unknown and sample is large:
1. [x̄, x̄ + z_α s/√n]; 2. [x̄ − z_α s/√n, x̄].
Example (2). Interval estimation of the proportion p of a population:
Suppose we are given a binomial/Bernoulli population with proportion p and we have chosen
a random sample (X1, X2, · · ·, Xn) from b(1, p). By some standard method of point estimation we
know that a good point estimator of the proportion p is given by the statistic P̂ = (Σ_{i=1}^n Xi)/n = X̄
(which many authors write as X/n, where X counts the number of successes in n independent trials).
Denote by p̂ an observation of P̂. If the unknown proportion p is not expected to be too close to 0
or 1, we can establish a confidence interval for p by considering the sampling distribution of P̂. So
we suppose that p is neither close to 0 nor close to 1.
Since each Xi ∼ b(1, p), Σ_{i=1}^n Xi ∼ b(n, p) and hence µ_P̂ = p and σ²_P̂ = p(1 − p)/n. Consider
the function Ψ(P̂, p) = (P̂ − p)/√(p(1 − p)/n) which, obviously, depends only upon p. Now, suppose n is
sufficiently large to apply the Central Limit Theorem to Ψ(P̂, p) to get Ψ(P̂, p) ∼ N(0, 1) (approxi-
mately), which is free of any parameter. Thus, as in the previous case, we can choose z_{α/2}, the upper
100(α/2)% point of the standard normal distribution, such that
P(−z_{α/2} ≤ (P̂ − p)/√(p(1 − p)/n) ≤ z_{α/2}) = 1 − α, where α is given. When n is very large, for an
observation (x1, x2, · · ·, xn), p̂ = x̄ will be very close to the true value of p, and hence for sufficiently
large n we may introduce a very small error (replacing p by p̂) to get, from the above equality,
approximately a 100(1 − α)% confidence interval of p in the form
[p̂ − z_{α/2}√(p̂(1 − p̂)/n), p̂ + z_{α/2}√(p̂(1 − p̂)/n)] or [p̂ − z_{α/2}√(p̂q̂/n), p̂ + z_{α/2}√(p̂q̂/n)], where
q̂ = 1 − p̂, i.e., P(P̂ − z_{α/2}√(p̂q̂/n) ≤ p ≤ P̂ + z_{α/2}√(p̂q̂/n)) ≈ 1 − α.
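A sketch of this approximate large-sample interval for p in Python, with illustrative counts:

```python
# Sketch: large-sample approximate 95% confidence interval for a proportion p
# (illustrative success count and sample size).
from math import sqrt
from statistics import NormalDist

x, n, alpha = 340, 500, 0.05
p_hat = x / n
q_hat = 1 - p_hat
z = NormalDist().inv_cdf(1 - alpha / 2)
half = z * sqrt(p_hat * q_hat / n)
lo, hi = p_hat - half, p_hat + half
print(f"[{lo:.4f}, {hi:.4f}]")
```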
If we don’t use the approximation, then solving a quadratic equation we can get a 100(1 − α)%
confidence interval for p in the following form:
[ (p̂ + z²_{α/2}/(2n))/(1 + z²_{α/2}/n) − (z_{α/2}/(1 + z²_{α/2}/n)) √(p̂q̂/n + z²_{α/2}/(4n²)),
(p̂ + z²_{α/2}/(2n))/(1 + z²_{α/2}/n) + (z_{α/2}/(1 + z²_{α/2}/n)) √(p̂q̂/n + z²_{α/2}/(4n²)) ].
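The exact-quadratic interval can be computed the same way; for large n it comes out close to the simple p̂-based interval. A sketch with illustrative counts:

```python
# Sketch: the exact-quadratic (no p -> p_hat approximation) confidence interval for p,
# obtained by solving |p_hat - p| <= z * sqrt(p(1-p)/n) for p (illustrative counts).
from math import sqrt
from statistics import NormalDist

x, n, alpha = 340, 500, 0.05
p_hat = x / n
q_hat = 1 - p_hat
z = NormalDist().inv_cdf(1 - alpha / 2)

center = (p_hat + z * z / (2 * n)) / (1 + z * z / n)
spread = (z / (1 + z * z / n)) * sqrt(p_hat * q_hat / n + z * z / (4 * n * n))
lo, hi = center - spread, center + spread
print(f"[{lo:.4f}, {hi:.4f}]")  # close to the approximate interval when n is large
```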
Maximum error in estimating p by P̂:
Theorem. If P̂ is used as an estimator of p, then for an observation from a sample of size n, the
probability that the error will not exceed z_{α/2}√(p̂q̂/n) is approximately 1 − α.
Sample size needed for a specified error: The following are straightforward from the previous
theorem.
Corollary. If P̂ is used as an estimator of p, then for an observation the error will be less than a
specified amount e with probability 1 − α if the sample has size approximately n = z²_{α/2} p̂q̂/e².
Corollary. If P̂ is used as an estimator of p, then for an observation the error will not exceed a
specified amount e with probability at least 1 − α if the sample has size n = z²_{α/2}/(4e²).
(Proof follows from the previous corollary putting p̂ = q̂ = 1/2.)
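A sketch of this conservative (worst-case p̂ = q̂ = 1/2) sample size, with an illustrative margin e:

```python
# Sketch: conservative sample size for estimating p within error e at confidence
# 1 - alpha, using the worst case p_hat = q_hat = 1/2: n = z^2 / (4 e^2).
from math import ceil
from statistics import NormalDist

e, alpha = 0.03, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
n = ceil(z * z / (4 * e * e))   # round up to keep the guarantee
print(n)
```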
We choose a sample (X1, X2, · · ·, Xn) of size n from the population. It is well known that S² is
a good point estimator of σ². We set Φ(S², σ²) = (n − 1)S²/σ² which, obviously, depends only upon
the parameter σ². By a theorem in sampling distribution we know that (n − 1)S²/σ² ∼ χ²(n − 1).
From the χ² distribution table we can find χ²_{α/2}, the upper 100(α/2)% point of the χ² distribution, such
that P(Φ(S², σ²) ∈ [χ²_{1−α/2}, χ²_{α/2}]) = 1 − α, i.e., P(χ²_{1−α/2} ≤ (n − 1)S²/σ² ≤ χ²_{α/2}) = 1 − α. From this
we get (n − 1)S²/χ²_{α/2} ≤ σ² ≤ (n − 1)S²/χ²_{1−α/2}.
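A closing sketch of this χ² interval for σ². The standard library has no χ² quantiles, so the two table values for 24 degrees of freedom are assumed (χ²_{0.975} = 12.401 and χ²_{0.025} = 39.364); n and s² are illustrative:

```python
# Sketch: 95% confidence interval for sigma^2 from (n-1)S^2 / sigma^2 ~ chi^2(n-1).
# The chi-square quantiles for 24 d.f. are table values (assumed, not computed).
n, s2 = 25, 4.0                      # sample size and observed sample variance
chi2_lo, chi2_hi = 12.401, 39.364    # chi^2_{1-alpha/2} and chi^2_{alpha/2}, 24 d.f.
lo = (n - 1) * s2 / chi2_hi          # lower confidence limit for sigma^2
hi = (n - 1) * s2 / chi2_lo          # upper confidence limit for sigma^2
print(f"[{lo:.4f}, {hi:.4f}]")
```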