
Ch05

Statistical Estimation
CHAPTER CONTENTS

5.1 Introduction
5.2 The Methods of Finding Point Estimators
5.3 Some Desirable Properties of Point Estimators
5.4 A Method of Finding the Confidence Interval: Pivotal Method
5.5 One Sample Confidence Intervals
5.6 A Confidence Interval for the Population Variance
5.7 Confidence Interval Concerning Two Population Parameters
5.8 Chapter Summary
5.9 Computer Examples
Projects for Chapter 5
5.1 Introduction

Unknown population parameters

To estimate:
point estimation
interval estimation

How much money does a person have in his/her pocket?

Point estimate: $1000
Interval estimate: ($700, $1200)
5.2 The Methods of Finding Point Estimators
X1, . . ., Xn: independent and identically distributed (iid) random variables (in statistical language, a random sample)
f(x, θ1, . . ., θl): pdf or pmf of the population
(θ1, . . ., θl): the unknown population parameters

Point estimation:
to determine statistics gi(X1, . . ., Xn), i = 1, . . ., l, which can be used to estimate the value of each of the parameters

Estimator of θi: $\hat{\theta}_i = g_i(X_1, \ldots, X_n)$, i = 1, . . ., l

Capital letters such as X_bar and S^2 represent the estimators;
lowercase letters such as x_bar and s^2 represent the estimates.

Three of the more popular methods of estimation

the method of moments (this chapter)
the method of maximum likelihood (this chapter)
Bayes’ method (Chapter 11)
Unbiased: An estimator, θ_hat, is unbiased if the mean of its sampling distribution is the parameter θ.
Bias: $B = E(\hat{\theta}) - \theta$
Consistency: An estimator is said to satisfy the consistency property if it has a high probability of being close to the population value θ for a large sample size.
Efficiency: smaller variance
5.2.1 THE METHOD OF MOMENTS

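$\mu'_k = E[X^k]$: the kth population moment about the origin of a random variable X,

$m'_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$: the kth sample moment of the random variable X

$\mu_k = E[(X - \mu)^k]$: the kth population moment about the mean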
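To make the moment-matching idea concrete, here is a minimal sketch that equates the first two sample moments about the origin with the corresponding population moments. The normal population, the function names, and the simulated data are assumptions chosen for illustration; they are not taken from the chapter.

```python
import random


def sample_moment(xs, k):
    """k-th sample moment about the origin: (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in xs) / len(xs)


def normal_method_of_moments(xs):
    """Solve E[X] = m1 and E[X^2] = m2 for a normal population.

    Since E[X] = mu and E[X^2] = sigma^2 + mu^2, the estimates are
    mu_hat = m1 and sigma2_hat = m2 - m1^2.
    """
    m1 = sample_moment(xs, 1)
    m2 = sample_moment(xs, 2)
    return m1, m2 - m1 ** 2


# Hypothetical data: 1000 draws from N(10, 2^2), seeded for repeatability.
rng = random.Random(0)
data = [rng.gauss(10.0, 2.0) for _ in range(1000)]
mu_hat, sigma2_hat = normal_method_of_moments(data)
print(f"mu_hat = {mu_hat:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```

Note that the variance estimate obtained this way divides by n rather than n - 1, so it is not the usual sample variance S^2.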
5.2.2 THE METHOD OF MAXIMUM LIKELIHOOD

Even though the method of moments is intuitive and easy to apply, it usually does
not yield “good” estimators.

The method of maximum likelihood is intuitively appealing, because we attempt
to find the values of the true parameters that would have most likely produced
the data that we in fact observed.

For most cases of practical interest, the performance of MLEs is optimal for
sufficiently large samples.

This is one of the most versatile methods for fitting parametric statistical models
to data.
Maximum likelihood estimates give the parameter values for which the
observed sample is most likely to have been generated.
At times, the MLEs may be hard to calculate. It may be necessary to use numerical methods to
approximate values of the estimate.
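As an illustration of the numerical route, the sketch below maximizes a log-likelihood with a golden-section search and compares the result with the closed-form answer. The exponential model, the helper names, and the data values are assumptions picked because the exact MLE (1 / sample mean) is known and can serve as a check.

```python
import math


def exp_log_likelihood(rate, xs):
    """Log-likelihood of an exponential(rate) model: n*log(rate) - rate*sum(x)."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def maximize_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if f(c) > f(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2.0


# Hypothetical waiting times (illustrative values, not from the chapter).
times = [0.8, 1.9, 0.4, 2.7, 1.1, 0.6, 3.2, 1.5]

rate_numeric = maximize_golden(lambda r: exp_log_likelihood(r, times), 1e-6, 10.0)
rate_closed = len(times) / sum(times)  # closed-form MLE: 1 / sample mean
print(rate_numeric, rate_closed)
```

For models without a closed-form solution the same search (or a library optimizer) is applied directly to the log-likelihood; the search interval has to be chosen so that it contains the maximizer.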
5.3 Some Desirable Properties of Point Estimators
5.3.1 UNBIASED ESTIMATORS
The sample mean is always an unbiased estimator of the population mean.
Sample variance

Population variance:

Size of population = N
Elements of population: X1, X2, … , XN

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
Unbiased estimators need not be unique.

If we have two unbiased estimators, there are infinitely many unbiased estimators: any weighted average a·θ_hat1 + (1 − a)·θ_hat2 is also unbiased.

It is better to have an estimator that has low bias as well as low variance.
For unbiased estimators,
5.3.2 SUFFICIENCY*

Skipped
5.4 A Method of Finding the Confidence Interval: Pivotal Method

Interval estimation
Purpose: to have some degree of confidence of securing the true parameter.

Estimator: Confidence interval: (lower confidence limit, upper confidence limit)


Advantage: Convey more information about the data that are used to obtain the point estimate.
Provide a measure of one’s confidence in the accuracy of the estimate.
The width of the confidence interval reflects the amount of variability inherent in the
point estimate.

Desirable properties: (1) P(L < θ < U) is high, that is, the true parameter θ is in (L, U) with high probability,
and (2) the interval (L, U) should be relatively narrow on average.

Thus, our objective is to find a narrow interval with high probability of enclosing the
true parameter, θ.
Randomness: For an interval estimator of a single parameter θ, we will use the random sample to
find two quantities L and U such that L < θ < U with some probability. Because L and U
depend on the sample values, they will be random.
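[Figure: sampling distribution with the central 95% area and 2.5% in each tail; the point estimate is 13.8 and the two interval limits are to be found.]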
The previous example (Example 5.4.1) was built on our knowledge of the
sampling distribution of the sample mean.

What if the sampling distribution of the statistic we are interested in is not
readily available?

More generally, our success in building confidence intervals for an
estimate of a parameter depends on identifying a quantity known as the
pivot. We now describe this method.
The pivotal method is a general method of constructing a confidence interval using a pivotal
quantity. This relies on our knowledge of sampling distributions.
Here we have to find a pivotal quantity with the following two characteristics:
(i) It is a function of the random sample (a statistic or an estimator θ_hat) and the unknown
parameter θ, where θ is the only unknown quantity, and
(ii) It has a probability distribution that does not depend on the parameter θ.

Suppose that θ_hat = θ_hat(X) is a point estimate of θ, and let p(θ_hat, θ) be the pivotal
quantity.
Now, for a given value of α (0 < α < 1) and constants a and b with a < b, let

$P(a \le p(\hat{\theta}, \theta) \le b) = 1 - \alpha$

Hence, given θ_hat, the inequality is solved for θ to obtain a region of θ
values, which will be a desired confidence interval.
X_bar follows N(μ, σ²/n).
------- Its distribution depends on μ, so X_bar cannot be a pivot.
The z-transform of X_bar, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ (σ known), follows N(0, 1).
------- Its distribution doesn't depend on μ, so Z can be a pivot.

Pivot: Z, with area α/2 in each tail.
FIGURE 5.5 Probability density of the pivot.
The pivotal method may not be applicable in all situations.
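For the normal mean with known σ, solving the pivot inequality for μ gives the interval X_bar ± z·σ/√n. The sketch below is one way to compute it with the standard library; the function name and the numbers in the example call are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a normal mean when sigma is known.

    Solves -z <= (xbar - mu) / (sigma / sqrt(n)) <= z for mu, where z is
    the upper alpha/2 point of the standard normal distribution.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical inputs: sample mean 50 from n = 100 observations, sigma = 10.
lower, upper = z_interval(xbar=50.0, sigma=10.0, n=100)
print(f"({lower:.2f}, {upper:.2f})")
```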
The meaning of a 95% confidence interval

In an infinite series of trials in which repeated samples of size n are drawn from the same
population and 95% confidence intervals for μ are calculated by the same method for
each of the samples, the proportion of intervals that actually include μ will be 95%.

FIGURE 5.4 95% confidence intervals for μ.
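The repeated-sampling interpretation can be imitated by simulation. The following sketch draws many samples from a normal population with known σ, builds a 95% interval from each, and reports the fraction that contain μ. The population values, the number of trials, and the seed are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, trials, confidence=0.95, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical population N(5, 2^2), samples of size 20, 2000 repetitions.
rate = coverage(mu=5.0, sigma=2.0, n=20, trials=2000)
print(f"observed coverage: {rate:.3f}")
```

With a finite number of trials the observed proportion only approximates 95%; it fluctuates from seed to seed.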


5.5 One Sample Confidence Intervals

5.5.1 LARGE SAMPLE CONFIDENCE INTERVALS

If the sample size is large, then by the Central Limit Theorem, certain sampling
distributions can be assumed to be approximately normal.

For θ = μ, n ≥ 30 will be considered large.

For the binomial parameter p, n is considered large if np and n(1 − p) are both greater
than 5.
That is, if θ is an unknown parameter (such as µ, p, (µ1 − µ2), (p1 − p2)), then
for large samples, by the Central Limit Theorem, the z-transform

$Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$

possesses an approximately standard normal distribution, where $\hat{\theta}$ is the
MLE of θ and $\sigma_{\hat{\theta}}$ is its standard deviation.
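Solving the pivot inequality for θ gives the general form of the large-sample interval. The derivation below is a sketch in the notation used above; $z_{\alpha/2}$, the upper α/2 point of the standard normal distribution, is a symbol assumed here.

```latex
\begin{align*}
P\!\left(-z_{\alpha/2} \le \frac{\hat{\theta}-\theta}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}\right) &\approx 1-\alpha \\
P\!\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right) &\approx 1-\alpha
\end{align*}
```

The approximate (1 − α)100% interval is therefore $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$, with $\sigma_{\hat{\theta}}$ replaced by an estimate when it is unknown.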
It is correct to say “We are 95% confident that the true mean will lie in the interval (74.1, 79.8).”
5.5.2 CONFIDENCE INTERVAL FOR PROPORTION, p
Question: “How do we determine whether the sample size we have is sufficient
for the normal approximation used in the foregoing formula?”

Some popular rules:


*) np and n(1 − p) should be greater than 10,
*) $\hat{p} \pm 3\sqrt{\hat{p}(1-\hat{p})/n}$ should be contained in the interval (0, 1), or
*) np(1 − p) > 10, etc.

Note: All of these rules perform poorly when p is near 0 or 1.
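A large-sample interval for p, together with two of the rules of thumb listed above, can be sketched as follows. The function names and the counts in the example are hypothetical, and the checks plug the sample proportion in for the unknown p, which is itself an assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def normal_approx_checks(p_hat, n):
    """Evaluate two rules of thumb, with p_hat substituted for p."""
    return {
        "np and n(1-p) > 10": n * p_hat > 10 and n * (1 - p_hat) > 10,
        "np(1-p) > 10": n * p_hat * (1 - p_hat) > 10,
    }


# Hypothetical survey: 120 successes out of 400 respondents.
lower, upper = proportion_interval(120, 400)
print(f"({lower:.4f}, {upper:.4f})")
print(normal_approx_checks(120 / 400, 400))
```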


5.5.2.1 MARGIN OF ERROR AND SAMPLE SIZE
In real-world problems, the estimates of the proportion p are usually accompanied by
a margin of error, rather than a confidence interval.

“The CNN/USA Today/Gallup poll of 818 registered voters taken on June 27-30 showed that
if the election were held now, the president would beat his challenger 52% to 40%, with 8%
undecided. The poll had a margin of error of plus or minus 4%.”

The Tainan City chapter of Taiwan's DPP today released a mayoral poll conducted by the party's central office. Huang
Weizhe leads with 39.9%, ahead of Gao Sibo and Lin Yifeng with 14.7% and 14.6%, respectively.
The poll was conducted on September 13-14 and covered 1259 eligible voters aged 20 or over. At the 95% confidence level, the
margin of error is about plus or minus 2.8%.

Q: What is the minimum number n to attain a 5% margin of error under 95% confidence level?
Q: If 500 citizens were polled, what will be the margin of error under 95% confidence level?
A reported margin of error is nothing but a confidence interval: estimate ± margin of error.
[Figure: 95% confidence interval for a 38% support rate, with 2.5% in each tail and limits at 35% and 41%.]


A candidate has a 38% support rate, with a 95% confidence level and a 3% margin of error; 845
citizens were polled.
N = 1005 is sufficient for a 38% support rate.
N = 1238 > 1067
Alpha = 5% (95% confidence level)
Confidence interval: (38% − 3%, 38% + 3%) (3% margin of error)
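The poll questions above can be worked through with the usual large-sample formulas, margin d = z·√(p(1 − p)/n) and n = z²·p(1 − p)/d². The sketch below assumes the conservative choice p = 0.5 unless another value is given; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence):
    """Upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the large-sample interval for a proportion."""
    return z_value(confidence) * sqrt(p * (1.0 - p) / n)


def required_sample_size(d, p=0.5, confidence=0.95):
    """Smallest integer n whose margin of error is at most d."""
    return ceil(z_value(confidence) ** 2 * p * (1.0 - p) / d ** 2)


print(f"n = 1259: margin = {margin_of_error(1259):.3f}")  # Tainan poll
print(f"n = 500:  margin = {margin_of_error(500):.3f}")
print("5% margin:", required_sample_size(0.05))
print("3% margin:", required_sample_size(0.03))
print("3% margin, p = 0.38:", required_sample_size(0.03, p=0.38))
```

For n = 1259 the margin comes out at about 2.8%, matching the poll report. The unrounded sample sizes for a 3% margin are about 1067.1 (p = 0.5) and 1005.6 (p = 0.38), which correspond to the 1067 and 1005 quoted above; rounding up gives 1068 and 1006.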

Theorem 3.2.1 If X is a binomial random variable
with parameters n and p, then
$E(X) = \mu = np$, $Var(X) = \sigma^2 = np(1 - p)$,
$M_X(t) = [pe^t + (1 - p)]^n$
Formula-1

P = 50% → minimal n
Formula-2

P = 50% → maximal n

d: margin of error
Formula-3
Source: https://www.google.com.tw/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=2ahUKEwjS6Iy20PfdAhUKOrwKHRvyCMsQjRx6BAgBEAU&url=http%3A%2F%2Fwww.newworldencyclopedia.org%2Fentry%2FDendrite&psig=AOvVaw1BZibMSRVu2svJadTOUX2Q&ust=1539114966329590
5.5.3 SMALL SAMPLE CONFIDENCE INTERVALS FOR μ

The problem: To find a confidence interval for the true mean μ of a normal
population when the variance σ² is unknown and the sample size is small (n < 30).
For n small (n < 30) and σ² unknown, we can use the following theorem.

In practice, the first step in the previous procedure should include a test of normality.
Even when the data fail the normality test, most statistical software will produce a
confidence interval based on normality or give an error report.

We should understand that generally such answers are meaningless.

In those cases, nonparametric methods (Chapter 12) such as the Wilcoxon rank sum
method or bootstrap methods (Chapter 13) will be more appropriate.
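For the small-sample case described above, the standard interval is X_bar ± t·S/√n, where t is the upper α/2 point of the t-distribution with n − 1 degrees of freedom. The sketch below is a standard-library-only approximation: the t quantile is obtained by numerically integrating the density and bisecting, which is an implementation choice made here, not the chapter's method. The data values are hypothetical, and normality of the data is assumed.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2.0) / (sqrt(df * pi) * gamma(df / 2.0))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2.0)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the pdf on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3.0


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def t_interval(xs, confidence=0.95):
    """Small-sample interval xbar +/- t * s / sqrt(n), assuming normal data."""
    n = len(xs)
    t = t_quantile(1.0 - (1.0 - confidence) / 2.0, n - 1)
    half_width = t * stdev(xs) / sqrt(n)
    m = mean(xs)
    return m - half_width, m + half_width


# Hypothetical measurements (n = 10), for illustration only.
data = [12.1, 11.8, 12.6, 12.3, 11.5, 12.9, 12.0, 12.4, 11.9, 12.2]
lower, upper = t_interval(data)
print(f"({lower:.3f}, {upper:.3f})")
```

In practice a statistics library's t quantile function would replace the hand-rolled `t_quantile`; the numerical version is used here only to keep the example self-contained.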
5.6 A Confidence Interval for the Population Variance
Note: Normal population!
5.7 Confidence Interval Concerning Two Population Parameters

X1_bar ~ N(mu1, sigma^2/n1)


Unknown equal variances
EXAMPLE 5.7.1
A study of two kinds of machine failures shows that 58 failures of the first kind took on the
average 79.7 minutes to repair with a standard deviation of 18.4 minutes, whereas 71
failures of the second kind took on average 87.3 minutes to repair with a standard deviation
of 19.5 minutes. Find a 99% confidence interval for the difference between the true average
amounts of time it takes to repair failures of the two kinds of machines.
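The slides do not show the solution to Example 5.7.1. One way to work it, assuming the large-sample normal interval applies because both samples exceed 30 and the sample standard deviations stand in for the population values, is (x1_bar − x2_bar) ± z·√(s1²/n1 + s2²/n2). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_interval(xbar1, s1, n1, xbar2, s2, n2, confidence=0.95):
    """Large-sample interval for mu1 - mu2 from two independent samples."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


# Figures from Example 5.7.1: repair times (minutes) for the two failure types.
lower, upper = two_mean_interval(79.7, 18.4, 58, 87.3, 19.5, 71, confidence=0.99)
print(f"({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval runs from roughly −16.2 to 1.0 minutes; because it contains zero, the data do not rule out equal mean repair times at the 99% level.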
5.8 Chapter Summary
5.9 Computer Examples (Optional)
Projects for Chapter 5
