
ESTIMATION

It is a statistical procedure to determine the approximate value of a population
parameter on the basis of a sample statistic.

• Estimator: A statistic used to estimate the corresponding parameter.
(E.g.: The sample mean is an estimator of the population mean.)
It is a formula which specifies how to use the sample observations to estimate the parameter.
• Estimate: A given value of the estimator computed from a sample.
(E.g.: If the mean value of a sample of randomly selected invoices is $262, this value is an estimate of the corresponding population mean.)
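
A minimal Python sketch of the estimator/estimate distinction. The three invoice amounts are made-up illustrative figures, chosen only so that their mean is $262; they are not data from the example above.

```python
from statistics import mean

# Estimator: a rule (here, a function) that maps any sample to a number.
estimator = mean

# Hypothetical sample of invoice amounts in dollars (illustrative only).
invoices = [250, 262, 274]

# Estimate: the value the estimator takes on this particular sample.
estimate = estimator(invoices)
print(estimate)
```

A different sample would give a different estimate, while the estimator (the rule) stays the same.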

When estimating a population parameter from information about the relevant sample statistic, we have two options.

1) Point estimation: Estimation of a parameter using a single estimate computed from the sample.
(E.g.: The population mean is about $262.)

In general, we mark x-bar on the number line and claim that μ is somewhere in an interval around it, but we cannot specify with reasonable certainty how wide this interval is, or how close x-bar and μ are to each other.

2) Interval estimation: Estimation of a parameter using an interval (rather than a single point) that is likely to include the parameter.
(E.g.: With 95% confidence the population mean is between $257 and $267.)

In general, we claim with a certain level of confidence that μ is within an interval around x-bar:

\[ \bar{x} - B \;<\; \mu \;<\; \bar{x} + B \]

This interval, from x-bar − B to x-bar + B, is called the confidence interval.

Note: Compared to point estimation, the distinct advantage of interval estimation is that it attaches a level of confidence to the estimate, as well as a measure of precision/accuracy.

The wider the confidence interval estimate, the less precise the estimation is. Therefore, precision is inversely related to the width of the confidence interval.

The confidence interval is centered around the point estimate, and half of
the width of the confidence interval is called the error bound, B.
It is the maximum error of estimation at the given level of confidence.
But how do we find B?

CONFIDENCE INTERVAL ESTIMATION OF THE
POPULATION MEAN, µ
Ex 1:
The quality control manager of a light bulb factory needs to estimate the
average life of a large production run of light bulbs. The process standard
deviation is known to be 100 hours. A random sample of 50 bulbs indicated a
sample average life of 350 hours. What can be said with 95% confidence about
the population (or true) average life in that particular run?

Since the sample size is large enough, we can rely on the CLT.

The population mean is unknown, but the population standard deviation is given, and the standard error of the sample mean is

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{50}} \approx 14.1, \qquad \bar{X} \sim N(\mu,\ 14.1) \]

Therefore,

\[ Z = \frac{\bar{X} - \mu}{14.1} \]

is a standard normal random variable.

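Consider the standard normal curve:

[Figure: standard normal curve with the horizontal axis Z marked at -1.96, 0 and 1.96; the two shaded areas on either side of 0 are 0.475 + 0.475 = 0.95.]

Read the standard normal table 'backwards'. P (0 < Z < 1.96) = 0.475, so P (-1.96 < Z < 1.96) = 0.95.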
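
The table lookup above can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed table; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between 0 and 1.96, the quantity tabulated in a standard normal table.
area_0_to_196 = z.cdf(1.96) - z.cdf(0.0)

# Reading the table 'backwards': which z leaves 0.025 in the upper tail?
z_crit = z.inv_cdf(0.975)

print(round(area_0_to_196, 3), round(z_crit, 2))
```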

\[ P\left(-1.96 < \frac{\bar{X} - \mu}{14.1} < 1.96\right) = 0.95 \]

\[ P(-1.96 \times 14.1 < \bar{X} - \mu < 1.96 \times 14.1) = 0.95 \]

\[ P(\bar{X} - 1.96 \times 14.1 < \mu < \bar{X} + 1.96 \times 14.1) = 0.95 \]

The substitution of the given sample mean value for X-bar yields the lower and upper limits of the 95% confidence interval:

\[ 350 - 1.96 \times 14.1 = 322.4 \quad\text{and}\quad 350 + 1.96 \times 14.1 = 377.6 \]


With 95% confidence we can say that the average life of the production run lies
between 322.4 and 377.6 hours.
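In general, the C% confidence interval estimator of µ (at a given σ) is:

\[ \left( \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \;;\; \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} \right) \]

where
• the first limit is the lower confidence limit (LCL) and the second is the upper confidence limit (UCL);
• \( \sigma_{\bar{x}} = \sigma / \sqrt{n} \) (multiplied by the square root of the finite population correction factor, if necessary);
• C is the confidence level, with \( \alpha = 1 - C/100 \) and \( P(-z_{\alpha/2} < Z < z_{\alpha/2}) = C/100 \);
• \( z_{\alpha/2} \) is the 100(1-α/2) percentile from the standard normal table.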
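
The general estimator can be sketched as a small Python function. The name `z_interval` is hypothetical, the finite population correction is deliberately left out, and the percentile comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_pct):
    """Return (LCL, UCL) for the population mean when sigma is known."""
    alpha = 1 - conf_pct / 100
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    std_err = sigma / sqrt(n)   # standard error of x-bar, no finite population correction
    bound = z_crit * std_err    # half-width of the interval
    return x_bar - bound, x_bar + bound

# Ex 1: x-bar = 350 hours, sigma = 100 hours, n = 50, C = 95.
lcl, ucl = z_interval(350, 100, 50, 95)
print(round(lcl, 1), round(ucl, 1))
```

Because nothing is rounded along the way, the limits come out close to 322.3 and 377.7 rather than the 322.4 and 377.6 of Ex 1, where the standard error was rounded to 14.1 before multiplying.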

In the previous example the population standard deviation was given.
However, if we don’t know the population mean, it is unlikely that we know the
population standard deviation, since σ is the ‘average’ deviation from µ.

We can overcome this problem by

1) estimating σ with the sample standard deviation, s;
2) replacing the true standard error of X-bar with its estimate,

\[ s_{\bar{x}} = \frac{s}{\sqrt{n}} \]

3) and obtaining the 100(α/2) and 100(1-α/2) percentiles from the Student's t distribution, instead of the standard normal distribution (granted that the population is not extremely non-normal), because

\[ \frac{\bar{X} - \mu}{\sigma_{\bar{x}}} \ \text{is standard normal, but} \ \frac{\bar{X} - \mu}{s_{\bar{x}}} \ \text{is not.} \]

The t distribution is centered around zero, just like the standard normal distribution, but it is more widely dispersed.
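
The extra dispersion can be seen in a small simulation. This is only a sketch under stated assumptions: a normal population with μ = 0 and σ = 1, samples of size n = 8, 20,000 repetitions, and a fixed random seed, none of which come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev, pvariance

random.seed(1)               # fixed seed so the run is repeatable
MU, SIGMA, N = 0.0, 1.0, 8   # assumed normal population; n = 8
REPS = 20_000

z_stats, t_stats = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    # Standardised with the true standard error: behaves like Z.
    z_stats.append((x_bar - MU) / (SIGMA / sqrt(N)))
    # Standardised with the estimated standard error: behaves like t.
    t_stats.append((x_bar - MU) / (stdev(sample) / sqrt(N)))

var_z = pvariance(z_stats)
var_t = pvariance(t_stats)
print(round(var_z, 2), round(var_t, 2))
```

The variance of the first statistic should land near 1, while the second should be clearly larger, which is the wider spread that the t table accounts for.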

The extent to which the t distribution spreads out is determined by the
so called degrees of freedom (df), which is the number of independent
terms. In this application df = n -1.

[Figure: density curves of Z, t with df = 10, and t with df = 7.]

The larger the degrees of freedom, the smaller the dispersion of the t
distribution and the more similar it is to the standard normal distribution.
If df > 200, the t distribution can be approximated with the standard
normal distribution.

Note: Some textbooks suggest that df > 30 is large enough to make the approximation t ≈ z.
(See Selv. pp.258-259 about the application of the t table. Abridged version: pp.256-257)

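When σ is unknown, the C% confidence interval estimator of µ is:

\[ \left( \bar{x} - t_{\alpha/2,\,n-1}\, s_{\bar{x}} \;;\; \bar{x} + t_{\alpha/2,\,n-1}\, s_{\bar{x}} \right) \]

where
• the first limit is the LCL and the second is the UCL;
• \( s_{\bar{x}} = s / \sqrt{n} \) (multiplied by the square root of the finite population correction factor, if necessary);
• \( t_{\alpha/2,\,n-1} \) is the 100(1-α/2) percentile from the t table, with n -1 degrees of freedom.
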
Ex 2: A manufacturer of a brand of designer jeans realises that many retailers charge less than the suggested retail price of $40. A random sample of 20 retailers reveals that the mean and standard deviation of the prices of the jeans are $32 and $2.50. Estimate with 90% confidence the mean retail price of the jeans, assuming that the distribution of price is normal.

Here n = 20, x-bar = 32, s = 2.50 and C = 90, so

\[ \alpha = 0.10, \quad \alpha/2 = 0.05, \quad df = n - 1 = 19, \quad t_{0.05,\,19} = 1.729 \quad\text{and}\quad s_{\bar{x}} = \frac{2.5}{\sqrt{20}} = 0.559 \]

\[ \bar{x} \pm t_{\alpha/2,\,n-1}\, s_{\bar{x}} = 32 \pm 1.729 \times 0.559 = (31.03\ ,\ 32.97) \]


With 90% confidence the average retail price of the jeans is between $31.03
and $32.97, well below the recommended retail price.
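
The arithmetic of Ex 2 can be retraced in Python. The function name `t_interval` is hypothetical, and the critical value is passed in as read from the t table (1.729 for df = 19), since the Python standard library has no t distribution.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Return (LCL, UCL) for the mean when sigma is unknown.

    t_crit is t_{alpha/2, n-1}, looked up in a t table by the caller.
    """
    std_err = s / sqrt(n)       # estimated standard error of x-bar
    bound = t_crit * std_err
    return x_bar - bound, x_bar + bound

# Ex 2: x-bar = 32, s = 2.50, n = 20, t_{0.05,19} = 1.729 from the t table.
lcl, ucl = t_interval(32, 2.50, 20, 1.729)
print(round(lcl, 2), round(ucl, 2))
```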
When you want to develop a confidence interval for µ, always answer
the following questions:

1) Is X-bar normally distributed (or at least approximately normal, i.e. sample size ≥ 30)?
If NOT, you cannot develop the confidence interval.
If YES,

2) Is σ known?
If YES, you can calculate the true standard error of X-bar, and use zα/2.
If NOT, estimate the standard error of X-bar from s, and use tα/2.
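
The two questions amount to a short decision rule, sketched below. The function name and its return values are illustrative choices, not part of the slides.

```python
def choose_critical_value(x_bar_approx_normal, sigma_known):
    """Return 'z', 't', or None following the two questions above."""
    if not x_bar_approx_normal:
        return None          # no confidence interval can be developed
    return "z" if sigma_known else "t"

print(choose_critical_value(True, True))    # sigma known, as in Ex 1
print(choose_critical_value(True, False))   # sigma estimated by s, as in Ex 2
```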

CONFIDENCE INTERVAL ESTIMATION OF THE
POPULATION PROPORTION, P
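• Given that the sampling distribution of p-hat is approximately normal (when np, nq ≥ 5), we can develop an interval estimator for p, similarly to the interval estimator of µ with unknown σ.

The C% confidence interval estimator of p is

\[ \left( \hat{p} - z_{\alpha/2}\, s_{\hat{p}} \;;\; \hat{p} + z_{\alpha/2}\, s_{\hat{p}} \right) \quad\text{or}\quad \hat{p} \pm z_{\alpha/2}\, s_{\hat{p}} \ \text{in brief,} \]

where

\[ s_{\hat{p}} = \sqrt{\frac{\hat{p}\,\hat{q}}{n}} \]

is the estimated standard error of p-hat. The first limit is the LCL and the second is the UCL.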

Ex 3:
A random sample of 2,000 persons on the electoral roll indicates that 51.7% agree with a proposition concerning the performance of the Prime Minister. Estimate with 99% confidence the population proportion of voters who agree.

Here n = 2000, p-hat = 0.517 and C = 99.
Since p is unknown, strictly speaking, we cannot check whether np, nq ≥ 5.
However, np-hat = 2000 × 0.517 = 1034 and nq-hat = 966, so we can expect np and nq to be large enough.
p-hat is likely to be approximately normally distributed.

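\[ s_{\hat{p}} = \sqrt{\frac{\hat{p}\,\hat{q}}{n}} = \sqrt{\frac{0.517 \times 0.483}{2000}} = 0.011 \]

\[ \alpha = 1 - \frac{C}{100} = 0.01, \qquad z_{\alpha/2} = z_{0.005} = 2.576 \]

(You can use the Z table, or the last row of the t table.)

Therefore

\[ \hat{p} \pm z_{\alpha/2}\, s_{\hat{p}} = 0.517 \pm 2.576 \times 0.011 = (0.489\ ,\ 0.545) \]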

With 99% confidence the population proportion of voters who agree is between 48.9% and 54.5%.
At the 99% level of confidence we cannot tell whether the majority (more than 50%) of voters agree, because the confidence interval does not lie entirely above 50%.

Ex 4:
A health department official is investigating the possibility of allowing doctors to advertise their services. In a survey designed to examine this issue, 91 doctors were asked whether they believed that doctors should be allowed to advertise. A total of 23 respondents supported advertising by doctors. Estimate with 90% confidence the proportion of all doctors who support advertising.

Here n = 91, f = 23 and C = 90.

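\[ \hat{p} = \frac{23}{91} = 0.253, \qquad s_{\hat{p}} = \sqrt{\frac{\hat{p}\,\hat{q}}{n}} = \sqrt{\frac{0.253 \times 0.747}{91}} = 0.0456 \]

\[ \alpha = 1 - \frac{C}{100} = 0.10, \qquad z_{\alpha/2} = z_{0.05} = 1.645 \]

Thus

\[ \hat{p} \pm z_{\alpha/2}\, s_{\hat{p}} = 0.253 \pm 1.645 \times 0.0456 = (0.178\ ,\ 0.328) \]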

With 90% confidence the population proportion of doctors who support advertising is between 17.8% and 32.8%.
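
Both proportion examples can be retraced with one small function. The name `proportion_interval` is hypothetical, and the percentile is taken from `statistics.NormalDist` instead of a table, so no intermediate value is rounded.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf_pct):
    """Return (LCL, UCL) for a population proportion (large-sample z interval)."""
    alpha = 1 - conf_pct / 100
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    std_err = sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error of p-hat
    bound = z_crit * std_err
    return p_hat - bound, p_hat + bound

ex3 = proportion_interval(0.517, 2000, 99)   # Ex 3
ex4 = proportion_interval(23 / 91, 91, 90)   # Ex 4
print([round(v, 3) for v in ex3], [round(v, 3) for v in ex4])
```

The Ex 4 limits agree with the hand calculation to three decimals. The Ex 3 limits can differ from the hand calculation in the third decimal place, because the hand calculation rounds the standard error to 0.011 before multiplying; the conclusion that the interval straddles 50% is unchanged.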
THE RELATIONSHIP BETWEEN C, B AND n

C: level of confidence    B: error bound    n: sample size
Ex 5:
A random sample of 400 families were asked for their weekly income. The population standard deviation is $90. Using Excel, estimate the mean weekly family income of the population with 95% confidence.

z-Estimate: Mean (Income)

Mean                  1009.99
Standard Deviation      92.92
Observations           400.00
SIGMA                   90.00
LCL                   1001.17
UCL                   1018.81

1001.17 < μ < 1018.81
Width = 17.64

With 95% confidence the mean population weekly family income is between $1001.17 and $1018.81.

Using Excel, estimate the mean weekly family income of the population with 90% confidence and with 99% confidence.

z-Estimate: Mean (Income)

                      90% confidence           99% confidence
Mean                  1009.99                  1009.99
Standard Deviation    92.92                    92.92
Observations          400.00                   400.00
SIGMA                 90.00                    90.00
LCL                   1002.59                  998.40
UCL                   1017.39                  1021.58
Interval              1002.59 < μ < 1017.39    998.40 < μ < 1021.58
Width                 14.80                    23.18

At a given sample size, the higher the level of confidence, the wider the
confidence interval is and thus the interval estimate is less precise.
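
The three widths reported for Ex 5 can be recomputed without Excel. This sketch takes σ = 90 and n = 400 from the example and obtains the percentiles from `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

SIGMA, N = 90.0, 400   # Ex 5: known sigma and sample size

widths = {}
for conf_pct in (90, 95, 99):
    alpha = 1 - conf_pct / 100
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # z_{alpha/2}
    widths[conf_pct] = 2 * z_crit * SIGMA / sqrt(N)    # UCL minus LCL

for conf_pct, width in widths.items():
    print(conf_pct, round(width, 2))
```

The width depends only on the confidence level, σ and n, not on the sample mean, which is why all three intervals are centered on the same 1009.99.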

Is it possible to maintain a high level of confidence while increasing the
precision?
Recall that precision is inversely related to the width of the confidence interval,
and the error bound B is half the width of the confidence interval.
For example, the confidence interval of µ is

\[ (\bar{x} - B\ ,\ \bar{x} + B) \]

Assuming that X-bar is normally distributed, σ is known and n/N < 0.10, this interval is

\[ \left( \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}\ ,\ \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}} \right) \quad\text{with}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}, \quad\text{so}\quad B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \]

The Excel output for descriptive statistics calculates the value of B.

This expression shows that increased precision, which means a lower value for B, can be achieved while maintaining a high level of confidence (a large zα/2), by increasing the sample size n.
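
A short numerical illustration of this trade-off, assuming σ = 90 as in Ex 5 and a fixed 95% confidence level. The larger sample sizes are hypothetical, and `error_bound` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def error_bound(sigma, n, conf_pct):
    """B = z_{alpha/2} * sigma / sqrt(n), with no finite population correction."""
    alpha = 1 - conf_pct / 100
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)

# sigma = 90 as in Ex 5; n = 1600 and n = 6400 are hypothetical sample sizes.
bounds = {n: error_bound(90, n, 95) for n in (400, 1600, 6400)}
for n, b in bounds.items():
    print(n, round(b, 2))
```

Because n enters through its square root, quadrupling the sample size only halves B.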
