
Chapter 6

Estimation
Introduction

• One of the principal objectives of statistical investigation is to
make reasonable estimates.
• Usually, the true population parameter will be unknown, and
one objective of sampling could be to estimate its value.
• Estimation is a method that enables us to estimate, with
reasonable accuracy, the population parameter.
• In reality, calculating the population parameter exactly is
extremely difficult or impossible; hence we need to make
estimates and, where possible, make a statement about the error
that will likely accompany each estimate.
Cont’d

• The procedure for making an estimate is to draw a random
sample of size n from a known probability distribution,
compute a sample statistic and use it as an estimate of the
population parameter.
• This procedure of estimation can be categorized into two:
Point Estimation and Interval Estimation.
Some Terminologies of Estimation

• An estimator is a sample statistic used to estimate a
population parameter.
• An estimator of a population parameter is a random variable
that depends on the sample information, and whose
realizations provide approximations to this unknown
parameter.
• Example: The sample mean, $\bar{X}$, can be an estimator of the
population mean, μ.
Cont’d
• An estimate is a specific observed value of a statistic, that is, a
specific realization of that random variable. We can get an
estimate by taking a sample and computing the value taken by
our estimator in that sample.
• To clarify the distinction between the terms Estimator and
Estimate, suppose that we compute the mean age of students
from a sample drawn from this section and find it to be 19
years.
• If we use this specific value to estimate the mean age of
students on the campus, the value 19 years would be an estimate.
• Estimation is the procedure of calculating the estimate using
an estimator.
Point Estimation

• A Point Estimate is a single number which is used to estimate
an unknown population parameter.
• Point estimates for population parameters are based on
certain criteria statisticians frequently use to choose among
estimators.
• The following are the standard estimators used by statisticians
to estimate population parameters.
Cont’d

• Sample Mean ($\bar{X}$): is the most common estimator of the
population mean. The sample mean ($\bar{X}$) is unbiased and
consistent.
• Moreover, it can be shown that if the population is normal the
sample mean is the most efficient unbiased estimator available.
• For these reasons the sample mean is generally the preferred
estimator of the population mean.
• It can be computed as:
• $\bar{X} = \frac{\sum X_i}{n}$
Cont’d

• Sample Variance ($S^2$) and Standard Deviation ($S$): the
sample variance is an unbiased and consistent estimator of the
population variance.
• It is relatively efficient as compared to other estimators. Its
square root, the sample standard deviation, is generally used as
an estimator of the population standard deviation and is also
relatively efficient.
• It can be computed as:
• $S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
Cont’d

• Sample Proportion ($P$): is an unbiased, consistent, and
relatively efficient estimator of the population proportion.
• Due to this fact, it is generally the preferred estimator of the
population proportion.
• It is calculated as the ratio of the number of elements in the
sample that have the characteristic of interest to the total
number of elements in the sample.
Cont’d

• Example: Price-earnings ratios for a random sample of ten
shares traded on the New York Stock Exchange on December 27, 2017
were: 10; 16; 5; 10; 12; 8; 4; 6; 5; 4
• Find point estimates of the population
a) Mean
b) Variance
c) Standard deviation
d) Proportion of shares in the population for which the
price-earnings ratio exceeded 8.5.
• To find the first three of these sample quantities, we show the
calculations in tabular form as:

i        1    2    3    4    5    6    7    8    9   10   Total
X_i     10   16    5   10   12    8    4    6    5    4      80
X_i^2  100  256   25  100  144   64   16   36   25   16     782

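• The point estimate of the population mean is
• $\bar{X} = \frac{\sum X_i}{n} = \frac{80}{10} = 8$
• The point estimate of the population variance is
• $S^2 = \frac{\sum (X_i - \bar{X})^2}{n-1} = \frac{\sum X_i^2 - n\bar{X}^2}{n-1} = \frac{782 - 640}{9} = 15.78$
• The point estimate of the standard deviation is
• $S = \sqrt{15.78} = 3.97$
• The point estimate of the population proportion is
• $P = \frac{X}{n} = \frac{4}{10} = 0.4$, where $X = 4$ is the number of
shares in the sample whose price-earnings ratio exceeded 8.5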
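The four point estimates in this example can be checked with a short script. This is a minimal sketch using only Python's standard library; the variable names are illustrative and are not part of the original slides.

```python
from statistics import mean, variance, stdev

# Price-earnings ratios of the ten sampled shares from the example.
ratios = [10, 16, 5, 10, 12, 8, 4, 6, 5, 4]

x_bar = mean(ratios)       # sample mean
s2 = variance(ratios)      # sample variance (divides by n - 1)
s = stdev(ratios)          # sample standard deviation
# Sample proportion of shares whose ratio exceeded 8.5.
p = sum(1 for x in ratios if x > 8.5) / len(ratios)

print(f"mean = {x_bar}, variance = {s2:.2f}, sd = {s:.2f}, proportion = {p}")
```

Note that `statistics.variance` uses the $n - 1$ divisor, which matches the sample variance formula used in the example.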
Shortcomings of Point Estimate

• Estimating a population parameter with a single sample
statistic has some problems, since we are estimating the
parameter of a large population by using a single number which
is either right or wrong.
• As a result the choice of point estimator has been based on
intuitive plausibility.
• Thus, a point estimate is much more useful if it is
accompanied by an estimate of the error that might be
involved.
Interval Estimation

• An Interval Estimate is a range of values used to estimate a
population parameter.
• It indicates the error in two ways: by the extent of its range
and by the probability of the true population parameter lying
within that range.
• Instead of relying on the point estimate alone, we may
construct an interval around the point estimator, say within
two or three standard errors of the mean on either side of the
point estimator, such that this interval has, for instance, 0.95
probability of including the true parameter value.
Cont’d

• Because a point estimator cannot be expected to provide the
exact value of the population parameter, an interval estimate is
often computed by adding and subtracting a value, called the
margin of error, to the point estimate.
• The general form of an interval estimate is as follows:
• Point estimate ± Margin of error
• The general form of an interval estimate of a population mean
is
• $\bar{X}$ ± Margin of error
Cont’d

• Assume that we want to find out how “close” an estimator
$\bar{X}$ is to $\mu$.
• For this purpose, we try to find two values such that the
probability that the interval between these two values contains
the true μ is 1 − α.
• Where: $\alpha$ is the level of significance, $1 - \alpha$ is the confidence
coefficient, and such an interval is known as a confidence interval.
• The two values are the lower and the upper confidence limits
(or critical values), respectively.
a) Calculating Confidence Interval Estimate of a Population
Mean: Normal Population with known δ

• Suppose we have a normal population whose mean and
standard deviation are $\mu$ and $\delta$; then the sampling distribution of
the mean will be normal with mean $\mu_{\bar{X}} = \mu$ and standard
deviation $\delta_{\bar{X}} = \frac{\delta}{\sqrt{n}}$.
• For the sampling distribution of the mean, the standard normal
variable is $Z = \frac{\bar{X} - \mu}{\delta_{\bar{X}}}$; then the interval estimate of a population
mean (the confidence interval for a population mean) is:
• $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$
• Where $Z_{\alpha/2}$ is the value of the standard normal variable that is
exceeded with a probability of $\alpha/2$; that is, $Z_{\alpha/2}$ is the Z value
providing an area of $\alpha/2$ in the upper tail of the standard normal
probability distribution.
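The step from the standard normal variable to the interval can be written out explicitly. The derivation below is a sketch that uses only the symbols already defined above: it starts from the fact that $Z$ lies between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ with probability $1 - \alpha$, and then rearranges the inequality for $\mu$.

```latex
\begin{align*}
P\left(-Z_{\alpha/2} \le \frac{\bar{X}-\mu}{\delta_{\bar{X}}} \le Z_{\alpha/2}\right) &= 1-\alpha \\
P\left(-Z_{\alpha/2}\,\delta_{\bar{X}} \le \bar{X}-\mu \le Z_{\alpha/2}\,\delta_{\bar{X}}\right) &= 1-\alpha \\
P\left(\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}\right) &= 1-\alpha
\end{align*}
```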
Cont’d
• If we want to establish a 95% confidence interval (1 − α = 0.95), and
hence a significance level of 5% (α = 0.05), the 95% refers to the middle
95 percent of the observations and the remaining 5% is equally divided
between the two tails (2.5% in each), as shown in the figure below.

[Table: values of $Z_{\alpha/2}$ for the most commonly used confidence levels]
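Critical values of the standard normal distribution can also be computed rather than read from a table. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `z_critical` is an illustrative choice, not something defined in these slides.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return Z_{alpha/2}: the value with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

For 90% confidence the computed value is about 1.645, which printed tables often round to 1.64 or 1.65.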


Cont’d

• Example 1: A normal infinite population has a standard
deviation of 10. A random sample of size 25 has a mean of 50.
Construct a 95% confidence interval for the population mean.
• Solution:
• Given $\delta = 10$, $n = 25$, $\bar{X} = 50$
• Required: $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$ with a 95%
confidence level
• First find the standard error of the mean, i.e.,
• $\delta_{\bar{X}} = \frac{\delta}{\sqrt{n}} = \frac{10}{5} = 2$
• Then we have $\alpha$ = level of significance = 0.05 since $1 - \alpha = 0.95$,
so $\alpha/2 = 0.025$ and $Z_{\alpha/2} = Z_{0.025} = 1.96$, because
$P(0 \le Z \le 1.96) = 0.475$
Cont’d

• Therefore, the confidence interval estimate for the
population mean ($\mu$) is:
• $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$
• $= 50 - 1.96(2) \le \mu \le 50 + 1.96(2)$
• $= 46.08 \le \mu \le 53.92$
• In this case we may say that “we are 95% confident that the
population mean lies between 46.08 and 53.92.” This statement
does not mean that the chance is 0.95 that the population mean
falls within the interval established from this one sample.
Cont’d
• Instead, it means that if we select many random samples of
the same size and if we compute a confidence interval for each
of these samples, then in about 95 percent of these cases, the
population mean will lie within that interval.
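This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with true mean 50 and standard deviation 10 (values borrowed from Example 1 purely for illustration), draws many samples of size 25, and counts how often the 95% interval contains the true mean. The function name and the random seed are arbitrary choices.

```python
import random
from math import sqrt

def coverage(mu, delta, n, z, trials, seed=0):
    """Share of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    se = delta / sqrt(n)           # standard error of the mean (delta known)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, delta) for _ in range(n)) / n
        if x_bar - z * se <= mu <= x_bar + z * se:
            hits += 1
    return hits / trials

rate = coverage(mu=50, delta=10, n=25, z=1.96, trials=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should be close to 0.95, although it will not equal it exactly because the number of simulated samples is finite.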
• Example 2: The mean annual income of Ethiopian Airlines
(EAL) workers is estimated to be 24,000 Birr. Assume
that this estimate was based on a sample of 250 airline
workers and the population standard deviation was 5000 Birr.
Then
• a) Compute the 95% confidence interval for the population
mean.
• b) Construct the 90% confidence interval for the population
mean.
Cont’d

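• Solution: given $\bar{X} = 24{,}000$, $\delta = 5000$, $n = 250$
• Required: $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$
• $\delta_{\bar{X}} = \frac{\delta}{\sqrt{n}} = \frac{5000}{15.81} = 316.23$
• i. $\alpha = 0.05$, $\alpha/2 = 0.025$, $Z_{0.025} = 1.96$; therefore, the 95%
confidence interval can be given as:
• $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$
• $= 24000 - 1.96(316.23) \le \mu \le 24000 + 1.96(316.23)$
• $= 23{,}380.19 \le \mu \le 24{,}619.81$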
Cont’d

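• ii. $\alpha = 0.1$, $\alpha/2 = 0.05$, $Z_{0.05} = 1.64$; therefore, the 90%
confidence interval can be given as:
• $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$
• $= 24000 - 1.64(316.23) \le \mu \le 24000 + 1.64(316.23)$
• $= 23{,}481.38 \le \mu \le 24{,}518.62$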

• Note that:
▪ A narrower confidence interval is more precise
▪ Larger samples give more precise estimates
▪ Smaller variance leads to more precise estimates
▪ Lower confidence coefficients allow us to construct more
precise estimates
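The two worked examples for a known population standard deviation can be reproduced with a small helper. This is a sketch: `z_interval` is an illustrative name, and the critical values 1.96 and 1.64 are passed in exactly as used in the solutions above.

```python
from math import sqrt

def z_interval(x_bar, delta, n, z):
    """Confidence interval for the mean when the population sd (delta) is known."""
    se = delta / sqrt(n)           # standard error of the mean
    margin = z * se                # margin of error
    return x_bar - margin, x_bar + margin

ex1 = z_interval(50, 10, 25, 1.96)            # Example 1, 95%
ex2_95 = z_interval(24000, 5000, 250, 1.96)   # Example 2, 95%
ex2_90 = z_interval(24000, 5000, 250, 1.64)   # Example 2, 90%

for name, (low, high) in [("ex1", ex1), ("ex2_95", ex2_95), ("ex2_90", ex2_90)]:
    print(f"{name}: {low:.2f} <= mu <= {high:.2f}")
```

The 90% interval for Example 2 comes out narrower than the 95% interval, in line with the notes above. The last decimal place may differ slightly from the hand calculation because the script does not round the standard error to 316.23 first.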
b) Calculating Confidence Interval Estimate of a Population
Mean: Normal Population with δ Unknown

• When the population standard deviation is unknown, we use
the sample standard deviation, $S_X$, as an estimate of $\delta$. The
sample standard deviation is given by:
• $S_X = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}$
• Thus, the standard deviation of the sampling distribution of
the sample means, $\delta_{\bar{X}}$, is estimated by:
• $\delta_{\bar{X}} = \frac{S_X}{\sqrt{n}}$
• In this case, the construction of the confidence interval estimate
depends upon whether the sample size is large or small:
Cont’d
• Case 1: When the sample size is large and $\delta$ is unknown (a
sample size is large when $n > 30$)
• Confidence interval estimate for population mean ($\mu$) is given
by:
• $\bar{X} - Z_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\delta_{\bar{X}}$, where $\delta_{\bar{X}} = \frac{S_X}{\sqrt{n}}$
• Case 2: When the sample size is small and $\delta$ is unknown (a
sample size is small when $n \le 30$)
• Confidence interval estimate for population mean ($\mu$) is given
by:
• $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$, where $\delta_{\bar{X}} = \frac{S_X}{\sqrt{n}}$
• Where $t_{\alpha/2}$ is the t-value providing an area of $\alpha/2$ in the upper
tail of a t-distribution with $n - 1$ degrees of freedom, and $S_X$ is
the sample standard deviation.
Cont’d

• The t-distribution is a family of similar probability
distributions, with a specific t-distribution depending on a
parameter known as the degrees of freedom.
• As the number of degrees of freedom increases, the
difference between the t-distribution and the standard normal
probability distribution becomes smaller and smaller, and the
t-distribution will have less dispersion.
• The t-distribution is symmetrical, bell-shaped and has zero as
its mean. We use:
i. The t-distribution if $\delta$ is unknown and $n \le 30$
ii. The Z-distribution if
- $\delta$ is unknown and $n > 30$, or
- $\delta$ is known, whatever the value of $n$
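The selection rule above can be written as a small function. This is only a sketch of the rule as stated in these slides (with 30 as the cut-off between small and large samples); the function name and argument names are illustrative.

```python
def choose_distribution(delta_known, n):
    """Return "Z" or "t" following the rule stated on the slide."""
    # Z is used when delta is known (any n) or when the sample is large.
    if delta_known or n > 30:
        return "Z"
    # Otherwise delta is unknown and n <= 30: use the t-distribution.
    return "t"

print(choose_distribution(delta_known=True, n=25))    # delta known, small sample
print(choose_distribution(delta_known=False, n=28))   # delta unknown, small sample
print(choose_distribution(delta_known=False, n=250))  # delta unknown, large sample
```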
Cont’d
• Example 1: Sales personnel for a beer factory are required to
submit weekly reports listing the customer contacts made during
the week. A sample of 28 weekly contact reports showed a mean
of 22.4 customer contacts per week for the sales personnel. The
sample standard deviation was five contacts.

i. Develop a 95% confidence interval for the mean number of
weekly customer contacts of the sales personnel.
ii. Develop a 90% confidence interval for the mean number of
weekly customer contacts of the sales personnel.

• Solution:
• Given $\bar{X} = 22.4$, $S = 5$, $n = 28$
• Required: $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• i. $\alpha = 0.05$, $\alpha/2 = 0.025$, $t_{0.025,27} = 2.052$;
Cont’d

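• $\delta_{\bar{X}} = \frac{S_X}{\sqrt{n}} = \frac{5}{\sqrt{28}} = 0.945$
• Therefore, the 95% confidence interval can be given as:
• $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• $= 22.4 - 2.052\,\frac{5}{\sqrt{28}} \le \mu \le 22.4 + 2.052\,\frac{5}{\sqrt{28}}$
• $= 20.46 \le \mu \le 24.34$
• ii. $\alpha = 0.1$, $\alpha/2 = 0.05$, $t_{0.05,27} = 1.703$; therefore, the 90%
confidence interval can be given as:
• $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• $= 22.4 - 1.703\,\frac{5}{\sqrt{28}} \le \mu \le 22.4 + 1.703\,\frac{5}{\sqrt{28}}$
• $= 20.79 \le \mu \le 24.01$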
Cont’d

• Example 2: In the testing of a new production method, 18
employees were selected randomly and asked to try the new
method. The sample mean production rate for the 18 employees
was 80 parts per hour and the sample standard deviation was 10
parts per hour. Provide 90% and 95% confidence intervals for the
population mean production rate for the new method, assuming
the population has a normal probability distribution.
• Solution:
• Given $\bar{X} = 80$, $S = 10$, $n = 18$
• Required: $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• i. $\alpha = 0.05$, $\alpha/2 = 0.025$, $t_{0.025,17} = 2.11$
Cont’d

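• $\delta_{\bar{X}} = \frac{S_X}{\sqrt{n}} = \frac{10}{\sqrt{18}} = 2.357$
• Therefore, the 95% confidence interval can be given as:
• $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• $= 80 - 2.11\,\frac{10}{\sqrt{18}} \le \mu \le 80 + 2.11\,\frac{10}{\sqrt{18}}$
• $= 75.03 \le \mu \le 84.97$
• ii. $\alpha = 0.1$, $\alpha/2 = 0.05$, $t_{0.05,17} = 1.74$; therefore, the 90%
confidence interval can be given as:
• $\bar{X} - t_{\alpha/2}\,\delta_{\bar{X}} \le \mu \le \bar{X} + t_{\alpha/2}\,\delta_{\bar{X}}$
• $= 80 - 1.74\,\frac{10}{\sqrt{18}} \le \mu \le 80 + 1.74\,\frac{10}{\sqrt{18}}$
• $= 75.90 \le \mu \le 84.10$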
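The small-sample intervals in the two examples above can be evaluated numerically with a short helper. This is a sketch: `t_interval` is an illustrative name, and the t critical values (2.052, 1.703, 2.11, 1.74) are taken from the solutions above rather than computed, since Python's standard library has no t-distribution.

```python
from math import sqrt

def t_interval(x_bar, s, n, t):
    """Confidence interval for the mean using the sample sd and a t critical value."""
    se = s / sqrt(n)               # estimated standard error of the mean
    margin = t * se                # margin of error
    return x_bar - margin, x_bar + margin

contacts_95 = t_interval(22.4, 5, 28, 2.052)   # Example 1, 95%, 27 d.f.
contacts_90 = t_interval(22.4, 5, 28, 1.703)   # Example 1, 90%, 27 d.f.
rate_95 = t_interval(80, 10, 18, 2.11)         # Example 2, 95%, 17 d.f.
rate_90 = t_interval(80, 10, 18, 1.74)         # Example 2, 90%, 17 d.f.

for name, (low, high) in [("contacts_95", contacts_95), ("contacts_90", contacts_90),
                          ("rate_95", rate_95), ("rate_90", rate_90)]:
    print(f"{name}: {low:.2f} <= mu <= {high:.2f}")
```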
Properties of Estimators

• There are various methods with which we may obtain
estimates of the parameters of economic relationships.
• How are we to decide whether an estimate is ‘good’ or
whether it is better than another obtained from a different
method?
• We need some criteria for judging the goodness of the
estimate.
• The criteria or properties for a good estimator may be
different for small and large samples.
A) Small sample properties

1. Unbiasedness: An estimator is said to be unbiased if the
expected value of the estimator is equal to the true population
parameter, i.e. if $E(\bar{X}) = \mu$. It is biased if $E(\bar{X}) - \mu \ne 0$ and
unbiased if $E(\bar{X}) - \mu = 0$, where $\bar{X}$ is an estimator of $\mu$ and $\mu$ is a
population parameter.
2. Minimum variance (Best Estimator): An estimator is best when
it possesses the smallest variance as compared to any other
estimator obtained by other methods.
3. Efficiency, Efficient estimator: An estimator is efficient when it
possesses both of the aforementioned properties (unbiasedness
and minimum variance).
Cont’d

4. Minimum Mean Square Error (MSE) Estimator: An estimator is a
minimum MSE estimator if it has the smallest mean square error,
defined as the expected value of the squared difference between
the estimator and the true population parameter,
$MSE(\bar{X}) = E[(\bar{X} - \mu)^2]$.
5. Sufficiency, Sufficient Estimator: An estimator is said to be
sufficient if it utilizes all the information a sample contains about
the true parameter, i.e. it must use all the observations of a
sample.
6. Best, Linear and Unbiased Estimator (BLUE): An estimator is
BLUE if it is best, linear and unbiased. An estimator is linear if it is
a linear function of the sample observations.
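Unbiasedness can be illustrated by simulation. The sketch below compares two variance estimators: the sample variance with divisor $n - 1$ and the alternative with divisor $n$. The population (normal, standard deviation 2, so variance 4), the sample size, the number of trials and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean

def average_variance_estimates(delta, n, trials, seed=0):
    """Average the (n - 1)-divisor and n-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0, delta) for _ in range(n)]
        x_bar = sum(sample) / n
        ss = sum((x - x_bar) ** 2 for x in sample)   # sum of squared deviations
        unbiased.append(ss / (n - 1))
        biased.append(ss / n)
    return mean(unbiased), mean(biased)

avg_unbiased, avg_biased = average_variance_estimates(delta=2, n=5, trials=20000)
print(f"divisor n - 1: {avg_unbiased:.2f}, divisor n: {avg_biased:.2f}, true variance: 4")
```

The average of the $n - 1$ estimates should land near the true variance of 4, while the divisor-$n$ version averages noticeably lower, which is what bias means here.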
B) Large Sample Properties (Asymptotic Properties)
1. Asymptotic Unbiasedness: An estimator is asymptotically unbiased if
$\lim_{n \to \infty} E(\bar{X}) = \mu$.
2. Consistency: An estimator is consistent if it satisfies the following
conditions:
i. The estimator must be asymptotically unbiased
ii. The variance of the estimator must approach zero as n
approaches infinity, i.e., $\lim_{n \to \infty} Var(\bar{X}) = 0$
3. Asymptotic Efficiency: An estimator is asymptotically efficient if:
i. The estimator is consistent
ii. The estimator has a smaller asymptotic variance as compared to
any other consistent estimator.
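The variance condition for consistency can be seen numerically for the sample mean, whose variance is $\delta^2 / n$. The sketch below simulates sample means for increasing sample sizes from a hypothetical normal population with standard deviation 10; the function name, trial count and seed are illustrative assumptions.

```python
import random
from statistics import pvariance

def variance_of_sample_mean(delta, n, trials, seed=0):
    """Empirical variance of the sample mean across repeated samples of size n."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(0, delta) for _ in range(n)) / n for _ in range(trials)]
    return pvariance(means)

results = {n: variance_of_sample_mean(delta=10, n=n, trials=500) for n in (10, 100, 1000)}
for n, var in results.items():
    print(f"n = {n:4d}: simulated Var = {var:.3f}, theoretical delta^2/n = {100 / n:.3f}")
```

As n grows, the simulated variance shrinks toward zero, which is the behaviour required by condition ii.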
