
Estimation

Book: Statistics for Business and Economics (Chapter 8)


Author: Anderson, Sweeney, et al.
Edition: 13th Edition

Faculty: Suvechcha Sengupta

Statistical Inference

• Statistical inference is the process by which we acquire information and draw conclusions about populations from samples.

• There are two types of inference:
  • Estimation
  • Hypothesis testing

• In order to do inference, we require the skills and knowledge of descriptive statistics, probability distributions, and sampling distributions.

K J Somaiya Institute of Management, India 3


Where we have been…

1. Normal distributions allow us to make probability statements about X (a member of the population). To do so we need the population parameters.
   Normal: µ and σ
2. Sampling distributions allow us to make probability statements about statistics. Here too we need the population parameters.
   Sample mean: µ and σ


Where we are going…

However, in almost all realistic situations parameters are unknown.

We will use the sampling distribution to draw inferences about the unknown
population parameters.



Key Statistical Concepts

• Population –
A population is the group of all items of interest to a statistics practitioner. It is
frequently very large; sometimes infinite.
E.g. All the voters in the country

• Sample –
A sample is a set of data drawn from the population. It is potentially very large, but smaller than the population.
E.g. All the voters from Maharashtra


Key Statistical Concepts

[Figure: a sample is a subset of the population.]

Populations have Parameters; Samples have Statistics


Key Statistical Concepts

• Parameter -
A descriptive measure of a population. A parameter is a statistical constant that describes a feature of the population.
E.g.
  • Expected value μ (also called “the population mean”)
  • Standard deviation σ (also called the “population standard deviation”)
• Statistic -
A descriptive measure of a sample. A statistic is any statistical value computed to describe a sample.
E.g.
  • Sample mean x̄ (“x bar”)
  • Sample standard deviation s


Statistical Inference

Statistical inference is the process of making an estimate, prediction, or decision about a population
based on a sample.

[Figure: a sample is drawn from the population; inference runs from the sample statistic back to the population parameter.]

What can we infer about a Population’s Parameters based on a Sample’s Statistics?


Statistical Inference

We use statistics to make inferences about parameters.

Therefore, we can make an estimate, prediction, or decision about a population based on sample data.

Thus, we can apply what we know about a sample to the larger population from which it was drawn!


Statistical Inference

Rationale:
Large populations make investigating each member impractical and expensive.
It is easier and cheaper to take a sample and make estimates about the population from the sample.

However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build “measures of reliability” into the statistical inference, namely the confidence level and the significance level.


Estimation
Objective

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.

E.g., the sample mean (x̄) is employed to estimate the population mean (µ).


Estimator

An estimator is a sample statistic that is used to infer the value of an unknown parameter. Thus the estimator (the statistic used), the parameter (the unknown population value), and the estimate (the value the estimator yields for a particular sample) are three different things.


Qualities of Estimators

Qualities desirable in estimators include unbiasedness, consistency, and relative efficiency:

• An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.

• An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.

• If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.


Unbiased Estimators…

An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter.

E.g. the sample mean x̄ is an unbiased estimator of the population mean µ, since:

E(x̄) = µ


Unbiased Estimators…

E.g. when the population is symmetric (for example, normal), the sample median is also an unbiased estimator of the population mean µ, since:

E(Sample median) = µ


Consistency…

An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.

E.g. x̄ is a consistent estimator of µ because:

V(x̄) = σ²/n
That is, as n grows larger, the variance of x̄ grows smaller.


Consistency…

E.g. for a normal population, the sample median is a consistent estimator of µ because, for large n:

V(Sample median) ≈ 1.57σ²/n

That is, as n grows larger, the variance of the sample median grows smaller.


Relative Efficiency…

If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient.

E.g. both the sample median and the sample mean are unbiased estimators of the population mean. However, the sample median has a greater variance than the sample mean (1.57σ²/n versus σ²/n), so we choose x̄, since it is relatively efficient when compared to the sample median.

Thus, the sample mean is the “best” estimator of a population mean µ.
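The consistency and relative-efficiency claims above can be checked with a small simulation. The sketch below is only an illustration: the population values (µ = 50, σ = 10), the sample sizes, the number of repetitions, and the function name `estimator_variances` are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics

def estimator_variances(n, reps=4000, mu=50.0, sigma=10.0, seed=1):
    """Simulate many normal samples of size n and return the variance of
    the sample means and the variance of the sample medians."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

for n in (25, 100):
    var_mean, var_median = estimator_variances(n)
    print(f"n={n:3d}  V(mean)={var_mean:.3f} (theory {10.0 ** 2 / n:.3f})  "
          f"V(median)={var_median:.3f}  ratio={var_median / var_mean:.2f}")
```

Both variances should shrink as n grows (consistency), and the median's variance should stay larger than the mean's by a factor in the neighbourhood of 1.57 (relative efficiency). The exact figures depend on the seed and on the finite sample size.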


Types of Estimators

There are two types of estimators:

• Point estimators
• Interval estimators

Goal of estimation: How can we use sample data to estimate values of population parameters?

• Point estimate: A single statistic value that is the “best guess” for the parameter value.
• Interval estimate: An interval of numbers around the point estimate that has a fixed “confidence level” of containing the parameter value. It is called a confidence interval.


Point Estimator

A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.

E.g.
Sample mean, x̄ (“x bar”), is the point estimator of μ
Sample standard deviation, s, is the point estimator of σ


Interval Estimation

An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.


Point and Interval Estimation for Population Mean

For example, suppose we want to estimate the mean summer income of a class of business students. For n = 25 students, x̄ is calculated to be 400 $/week. This is the point estimate.

An alternative statement is:

The mean income is between 380 and 420 $/week. This is the interval estimate.


Interval Estimation
Margin of Error and the Interval Estimate

• A point estimator cannot be expected to provide the exact value of the population parameter.
• An interval estimate can be computed by adding and subtracting a margin of error to the point estimate:

Point Estimate ± Margin of Error

• The general form of an interval estimate of a population mean is

x̄ ± Margin of Error

• The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
• That is, we say (with some ___% certainty) that the population parameter we wish to study will lie between some lower and upper bounds.


Confidence Interval

A confidence interval for a parameter is an interval (a lower limit and an upper limit) that is believed to contain the unknown parameter with a certain level of confidence.

The level of confidence (1-α), also called the confidence level, is a measure of how frequently the interval will actually include the parameter, stated as a percentage. In other words, it is the proportion of times that an estimating procedure will be correct.
E.g. 95% confidence level, 99% confidence level.
A confidence level of 95% means that estimates based on this form of statistical inference will be correct 95% of the time.

The level of significance (α) is the risk we take that the true population parameter is not contained in the confidence interval. It measures how frequently the conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the long run, this type of conclusion will be wrong 5% of the time.


Confidence & Significance Levels

If we use α (Greek letter “alpha”) to represent the significance level, then our confidence level is 1 - α.

This relationship can also be stated as:

Confidence Level + Significance Level = 1

Confidence Interval

• In order to develop an interval estimate of a population mean, the margin of error must be computed using either:
  • the population standard deviation σ, or
  • the sample standard deviation s

• Population standard deviation is known (use σ) → Z distribution
• Population standard deviation is unknown (use s):
  • Large sample size → Z distribution
  • Small sample size → t distribution


FORMULA

The general formula for all confidence intervals is:

Point Estimate ± (Critical Value) × (Standard Error)

where
• Point estimate: the sample mean x̄
• Critical value: the Z or t score (depends on the confidence level)
• Standard error: σ/√n or s/√n, that is, (population SD or sample SD) / (square root of sample size)


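Estimating the Population Mean When the Population Standard Deviation is Known (Z Score)

Estimating µ when σ is known…

We know that the sampling distribution of x̄ is approximately normal with mean µ and standard deviation σ/√n. Thus

$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

is approximately standard normal.

Previously, we produced the following general probability statement about x̄:

$$P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

• Now we substitute Z to produce

$$P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

• With some algebra, we obtain

$$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

• This is still a probability statement about x̄,

i.e. there is a 1 - α probability that the value of a sample mean will provide a margin of error of $z_{\alpha/2}\,\sigma/\sqrt{n}$ or less.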


Graphically Representing CI for Population Mean

There is a 1 - α probability that the value of a sample mean will provide a margin of error of $z_{\alpha/2}\,\sigma_{\bar{x}}$ or less.

[Figure: sampling distribution of x̄. 1 - α of all x̄ values lie within $z_{\alpha/2}\,\sigma_{\bar{x}}$ on either side of the centre; each tail has area α/2.]


Graphically Representing CI for Population Mean

Let the confidence level (1 – α)×100% = 95%. Then α = 0.05 and each tail has area α/2 = 0.025.

The critical values of z that define the two tail areas of 0.025 are -1.96 and +1.96.

[Figure: standard normal curve with the central 95% confidence region between z = -1.96 and z = 1.96 and a tail of area α/2 = 0.025 on each side.]

α is the proportion in the tails (shaded in green) of the distribution that is outside the established confidence interval.


Estimating Population Mean using CI

Thus, the confidence interval for the population mean is given by

$$\text{Lower confidence limit (LCL)} = \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$\text{Upper confidence limit (UCL)} = \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Note: We assume that the population SD, σ, is known.


Construction of CI for Population Mean

Step 1: Select the desired level of confidence
Let’s suppose we want to construct an interval using the 95% confidence level.

Step 2: Calculate α and α/2
(1-α)×100% = 95%
α = 0.05
α/2 = 0.025

Step 3: Look up the corresponding z-value
For α/2 = 0.025, z-score = 1.96


Construction of CI for Population Mean

Step 4: Multiply the z-score by the standard error, σ/√n. The product is the margin of error.

Step 5: Find the interval by adding this product to, and subtracting it from, the sample mean.


Common Confidence Levels and α Values

Here is a table of commonly used confidence levels, their α and α/2 values, and the corresponding z-scores, which we use in our examples.
Zα/2 is calculated using the Excel formula NORMSINV(1-α/2) or by looking it up in the standard normal table.

(1-α)×100%   α      α/2     Table lookup area (1-α/2)   Zα/2
90%          0.10   0.05    0.9500                      1.645
95%          0.05   0.025   0.9750                      1.96
98%          0.02   0.01    0.9900                      2.33
99%          0.01   0.005   0.9950                      2.58
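The critical values in the table can also be reproduced outside Excel. The sketch below uses Python's standard-library `statistics.NormalDist`, whose `inv_cdf` method plays the same role as NORMSINV; the helper name `z_critical` is just an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1.0 - confidence
    # inv_cdf(1 - alpha/2) is the z-score with area alpha/2 to its right
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

To three decimals the 98% and 99% values come out as 2.326 and 2.576; the table rounds them to 2.33 and 2.58.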


Example 1

It is known that the amount of time needed to change the oil on a car is normally distributed with a standard deviation of 5 minutes. The amount of time to complete a random sample of 10 oil changes was recorded and is listed here. Compute the 99% confidence interval estimate of the mean of the population.

Times (minutes): 11, 10, 16, 15, 18, 12, 25, 20, 18, 24


Solution

Data given    Value
(1-α)×100%    99%
α             0.01
α/2           0.005
z-score       2.58
σ             5
n             10


Solution

• Calculate the standard error: σ/√n = 5/√10 ≈ 1.5811
• Multiply the z-score by the standard error to get the margin of error: 2.58 × 1.5811 ≈ 4.0793
• Calculate the sample mean: x̄ = 169/10 = 16.9
• Generate the lower limit by subtracting the margin of error from the sample mean: 16.9 - 4.0793 = 12.8207
• Generate the upper limit by adding the margin of error to the sample mean: 16.9 + 4.0793 = 20.9793


Solution

Therefore,
Lower Confidence Limit = 12.8207
Upper Confidence Limit = 20.9793
Answer: The population mean is estimated to lie within (12.8207, 20.9793) minutes with 99% confidence.
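As a cross-check on Example 1, the same steps can be sketched in Python using the ten recorded times, σ = 5 and the table value z = 2.58. The function name `z_interval` is an illustrative assumption; the slides themselves carry out the calculation in Excel.

```python
import math
import statistics

times = [11, 10, 16, 15, 18, 12, 25, 20, 18, 24]  # oil-change times (minutes)
sigma = 5.0   # known population standard deviation
z = 2.58      # table value for 99% confidence

def z_interval(sample_mean, sigma, n, z):
    """Return (LCL, UCL) = sample_mean -/+ z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

x_bar = statistics.fmean(times)
lcl, ucl = z_interval(x_bar, sigma, len(times), z)
print(f"mean = {x_bar:.1f}, interval = ({lcl:.4f}, {ucl:.4f})")
```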


Meaning of confidence

Because 99% of all the intervals constructed using

$$\bar{x} \pm 2.58\,\frac{\sigma}{\sqrt{n}}$$

will contain the population mean, we say that we are 99% confident that the interval includes the population mean µ.
That is, 99% of the time an interval constructed this way will contain the population mean, while 1% of the time it will not.
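The "99% of all intervals" reading can be illustrated by simulation: draw many samples from a population whose mean is known, build the interval each time, and count how often it covers the true mean. In the sketch below the true mean µ = 17 is purely hypothetical, as are the seed, the number of repetitions, and the function name `coverage`; σ = 5, n = 10 and z = 2.58 follow Example 1.

```python
import math
import random
import statistics

def coverage(mu=17.0, sigma=5.0, n=10, z=2.58, reps=5000, seed=7):
    """Share of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.99. Any single interval either contains µ or does not; the 99% describes the procedure over repeated sampling.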


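Graphically Understanding the meaning of confidence level

[Figure: sampling distribution of x̄ with central area 1 - α and tail areas α/2. An interval x̄ ± $z_{\alpha/2}\,\sigma_{\bar{x}}$ built around an x̄ from the central region includes µ; an interval built around an x̄ from either tail does not include µ.]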


Example Lloyd’s

Each week Lloyd’s Department Store selects a simple random sample of 100 customers in order to learn about the amount spent per shopping trip. Lloyd’s has been using the weekly survey for several years. Based on the historical data, Lloyd’s now assumes a known value of σ = $20 for the population standard deviation. The historical data also indicate that the population follows a normal distribution.
• Find a point estimate for the mean amount spent per shopping trip for the population of all Lloyd’s customers.
• Compute the margin of error for this estimate and develop an interval estimate of the population mean.


Estimating the Population Mean When the Population Standard Deviation is Unknown

Population standard deviation is unknown:
• Large sample size → Z distribution
• Small sample size → t distribution

Estimating Population Mean (µ) when Population Standard Deviation (σ) is unknown
(LARGE SAMPLE SIZE, n ≥ 30)

In real-life situations, a researcher often does not have a fair idea of the population standard deviation.
For large sample sizes (n ≥ 30), the sample standard deviation (s) can be a good estimate of the population standard deviation (σ).

Hence, the confidence interval for estimating the population mean µ, when σ is unknown and the sample size is large (n ≥ 30), is

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$

where x̄ is the sample mean, n the sample size, and s the sample standard deviation.


Example 2

In order to estimate the customer loyalty for a particular product, a researcher poses the following question to a sample of 100 customers: How many years have you been continuously using this product? This sample yielded a mean period of 8 years with a sample standard deviation of 2 years. Construct a 95% confidence interval for estimating the population mean.

Since σ is unknown and the sample size is large (n ≥ 30), the confidence interval is

$$8 - 1.96\,\frac{2}{\sqrt{100}} \quad\text{to}\quad 8 + 1.96\,\frac{2}{\sqrt{100}}$$

i.e. 7.608 to 8.392

The researcher is 95% confident that the population mean lies between 7.608 years and 8.392 years.


Estimating Population Mean (µ) when Population Standard Deviation (σ) is unknown
(SMALL SAMPLE SIZE, n < 30)

In the case of a small sample size (n < 30), the problem can be solved by using the t statistic, developed by a British statistician, William S. Gosset.

The t distribution is a family of similar probability distributions, with a specific t distribution depending on a parameter known as the degrees of freedom.

Each value of the degrees of freedom gives a different t distribution.


Degrees of Freedom

• The concept of degrees of freedom is central to estimating population parameters from sample statistics.
• The term “degrees of freedom” describes the number of values in the final calculation of a statistic that are free to vary.
• E.g., imagine a set of three numbers {1, 6, 5}. Calculating the mean of those numbers is easy: (1 + 6 + 5) / 3 = 4.
• Now imagine a set of three numbers whose mean is 3. There are many such sets, but for any of them the bottom line is this: you can freely pick the first two numbers, but the third (last) number is out of your hands as soon as you have picked the first two. Say our first two numbers are the same as in the previous set, 1 and 6, giving us two freely picked numbers and one number, x, that we still need to choose: {1, 6, x}.
• For this set to have a mean of 3, we have no choice about x. It has to be 2, because (1 + 6 + 2) / 3 = 3. So the first two values were free for you to choose, and the last value is set accordingly to reach the given mean. This set is said to have two degrees of freedom, corresponding to the number of values that you were free to choose (that is, that were allowed to vary freely).


Degrees of Freedom

The general rule, then, is that if n equals the number of values in the set and the mean is fixed, the degrees of freedom equal n – 1.
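The "last value is forced" idea behind n – 1 degrees of freedom can be sketched in a few lines. The helper name `last_value` is hypothetical; it simply solves for the one value that is no longer free once the mean and the other n – 1 values are fixed.

```python
def last_value(free_values, target_mean):
    """Given n - 1 freely chosen values and a required mean,
    return the single remaining value that makes the mean come out right."""
    n = len(free_values) + 1
    return n * target_mean - sum(free_values)

# The slide's example: {1, 6, x} must have mean 3, so x is determined.
print(last_value([1, 6], 3))
```

Whatever two values are chosen freely, the third is determined, which is why a set of three values with a fixed mean has two degrees of freedom.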


Characteristics of the t-distribution

1. It is, like the z distribution, a continuous distribution.

2. It is, like the z distribution, bell-shaped, unimodal and symmetrical.

3. There is not one t distribution, but rather a family of t distributions. All t distributions have a mean of 0, but their standard deviations differ according to the sample size, n.

4. The t distribution is more spread out and flatter at the center than the standard normal distribution, and it has more area in the tails. As the sample size increases, however, the t distribution approaches the standard normal distribution.


t-distribution

The t formula is given as

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

This formula has the same form as the z formula (with s in place of σ), but the distribution table values are different.


t-distribution

Note: t → Z as n increases

t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.

[Figure: density curves centred at 0 for t (df = 5), t (df = 13), and the standard normal (t with df = ∞).]


Example 3

The personnel department of an organization wants to apply cost-cutting measures for improving efficiency. As the first step, the personnel department wants to curtail telephone expenses incurred by employees. For this, the personnel department has taken a random sample of 10 employees and gathered the following data about telephone expenses (in thousand rupees) in the previous year:

10 12 24 23 11 14 15 34 16 23

Construct a 95% confidence interval to estimate the average telephone expenses of the employees in the population.


Solution using Excel

• Since σ is unknown and the sample size is small (n < 30), the confidence interval is calculated using the t distribution:

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

tα/2,n-1 is calculated using the Excel function =T.INV(1-α/2, n-1)

=T.INV(1-0.025, 9) = 2.262

• To calculate the sample standard deviation, use the STDEV.S(sample values) function: s = 7.598

• To calculate the sample mean, use the AVERAGE(sample values) function: x̄ = 18.2

• n = 10, x̄ = 18.2, s = 7.598, t0.025,9 = 2.262

Hence, the confidence interval is 12.765 to 23.635 (in thousand rupees).
So, the personnel department is 95% confident that the population mean lies between Rs. 12,765 and Rs. 23,635.
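The Excel steps for Example 3 can be mirrored in Python as a rough cross-check. The standard library has no inverse t distribution, so the sketch below takes the critical value 2.262 from the slide rather than computing it; `statistics.stdev` uses the n - 1 divisor, like STDEV.S. The function name `t_interval` is an illustrative assumption.

```python
import math
import statistics

expenses = [10, 12, 24, 23, 11, 14, 15, 34, 16, 23]  # thousand rupees
t_crit = 2.262  # t_{0.025, 9}, taken from the slide (T.INV(0.975, 9))

def t_interval(data, t_crit):
    """Return (mean, s, LCL, UCL) for a small-sample t interval."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample SD, divisor n - 1
    margin = t_crit * s / math.sqrt(n)  # critical value times standard error
    return x_bar, s, x_bar - margin, x_bar + margin

x_bar, s, lcl, ucl = t_interval(expenses, t_crit)
print(f"mean = {x_bar:.1f}, s = {s:.3f}, interval = ({lcl:.3f}, {ucl:.3f})")
```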


Example 4

From a population, a random sample of size 20 is taken. This sample has a sample mean of 80 and a sample standard deviation of 10. Construct a 99% confidence interval for the population mean.

Since σ is unknown and the sample size is small (n < 30), the confidence interval is

$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

tα/2,n-1 is calculated using the Excel function =T.INV(1-α/2, n-1)

=T.INV(1-0.005, 19) = 2.861

x̄ = 80, n = 20 (so n - 1 = 19 degrees of freedom), s = 10, t0.005,19 = 2.861

The margin of error is 2.861 × 10/√20 ≈ 6.397. Hence, the confidence interval is 73.603 to 86.397, and we are 99% confident that the population mean lies between 73.603 and 86.397.


Exercise 22

Marvel Studio’s motion picture Guardians of the Galaxy opened over the first
two days of the 2014 Labor Day weekend to a record-breaking $94.3 million
in ticket sales revenue in North America (The Hollywood Reporter, August 3,
2014). The ticket sales revenue in dollars for a sample of 30 theaters is as
follows.

a. What is the 95% confidence interval estimate for the mean ticket sales
revenue per theater? Interpret this result.

b. Using the movie ticket price of $8.11 per ticket, what is the estimate of the
mean number of customers per theater?

c. The movie was shown in 4080 theaters. Estimate the total number of
customers who saw Guardians of the Galaxy and the total box office ticket
sales for the weekend.



Interval Width…

A wide interval provides little information.

For example, suppose we estimate with 95% confidence that an accountant’s average starting salary is between $15,000 and $100,000.

Contrast this with a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.

The second estimate is much narrower, providing accounting students more precise information about starting salaries.

The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size:

• A larger confidence level produces a wider confidence interval.
• Larger values of σ produce wider confidence intervals.
• Increasing the sample size decreases the width of the confidence interval while the confidence level can remain unchanged.
  Note: this also increases the cost of obtaining additional data.
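The three effects on interval width can be seen directly from the width formula 2 × zα/2 × σ/√n. The sketch below is illustrative only: the baseline values (95% confidence, σ = 20, n = 100) and the function name `interval_width` are assumptions made for the demonstration.

```python
import math
from statistics import NormalDist

def interval_width(confidence, sigma, n):
    """Full width of the z-interval: 2 * z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

base = interval_width(0.95, sigma=20.0, n=100)
print(f"baseline (95%, sigma=20, n=100): {base:.2f}")
print(f"higher confidence (99%):         {interval_width(0.99, 20.0, 100):.2f}")
print(f"larger sigma (40):               {interval_width(0.95, 40.0, 100):.2f}")
print(f"larger sample (n=400):           {interval_width(0.95, 20.0, 400):.2f}")
```

Because n enters through √n, quadrupling the sample size only halves the width, which is one way to see the cost trade-off noted above.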


Sample Size for an Interval Estimation of a
Population Mean

• Let E = the desired margin of error.
• E is the amount added to and subtracted from the point estimate to obtain an interval estimate.
• If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined.


Sample Size for an Interval Estimation of a Population Mean

• Margin of Error

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

• Necessary Sample Size

$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2}$$

Selecting the Sample Size…

• The Necessary Sample Size equation requires a value for the population standard deviation σ.
• If σ is unknown, a preliminary or planning value for σ can be used in the equation:
  1. Use the estimate of the population standard deviation computed in a previous study.
  2. Use a pilot study to select a preliminary sample and use the sample standard deviation from that sample.
  3. Use judgment or a “best guess” for the value of σ.

Selecting the Sample Size…

A previous study that investigated the cost of renting automobiles in the United States found a mean cost of approximately $55 per day for renting a midsize automobile. Suppose that the organization that conducted this study would like to conduct a new study in order to estimate the population mean daily rental cost for a midsize automobile in the United States. In designing the new study, the project director specifies that the population mean daily rental cost be estimated with a margin of error of $2 and a 95% level of confidence.
Using 9.65 as the planning value for σ, obtain the sample size.


Selecting the Sample Size…

$$n = \frac{(z_{\alpha/2})^2\,\sigma^2}{E^2} = \frac{(1.96)^2\,(9.65)^2}{2^2} \approx 89.43$$

• Thus, the sample size for the new study needs to be at least 89.43 midsize automobile rentals in order to satisfy the project director’s $2 margin-of-error requirement.
• In cases where the computed n is not an integer, we round up to the next integer value; hence, the recommended sample size is 90 midsize automobile rentals.
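The sample-size calculation for the rental-cost study can be sketched as follows, using the values from the example (z = 1.96, planning value σ = 9.65, E = 2). The function name `required_sample_size` is an illustrative assumption.

```python
import math

def required_sample_size(z, sigma, margin):
    """Return the computed n = (z * sigma / E)^2 and its rounded-up value."""
    raw = (z * sigma / margin) ** 2
    return raw, math.ceil(raw)  # always round up so the margin is met

raw, n = required_sample_size(z=1.96, sigma=9.65, margin=2.0)
print(f"computed n = {raw:.2f}, recommended sample size = {n}")
```

Rounding up rather than to the nearest integer matters: a sample of 89 would give a margin of error slightly above the $2 requirement.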


Summary of Estimation

Case                     LCL                      UCL                      Critical value (Excel)
σ known                  x̄ - zα/2 × σ/√n          x̄ + zα/2 × σ/√n          zα/2 = NORMSINV(1-α/2)
σ unknown and n ≥ 30     x̄ - zα/2 × s/√n          x̄ + zα/2 × s/√n          zα/2 = NORMSINV(1-α/2)
σ unknown and n < 30     x̄ - tα/2,n-1 × s/√n      x̄ + tα/2,n-1 × s/√n      tα/2,n-1 = T.INV(1-α/2, n-1)


Thank You
simsr.somaiya.edu
