
Business Statistics

PAN Baoqian, Kris


ISOM 2500

Topic 3: Estimation
Part a

What we have done

[Figure: course roadmap: Descriptive Statistics; Probability; Random variables (Discrete Random Variable, Continuous Random Variable); Sampling distribution; Estimation (Confidence intervals); Hypothesis testing; Simple Linear regression]

Goals for this topic

Know what a point estimate, a sampling distribution, and the Central Limit Theorem are
Estimate the sampling distribution of the sample mean

Case: Double E (EE)

EE is one of the largest retailers of consumer electronics in the USA. However, of late, EE's profits have been declining. The primary reasons for this are suspected to be falling quality of service and growing competition.
Managers suspect that a number of customers use EE to learn about a product but do not buy from EE (called pseudo customers). Such customers add nothing to EE's revenues and reduce the quality of service for true customers.

EE management, based on costs and industry benchmarks, has concluded that if less than 20% of a salesperson's day (1 hour and 36 minutes of an 8-hour day) is spent with pseudo customers, then the drain on service personnel by pseudo customers will not be considered a serious problem.
Otherwise, EE must change its policy to cut down on pseudo customers.

The management team at EE would like to know the average time a salesperson spends on pseudo customers.
However, all it has is the information on 100 salespersons (the sample).
What is the best way to use the sample to estimate the population (or true) mean?

The sample mean, $\bar{x}$!

Sample size, $n = 100$

Sample mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{x_1 + x_2 + \cdots + x_{100}}{100} = 4800.03$ seconds

The threshold set by management = 1 hour 36 minutes = 5760 seconds, so the sample mean is below it.

Sample variance and standard deviation: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$

Is the time spent by most salespersons with pseudo customers similar to the mean? No, it fluctuates, so we use a distribution to summarize it.

Are a few spending a long time while the others are spending a short time? Is there a serious fluctuation in the time spent? We need to estimate the distribution's standard deviation, $\sigma$, using the sample standard deviation, $s$.
$s = 2610.62$ seconds.
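A minimal Python sketch of the two formulas above. The EE data for the 100 salespersons are not given here, so the five times below are hypothetical values chosen only to show the arithmetic, including the $n - 1$ denominator in $s^2$.

```python
import math
import statistics

# Hypothetical times (seconds) five salespersons spent with pseudo customers;
# these are NOT the EE sample, which is not reproduced in the slides.
times = [3600.0, 5400.0, 7200.0, 2700.0, 4500.0]

n = len(times)
x_bar = sum(times) / n  # sample mean: (x_1 + ... + x_n) / n

# Sample variance divides the sum of squared deviations by n - 1, not n.
s2 = sum((x - x_bar) ** 2 for x in times) / (n - 1)
s = math.sqrt(s2)  # sample standard deviation

# Python's statistics module uses the same n - 1 convention.
assert math.isclose(x_bar, statistics.mean(times))
assert math.isclose(s, statistics.stdev(times))

threshold = 1 * 3600 + 36 * 60  # 1 hour 36 minutes = 5760 seconds
print(f"x_bar = {x_bar:.2f} s, s = {s:.2f} s, below threshold: {x_bar < threshold}")
```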

Point Estimate

Estimate a population parameter with a sample statistic:

               Parameter (population)    Statistic (sample)
Mean           $\mu$                     $\bar{x}$
Proportion     $p$                       $p_s$
Variance       $\sigma^2$                $s^2$
Difference     $\mu_1 - \mu_2$           $\bar{x}_1 - \bar{x}_2$

Parameter: a numerical characteristic of an entire population.
Statistic: a measure computed from a sample.

Proportions

A proportion is the average, or mean, of a variable with only two possible outcomes.
E.g. the variable "customer buys at least one item" has two possible outcomes: YES or NO.
Let the variable take the value 1 if the customer buys at least one product and 0 if he or she does not buy any.
If we use 1 and 0 in this way, then the average, or mean, of the variable is the proportion of customers who buy at least one item.

Customer identity    Value of variable showing if an item is bought
Customer 1           0
Customer 2           1
Customer 3           1
Customer 4           0
Customer 5           0

Customers 1, 4, and 5 do not buy any item, and customers 2 and 3 do.
Take the average of the values in the right-hand column: $(0 + 1 + 1 + 0 + 0)/5 = 0.4$.
The average of this variable gives the proportion of the five customers who bought at least one item.

When dealing with a variable with two outcomes coded as 0 and 1, instead of talking about the mean, we will sometimes use the proportion, denoted by $p$.
$p$ is always between 0 and 1.
When $p$ is the mean of the distribution of such a variable, $p(1-p)$ and $\sqrt{p(1-p)}$ will be its variance and standard deviation.
Note: For proof, see Appendix 3-1
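A short Python check of the five-customer example, using the 0/1 values implied by the text (customers 2 and 3 bought, customers 1, 4 and 5 did not). It confirms that the mean of the 0/1 variable is the proportion $p = 0.4$ and that dividing the squared deviations by the number of customers gives $p(1-p) = 0.24$.

```python
import math

# 1 = customer bought at least one item, 0 = bought nothing (customers 1..5)
bought = [0, 1, 1, 0, 0]

n = len(bought)
p = sum(bought) / n  # the mean of a 0/1 variable is the proportion of 1s

# Variance of the 0/1 values, dividing by n (the distribution's variance) ...
var_direct = sum((b - p) ** 2 for b in bought) / n
# ... matches the formula p(1 - p); the standard deviation is its square root.
var_formula = p * (1 - p)
sd_formula = math.sqrt(var_formula)

assert math.isclose(var_direct, var_formula)
print(p, round(var_formula, 2), round(sd_formula, 4))
```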

Why sample from population?

Expense. A census may not be cost effective.


Speed of Response. There may not be
enough time to obtain more than a sample.
Accuracy. A carefully obtained sample may be
more accurate than a census.
Destructive Sampling. In destructive testing
of products a sample has to suffice.
The large (infinite) population. Sometimes a
census is impossible.

Random Sample

In statistical terms, a random sample is a set of items that have been drawn from a population in such a way that each time an item was selected, every item in the population had an equal opportunity to appear in the sample.

Sample with replacement (Yes)
Sample without replacement (Yes)

Example

Take random samples of students.
Ask: how many courses did you take last semester?
Calculate a statistic, say, the sample mean.

Sample 1: 2, 3, 4    mean = 3
Sample 2: 5, 4, 3    mean = 4
Sample 3: 1, 5, 3    mean = 3
Sample 4: 4, 5, 4    mean = 4.33
Sample 5: 2, 3, 2    mean = 2.33

Example (continued)

Observation:
Different samples produce different results.
The value of a statistic, like a mean or a proportion, depends on the particular sample obtained.
But some values may be more likely than others.

Sample Variation

The value of the sample mean varies from sample to sample.
The source of the variation in the value of the sample mean is the potential variation in the sample drawn from the population.
We can view the sample mean as a variable having a probability distribution.
This distribution is called the sampling distribution of the sample mean.

Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ is the probability distribution of the population of the sample means obtainable from all possible samples of size $n$ from a population of size $N$.

Sampling distribution: calculate the sample statistic for every possible sample (of size $n$). The distribution of these sample statistics is the sampling distribution.

In general, sample statistics are random variables and have distributions.

Any estimator based on a sample (a sample statistic) will have a sampling distribution.

E.g. the sampling distribution of the sample mean, of the sample proportion, and of the sample variance.

Example: Developing the Sampling Distribution of the Sample Mean

Suppose there's a population of $N = 4$ individuals.
Random variable $X$ is the age of individuals.
Values of $X$: 18, 20, 22, 24 (measured in years).
Everyone in this population is one of these 4 ages.

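Example (continued)

Population characteristics (summary measures):
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = 2.236$

[Figure: population distribution, a uniform distribution with $P(X) = 0.25$ at each of $X$ = 18, 20, 22, 24]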

Example (continued)

All possible samples of size $n = 2$ (16 samples, taken with replacement):

1st obs \ 2nd obs    18       20       22       24
18                   18,18    18,20    18,22    18,24
20                   20,18    20,20    20,22    20,24
22                   22,18    22,20    22,22    22,24
24                   24,18    24,20    24,22    24,24

The 16 sample means:

1st obs \ 2nd obs    18    20    22    24
18                   18    19    20    21
20                   19    20    21    22
22                   20    21    22    23
24                   21    22    23    24

Example (continued)

Sampling distribution of all sample means (sample size $n = 2$; number of sample means in the sampling distribution = 16):

[Figure: sampling distribution of the sample mean, $P(\bar{X})$ for $\bar{X}$ = 18, 19, 20, 21, 22, 23, 24, occurring 1, 2, 3, 4, 3, 2, 1 times out of 16]

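Example (continued)

Summary measures for the sampling distribution:
$\mu_{\bar{X}} = \dfrac{\sum_{i=1}^{N} \bar{X}_i}{N} = \dfrac{18 + 19 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar{X}} = \sqrt{\dfrac{\sum_{i=1}^{N} (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\dfrac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$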
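The enumeration on these slides can be reproduced with a brief Python sketch: list all $4 \times 4 = 16$ ordered samples of size 2 drawn with replacement, average each one, and compute the mean and standard deviation of the 16 sample means, dividing by 16 because the list is the whole sampling distribution rather than a sample from it.

```python
import itertools
import math

population = [18, 20, 22, 24]  # ages in the example population (N = 4)
n = 2

# All ordered samples of size 2 drawn with replacement: 4 * 4 = 16 samples.
samples = list(itertools.product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Mean and standard deviation of the 16 sample means (divide by 16, since
# these are all the values in the sampling distribution).
k = len(sample_means)
mu_xbar = sum(sample_means) / k
sigma_xbar = math.sqrt(sum((m - mu_xbar) ** 2 for m in sample_means) / k)

print(len(samples), mu_xbar, round(sigma_xbar, 2))
```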

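Example (continued)

Comparing the population with its sampling distribution:
Population: $\mu = 21$, $\sigma = 2.236$
Sample means distribution ($n = 2$): $\mu_{\bar{X}} = 21$, $\sigma_{\bar{X}} = 1.58$

[Figure: side-by-side bar charts of $P(X)$ for the population (18, 20, 22, 24) and $P(\bar{X})$ for the sample means (18 to 24)]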

Example (continued)

Results for other sample sizes (population: $\mu = 21$, $\sigma = 2.236$):

Sample size    Mean    Variance    St. Dev
1              21      5           2.236
2              21      2.5         1.581139
3              21      1.666667    1.290994
4              21      1.25        1.118034
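The table can be regenerated by extending the same enumeration to $n = 1, \dots, 4$ (all $4^n$ samples with replacement). This sketch also compares each variance with $\sigma^2 / n$, where $\sigma^2 = 5$ is the population variance; that relationship is what the shrinking standard deviations in the table reflect.

```python
import itertools
import math
import statistics

population = [18, 20, 22, 24]
sigma2 = statistics.pvariance(population)  # population variance: 5

var_by_n = {}
for n in range(1, 5):
    # all 4**n ordered samples of size n, drawn with replacement
    means = [sum(s) / n for s in itertools.product(population, repeat=n)]
    # pvariance: these 4**n means are the entire sampling distribution
    var_xbar = statistics.pvariance(means)
    var_by_n[n] = var_xbar
    print(n, round(statistics.mean(means), 6), round(var_xbar, 6),
          round(math.sqrt(var_xbar), 6), "sigma^2/n =", round(sigma2 / n, 6))
```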

Unbiased estimator

Sampling distributions give us an idea about the accuracy of an estimator.
Most of the estimators that we commonly consider are unbiased.
An estimator $\hat{\theta}$ is unbiased if the mean of the sampling distribution of the estimator is equal to what is being estimated: $E(\hat{\theta}) = \theta$.
E.g. $E(\bar{X}) = \mu$, $E(s^2) = \sigma^2$.
Otherwise it's biased, e.g. $E(s) \neq \sigma$.

Note: For proof, see Appendix 3-2
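Using the four-age population from the earlier example ($\mu = 21$, $\sigma^2 = 5$), the expectations in the last two lines can be checked exactly by averaging over all 16 equally likely samples of size 2 drawn with replacement. This Python sketch covers that one small case only; it is an illustration, not a proof.

```python
import itertools
import math
import statistics

population = [18, 20, 22, 24]
sigma2 = statistics.pvariance(population)  # true variance: 5
sigma = math.sqrt(sigma2)                  # true standard deviation: about 2.236

n = 2
samples = list(itertools.product(population, repeat=n))  # 16 equally likely samples

# statistics.variance / statistics.stdev use the n - 1 denominator (s^2 and s).
e_xbar = statistics.mean(statistics.mean(s) for s in samples)   # E(X-bar)
e_s2 = statistics.mean(statistics.variance(s) for s in samples)  # E(s^2)
e_s = statistics.mean(statistics.stdev(s) for s in samples)      # E(s)

print(e_xbar, e_s2, round(e_s, 3), round(sigma, 3))
```

Here $E(\bar{X})$ and $E(s^2)$ land exactly on $\mu$ and $\sigma^2$, while $E(s)$ falls below $\sigma$, which is the bias noted above.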

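[Figure: sampling distributions $P(\hat{\theta})$ of an unbiased and a biased estimator; the biased one is centred at its expected value $E(\hat{\theta})$ rather than at the true value $\theta$]

Bias $= E(\hat{\theta}) - \theta$ = the difference between the expected value of the estimator and the true value in the population.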

[Figure: sampling distribution of the sample mean, centred at $E(\bar{X})$ = the population mean $\mu$]

Sampling distributions give us an idea about the accuracy of an estimator.

[Figure: two sampling distributions; Estimate 1 is tightly concentrated (more accurate) and Estimate 2 is widely dispersed (less accurate)]

A sampling distribution tightly concentrated around the mean tells us that the estimator is likely to be much more accurate (i.e., closer to the true value) than one whose sampling distribution is widely dispersed around the average.

How accurate an estimator is the sample mean?
What is the sampling distribution of the sample mean?

Demo of Sampling Distribution

Rice Virtual Lab in Statistics


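Sampling Distribution of the Sample Mean: Normal Model

Normal models: sample means are normally distributed if the individual values are normally distributed.

[Figure: normal population distribution with $\mu = 50$, $\sigma = 10$, and sampling distributions of $\bar{X}$ for $n = 4$ ($\sigma_{\bar{X}} = 5$) and $n = 16$ ($\sigma_{\bar{X}} = 2.5$), both centred at $\mu_{\bar{X}} = 50$]

Central tendency: $\mu_{\bar{X}} = \mu$
Variation: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

Sample means are also approximately normally distributed, even when the population is not, because of the Central Limit Theorem (when the sample size condition is satisfied).

[Figure: population distribution with $\mu = 50$, $\sigma = 10$, and sampling distributions of $\bar{X}$ for $n = 4$ ($\sigma_{\bar{X}} = 5$) and $n = 30$ ($\sigma_{\bar{X}} \approx 1.8$), both centred at $\mu_{\bar{X}} = 50$]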

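Central Limit Theorem

As the sample size gets large enough, the sampling distribution of the sample mean becomes almost normal regardless of the shape of the population.

[Figure: population distribution of $X$ and the approximately normal sampling distribution of $\bar{X}$]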

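Central Limit Theorem

As the sample size increases, the sampling distribution of the sample mean approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$:
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately, for large $n$)

Note: the sample mean is (approximately) normally distributed with $\mu_{\bar{X}} = E(\bar{X}) = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.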

When Does CLT Hold?


CLT and Sample Size

If the population distribution is normal to start with, the distribution of x-bars will have a normal shape for all sample sizes.
Notice that it takes large sample sizes ($n \geq 30$) for the distribution of x-bars to become normal for a very skewed population distribution.
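A small simulation can illustrate the claim about skewed populations. As an assumption, the Python sketch below uses an exponential population with mean 1 and standard deviation 1 (strongly right-skewed, and not a population from these slides), draws 20,000 samples for each $n$, and reports the mean, standard deviation and skewness of the sample means. The mean should stay near $\mu$, the spread should track $\sigma/\sqrt{n}$, and the skewness should shrink toward 0 (the value for a normal distribution) as $n$ grows.

```python
import math
import random
import statistics

random.seed(2500)  # fixed seed so the run is reproducible

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0
REPS = 20_000  # number of samples drawn for each sample size


def sample_mean(n: int) -> float:
    """Mean of one random sample of size n from the skewed population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


def skewness(xs: list[float]) -> float:
    """Moment skewness: 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / sd) ** 3 for x in xs)


results = {}
for n in (2, 5, 30):
    means = [sample_mean(n) for _ in range(REPS)]
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    mean_, sd_, skew_ = results[n]
    print(f"n={n:2d}  mean={mean_:.3f}  sd={sd_:.3f} "
          f"(sigma/sqrt(n)={SIGMA / math.sqrt(n):.3f})  skewness={skew_:.2f}")
```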

General Conclusions

1. If the population of individual items is normal, then the population of all sample means is also normal.
2. Even if the population of individual items is not normal, there are circumstances when the population of all sample means is normal (Central Limit Theorem).

General Conclusions (continued)

3. The mean of all possible sample means equals the population mean. That is, $\mu_{\bar{x}} = \mu$.
4. The standard deviation $\sigma_{\bar{x}}$ of all sample means is less than the standard deviation $\sigma$ of the population (for $n > 1$). That is, $\sigma_{\bar{x}} < \sigma$.
Each sample mean averages out the high and the low measurements, so sample means tend to be closer to $\mu$ than many of the individual population measurements.

Example

Cola bottles are filled so that the contents $X$ have a normal distribution with mean $\mu = 298$ ml and standard deviation $\sigma = 3$ ml.
What proportion of bottles have less than 295 ml?
$X \sim N(298, 3^2)$
$P(X < 295) = P\left(\dfrac{X - 298}{3} < \dfrac{295 - 298}{3}\right) = P(Z < -1) = 0.1587$

Example (continued)

What is the probability of the average of a six-pack of bottles being less than 295 ml?
$\bar{X} \sim N\left(298, \left(\dfrac{3}{\sqrt{6}}\right)^2\right)$
$P(\bar{X} < 295) = P\left(\dfrac{\bar{X} - 298}{3/\sqrt{6}} < \dfrac{295 - 298}{3/\sqrt{6}}\right) = P(Z < -2.45) = 0.0071$

Example (continued)

$P(X < 295) = 0.1587$, but $P(\bar{X} < 295) = 0.0071$.
Averages have less variation than individual observations.
As the sample size increases, the variation in the distribution of the average decreases, so a value like 295 ml is very rare in the average of a six-pack or more of bottles, but could quite easily occur in a single bottle.

[Figure: pdfs of $X$ (one bottle) and $\bar{X}$ (average of 6 bottles), both centred at 298 ml, with 295 ml marked]
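Python's standard-library `statistics.NormalDist` can reproduce both cola probabilities without a Z table. In this sketch the exact values come out at about 0.1587 and 0.00715; the slide's 0.0071 comes from rounding $z = -\sqrt{6} \approx -2.449$ to $-2.45$ before looking it up.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 298.0, 3.0  # bottle contents in ml: X ~ N(298, 3^2)

# One bottle: P(X < 295)
z_one = (295 - MU) / SIGMA
p_one = NormalDist(MU, SIGMA).cdf(295)

# Average of a six-pack: X-bar ~ N(298, (3 / sqrt(6))^2)
se_six = SIGMA / sqrt(6)
z_six = (295 - MU) / se_six
p_six = NormalDist(MU, se_six).cdf(295)

# The standardized route (Z ~ N(0, 1)) gives the same probabilities.
assert abs(p_one - NormalDist().cdf(z_one)) < 1e-12
assert abs(p_six - NormalDist().cdf(z_six)) < 1e-12

print(f"z = {z_one:.2f}, P(X < 295) = {p_one:.4f}")
print(f"z = {z_six:.2f}, P(X-bar < 295) = {p_six:.5f}")
```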

Example

Suppose a population has mean $\mu = 8$ and standard deviation $\sigma = 2$. Suppose a random sample of size $n = 36$ is selected.
What is the probability that the sample mean is between 7.8 and 8.2?

Solution:
Even if the population is not normally distributed, the Central Limit Theorem can be used ($n > 30$), so the sampling distribution of $\bar{X}$ is approximately normal with mean $\mu_{\bar{X}} = 8$ and standard deviation $\sigma_{\bar{X}} = 2/\sqrt{36}$.
$P(7.8 < \bar{X} < 8.2) = P\left(\dfrac{7.8 - 8}{2/\sqrt{36}} < \dfrac{\bar{X} - 8}{2/\sqrt{36}} < \dfrac{8.2 - 8}{2/\sqrt{36}}\right) = P(-0.6 < Z < 0.6) = 2 \times 0.2257 = 0.4514$
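The same calculation as a Python sketch with `statistics.NormalDist`, treating $\bar{X}$ as approximately $N(8, (2/\sqrt{36})^2)$ per the CLT. It gives about 0.4515; the slide's 0.4514 comes from doubling the rounded table value 0.2257.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 8.0, 2.0, 36
se = SIGMA / sqrt(N)  # 2 / 6 = 1/3

# CLT: X-bar is approximately normal with mean MU and standard deviation se.
xbar_dist = NormalDist(MU, se)
prob = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)

z_hi = (8.2 - MU) / se  # about 0.6
print(f"z = +/-{z_hi:.1f}, P(7.8 < X-bar < 8.2) = {prob:.4f}")
```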

Do we have to draw all possible samples from the population to get the sampling distribution of an estimator?
NO!
Statistics tells us that a single sample is enough to allow us to approximate the sampling distribution of most estimators.

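Estimating the sampling distribution of the sample mean

The standard deviation of the sample mean: $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$

The population standard deviation, $\sigma$, is never observed, so we estimate it using $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$.

The best estimator of $\sigma_{\bar{X}}$ is $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$, the standard error of the mean.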

The sampling distribution of the standardized sample mean:
$\dfrac{\bar{X} - \mu}{\sigma_{\bar{X}}} \sim N(0, 1)$

The standard error of the mean is only an estimate based on the sample. It introduces some additional sampling error into our calculations:
$t = \dfrac{\bar{X} - \mu}{s_{\bar{X}}} \sim t_{n-1}$
This is reflected in the fatter tails of the t-distribution compared to the standard normal.

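Properties of t Distribution

The t-distribution has the probability density function
$f(t) = \dfrac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \dfrac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
where $\nu$ is the degrees of freedom (df).

df $= n - 1$, where $n$ is the sample size.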
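A direct Python transcription of the density above makes the "fatter tails" remark concrete. This sketch computes the constant with `math.lgamma` (log-gamma) to avoid overflow for large $\nu$; at $t = 0$ the t density sits below the standard normal density, while at $t = 3$ it sits above it, and the gap shrinks as the degrees of freedom grow.

```python
import math


def t_pdf(t: float, v: int) -> float:
    """Density of the t-distribution with v degrees of freedom."""
    # Gamma((v+1)/2) / (sqrt(v*pi) * Gamma(v/2)), computed on the log scale
    log_c = math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v * math.pi)
    return math.exp(log_c) * (1 + t * t / v) ** (-(v + 1) / 2)


def normal_pdf(z: float) -> float:
    """Standard normal density, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


for v in (3, 10, 99):  # df = n - 1, e.g. n = 100 gives v = 99
    print(f"v={v:2d}  f(0)={t_pdf(0, v):.4f} vs {normal_pdf(0):.4f}   "
          f"f(3)={t_pdf(3, v):.5f} vs {normal_pdf(3):.5f}")
```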

Degrees of Freedom

Idea: the number of observations that are free to vary after the sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 10.
Let $x_1 = 8$ and $x_2 = 13$. What's $x_3$?
Since $x_1 + x_2 + x_3 = 3 \times 10 = 30$, $x_3$ must be $30 - 8 - 13 = 9$.

Here, $n = 3$, so degrees of freedom $= n - 1 = 3 - 1 = 2$.
(2 values can be any numbers, but the third is not free to vary for a given mean.)

Case: Double E (EE) (continued)

$\bar{x} = 4880.03$ seconds, $s = 2610.62$ seconds, $n = 100$
The standard error of the mean time a salesperson spends on pseudo customers:
$s_{\bar{X}} = \dfrac{s}{\sqrt{n}} = \dfrac{2610.62}{\sqrt{100}} = \dfrac{2610.62}{10} = 261.06$ seconds

The estimated sampling distribution of the standardized sample mean:
$t = \dfrac{\bar{X} - \mu}{s_{\bar{X}}} = \dfrac{\bar{X} - \mu}{261.06} \sim t_{99}$

Exercise: Cashing out

A mortgage bank in New Jersey would like to understand how much home equity customers who refinance their homes are likely to cash out. A sample of 64 loans is collected. Sample mean cash-out value = $5M, s = $0.8M; 80% of the customers cash out when they refinance.
What is the estimated sampling distribution of the standardized mean cash-out value for customers at the bank?

$n = 64$
$t = \dfrac{\bar{X} - \mu}{s_{\bar{X}}} = \dfrac{\bar{X} - \mu}{800000/\sqrt{64}} = \dfrac{\bar{X} - \mu}{100000} \sim t_{63}$
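Both standard errors above (the EE case and the cash-out exercise) follow the same recipe, sketched here in Python. The function names are illustrative, and the value `mu = 5000.0` in the last call is a hypothetical population mean used only to show how a single standardized value would be computed from the EE figures.

```python
import math


def standard_error(s: float, n: int) -> float:
    """Estimated standard deviation of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def t_value(x_bar: float, mu: float, s: float, n: int) -> float:
    """Standardized sample mean (x_bar - mu) / s_xbar; t with n - 1 df if mu is the true mean."""
    return (x_bar - mu) / standard_error(s, n)


# EE case: s = 2610.62 seconds, n = 100 -> t with 99 df
se_ee = standard_error(2610.62, 100)
# Cash-out exercise: s = $800,000, n = 64 -> t with 63 df
se_cash = standard_error(800_000, 64)

print(f"EE: s_xbar = {se_ee:.2f} s, df = {100 - 1}")
print(f"Cash-out: s_xbar = ${se_cash:,.0f}, df = {64 - 1}")

# Hypothetical mu, for illustration only: distance of 4880.03 s from it in SE units
print(f"t = {t_value(4880.03, 5000.0, 2610.62, 100):.3f}")
```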

Take away for Topic 3 - a

Point estimate
Basic concepts for sampling distribution: why sample from a population; population, sample, parameter, statistic; sample variation, sampling distribution
Properties of estimators that we desire: unbiased, smaller variance of the sampling distribution
Central Limit Theorem: content (normal), condition (n > 30)
Estimating the sampling distribution of the sample mean: t-distribution

Where Do We Go From Here?

[Figure: course roadmap: Descriptive Statistics; Probability; Random variables (Discrete Random Variable, Continuous Random Variable); Sampling distribution; Estimation (Confidence intervals); Hypothesis testing; Simple Linear regression]

That's all, folks!

Adios~

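Appendix 3-1: Proof for Variance of Proportion

Let's suppose there are $m$ 1s (and $n - m$ 0s) among the $n$ subjects.
Then $\bar{x} = \dfrac{m}{n} = p$,
and $x_i - \bar{x}$ is equal to $1 - m/n$ for $m$ observations and $0 - m/n$ for $n - m$ observations. When these results are combined, the final result is
$\sum_{i=1}^{n}(x_i - \bar{x})^2 = m\left(1 - \dfrac{m}{n}\right)^2 + (n - m)\left(\dfrac{m}{n}\right)^2 = np(1 - p)$,
and the sample variance of the 0/1 observations is
$s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \dfrac{n}{n - 1}\,p(1 - p)$,
which is approximately $p(1 - p)$ for large $n$ (dividing by $n$ instead of $n - 1$ gives exactly $p(1 - p)$).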

Appendix 3-2: Proof that sample variance is unbiased
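A sketch of the standard argument, assuming $X_1, \dots, X_n$ are independent draws (for example, sampling with replacement) from a population with mean $\mu$ and variance $\sigma^2$, so that $E(\bar{X}) = \mu$ and $\operatorname{Var}(\bar{X}) = \sigma^2/n$:

```latex
\begin{aligned}
\sum_{i=1}^{n}(X_i-\bar{X})^2 &= \sum_{i=1}^{n}X_i^2 - n\bar{X}^2 \\
E\left(X_i^2\right) &= \operatorname{Var}(X_i) + [E(X_i)]^2 = \sigma^2 + \mu^2 \\
E\left(\bar{X}^2\right) &= \operatorname{Var}(\bar{X}) + [E(\bar{X})]^2 = \frac{\sigma^2}{n} + \mu^2 \\
E\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right] &= n(\sigma^2+\mu^2) - n\left(\frac{\sigma^2}{n}+\mu^2\right) = (n-1)\sigma^2 \\
E(s^2) &= \frac{1}{n-1}\,E\left[\sum_{i=1}^{n}(X_i-\bar{X})^2\right] = \sigma^2
\end{aligned}
```

Dividing by $n$ instead of $n - 1$ would give an expected value of $\frac{n-1}{n}\sigma^2$, which is why $s^2$ uses $n - 1$.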
