
Chapter 7, Part A

Sampling and Sampling Distributions

■ Simple Random Sampling

■ Point Estimation

■ Introduction to Sampling Distributions

■ Sampling Distribution of $\bar{X}$

© 2020 Cengage EMEA. All Rights Reserved.


Statistical Inference

The purpose of statistical inference is to obtain
information about a population from information
contained in a sample.

A population is the set of all the elements of interest.

A sample is a subset of the population.

Statistical Inference

The sample results provide only estimates of the
values of the population characteristics.

With proper sampling methods, the sample results
can provide “good” estimates of the population
characteristics.

A parameter is a numerical characteristic of a
population.

Simple Random Sampling:
Finite Population

■ Finite populations are often defined by lists such as:
  • Organization membership roster
  • Credit card account numbers
  • Inventory product numbers

■ A simple random sample of size n from a finite
  population of size N is a sample selected such that
  each possible sample of size n has the same
  probability of being selected.

Simple Random Sampling:
Finite Population

■ Replacing each sampled element before selecting
  subsequent elements is called sampling with
  replacement.

■ Sampling without replacement is the procedure
  used most often.

■ In large sampling projects, computer-generated
  random numbers are often used to automate the
  sample selection process.
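The difference between the two procedures can be sketched in a few lines of Python. This is only an illustration: the population of 20 account numbers, the sample size and the seed are made-up values, and the standard-library `random` module stands in for whatever random number generator a real project would use.

```python
import random

# Hypothetical finite population: account numbers 1..N (illustrative only).
N = 20
population = list(range(1, N + 1))
n = 5

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Sampling without replacement: no element can appear twice.
without_replacement = rng.sample(population, n)

# Sampling with replacement: every draw is made from the full population,
# so the same element may be selected more than once.
with_replacement = rng.choices(population, k=n)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

`random.sample` gives every possible subset of size n the same chance of selection, which matches the definition of a simple random sample from a finite population.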

Simple Random Sampling:
Infinite Population

■ Infinite populations are often defined by an ongoing
  process whereby the elements of the population consist
  of items generated as though the process would
  operate indefinitely.

■ A simple random sample from an infinite population
  is a sample selected such that the following conditions
  are satisfied:
  • Each element selected comes from the same
    population.
  • Each element is selected independently.

Simple Random Sampling:
Infinite Population

■ In the case of infinite populations, it is impossible to
  obtain a list of all elements in the population.

■ The random number selection procedure cannot be
  used for infinite populations.

Point Estimation

In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.

We refer to $\bar{X}$ as the point estimator of the population
mean $\mu$.

$S$ is the point estimator of the population standard
deviation $\sigma$.

$P$ is the point estimator of the population proportion $\pi$.

Sampling Error

■ When the expected value of a point estimator is equal
  to the population parameter, the point estimator is said
  to be unbiased.
■ The absolute value of the difference between an
  unbiased point estimate and the corresponding
  population parameter is called the sampling error.
■ Sampling error is the result of using a subset of the
  population (the sample), and not the entire
  population.
■ Statistical methods can be used to make probability
  statements about the size of the sampling error.
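A small simulation can make the ideas of unbiasedness and sampling error concrete. The sketch below is an assumption-laden illustration: the population of 900 scores is generated artificially (it is not the data used in the slides), and the number of repetitions and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical finite population of 900 scores (made-up data).
population = [rng.gauss(1000, 80) for _ in range(900)]
mu = statistics.fmean(population)

n = 30
# Draw many simple random samples and record each sample mean.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(5000)]

# Unbiasedness: the average of the sample means should be close to mu.
mean_of_means = statistics.fmean(sample_means)

# Sampling error of one particular sample: |x_bar - mu|.
one_error = abs(sample_means[0] - mu)

print(f"mu = {mu:.2f}, average of sample means = {mean_of_means:.2f}")
print(f"sampling error of the first sample = {one_error:.2f}")
```

Any single sample mean misses the population mean by some amount, but the misses average out across repeated samples, which is what unbiasedness describes.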

Sampling Error

■ The sampling errors are:

$|\bar{x} - \mu|$ for the sample mean

$|s - \sigma|$ for the sample standard deviation

$|p - \pi|$ for the sample proportion

Example: St. Andrew’s

St. Andrew’s College receives 900 applications annually
from prospective students. The application form contains
a variety of information, including the individual’s
scholastic aptitude test (SAT) score and whether or not
the individual desires on-campus housing.

Example: St. Andrew’s

The director of admissions would like to know the
following information:
• the average SAT score for the 900 applicants, and
• the proportion of applicants who want to live on
  campus.

Example: St. Andrew’s

We shall now look at two alternatives for obtaining the
desired information.
■ Conducting a census of all 900 applicants.
■ Selecting a sample of 30 applicants, using Excel.

Conducting a Census

■ If the relevant data for all 900 applicants were
  in the college’s database, the population parameters of
  interest could be calculated using the formulae
  presented in Chapter 3.

■ We will assume for the moment that conducting a
  census is practical in this example.

Conducting a Census

■ Population Mean SAT Score:

$\mu = \frac{\sum x_i}{900} = 990$

■ Population Standard Deviation for SAT Score:

$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

■ Population Proportion wanting On-Campus Housing:

$\pi = \frac{648}{900} = 0.72$
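As a rough illustration of how these census formulas translate into a calculation, the following sketch applies them to a tiny made-up population of six applicants. The scores and housing preferences are hypothetical; only the formulas mirror the slide. Note that the population standard deviation divides by N, not N - 1.

```python
import math
import statistics

# Hypothetical mini-population of 6 applicants (made-up values).
sat_scores = [950, 1010, 1090, 880, 1020, 990]
wants_housing = [True, True, False, True, False, True]

N = len(sat_scores)

# Population mean: sum of all values divided by N.
mu = sum(sat_scores) / N

# Population standard deviation: divisor is N because this is a census.
sigma = math.sqrt(sum((x - mu) ** 2 for x in sat_scores) / N)

# Population proportion: count of "yes" answers divided by N.
proportion = sum(wants_housing) / N

print(f"mu = {mu:.1f}, sigma = {sigma:.2f}, proportion = {proportion:.3f}")
```

The standard library's `statistics.pstdev` uses the same N divisor, so it can serve as a cross-check on the hand-written formula.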

Simple Random Sampling

■ Now suppose that the necessary data on the
  current year’s applicants were not yet entered in the
  college’s database.

■ Furthermore, the Director of Admissions must obtain
  estimates of the population parameters of interest for
  a meeting taking place in a few hours.

■ She decides a sample of 30 applicants will be used.

■ The applicants were numbered, from 1 to 900, as
  their applications arrived.

Simple Random Sampling:
Using a Random Number Table
■ Taking a Sample of 30 Applicants

• Because the finite population has 900 elements, we
  shall need 3-digit random numbers to randomly
  select applicants numbered from 1 to 900.

• We shall use the last three digits of the 5-digit
  random numbers in the third column of the
  textbook’s random number table, and continue
  into the fourth column as needed.

Simple Random Sampling:
Using a Random Number Table
■ Taking a Sample of 30 Applicants
• The numbers we draw will be the numbers of the
  applicants we will sample unless
  • the random number is greater than 900 or
  • the random number has already been used.
• We shall continue to draw random numbers until
  we have selected 30 applicants for our sample.
• (We shall go through all of column 3 and part of
  column 4 of the random number table, encountering
  in the process five numbers greater than 900 and
  one duplicate, 835.)
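The selection rule above can be sketched as a simple loop. This is a minimal sketch, not the textbook's procedure: a pseudo-random generator stands in for the printed random number table, the function name `draw_sample` and the seed are made up, and the value 000 is also skipped because no applicant has that number.

```python
import random

def draw_sample(population_size, sample_size, rng):
    """Select element numbers from 3-digit random numbers, skipping
    values that are out of range or have already been used."""
    selected = []
    seen = set()
    rejected = 0
    while len(selected) < sample_size:
        # Stands in for the last three digits of one table entry.
        number = rng.randint(0, 999)
        if number < 1 or number > population_size or number in seen:
            rejected += 1
            continue
        seen.add(number)
        selected.append(number)
    return selected, rejected

rng = random.Random(7)  # arbitrary seed; a printed table would replace this
sample, skipped = draw_sample(900, 30, rng)
print(sample)
print("numbers skipped:", skipped)
```

Discarding out-of-range and repeated numbers keeps every remaining applicant equally likely at each draw, so the result is a simple random sample without replacement.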

Simple Random Sampling:
Using a Random Number Table
■ Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
836                      No. 836
. . . and so on

Simple Random Sampling:
Using a Random Number Table
■ Sample Data

No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
.     .               .                .           .
30    498             Emily Morse      1010        No

Simple Random Sampling:
Using a Computer
■ Taking a Sample of 30 Applicants

• Computers can be used to generate random
  numbers for selecting random samples.

• For example, Excel’s function
  = RANDBETWEEN(1,900)
  can be used to generate random integers between
  1 and 900.

• One approach is to assign a random number to each
  applicant and then choose the 30 applicants
  corresponding to the 30 smallest random numbers
  as our sample.
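One way to read the "30 smallest random numbers" rule is to attach a random number to every applicant and sort. The sketch below assumes that reading; it uses Python's `random` module in place of a spreadsheet, and the seed is arbitrary.

```python
import random

rng = random.Random(2020)  # arbitrary seed for repeatability

applicants = range(1, 901)  # applicant numbers 1..900, as in the example

# Attach one uniform random number to every applicant (assumed here to
# play the role of a spreadsheet column of random numbers).
keys = {applicant: rng.random() for applicant in applicants}

# Sort applicants by their random number and keep the 30 smallest.
sample = sorted(applicants, key=keys.get)[:30]
print(sample)
```

Because each applicant appears exactly once in the sorted list, this approach cannot select the same applicant twice, so no separate duplicate check is needed.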

Point Estimation

■ $\bar{x}$ as Point Estimate of $\mu$

$\bar{x} = \frac{\sum x_i}{n} = \frac{29{,}910}{30} = 997$

■ $s$ as Point Estimate of $\sigma$

$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{163{,}996}{29}} = 75.2$

■ $p$ as Point Estimate of $\pi$

$p = \frac{20}{30} = 0.67$

Note: Different random numbers would have
identified a different sample, which would have
resulted in different point estimates.
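The three point estimates can be checked directly from the totals quoted on the slide. The sketch below uses only those totals (29,910, 163,996 and 20 out of 30); the variable names are illustrative. Note that 20/30 evaluates to 0.667 to three decimal places.

```python
import math

n = 30
sum_x = 29_910        # sum of the 30 sampled SAT scores (from the slide)
sum_sq_dev = 163_996  # sum of squared deviations from the sample mean (from the slide)
yes_count = 20        # sampled applicants wanting on-campus housing (from the slide)

x_bar = sum_x / n                     # sample mean
s = math.sqrt(sum_sq_dev / (n - 1))   # sample standard deviation, divisor n - 1
p = yes_count / n                     # sample proportion

print(f"x_bar = {x_bar:.0f}, s = {s:.1f}, p = {p:.3f}")
```

The divisor n - 1 in the sample standard deviation is the one difference from the census formula, which divides by the population size.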

Summary of Point Estimates
Obtained from a Simple Random Sample

Population Parameter                               Parameter Value   Point Estimator                                Point Estimate
$\mu$ = Population mean SAT score                  990               $\bar{X}$ = Sample mean SAT score              997
$\sigma$ = Population std. deviation               80                $S$ = Sample std. deviation                    75.2
  for SAT score                                                        for SAT score
$\pi$ = Population proportion                      0.72              $P$ = Sample proportion                        0.67
  wanting campus housing                                               wanting campus housing
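Combining this summary with the definition of sampling error gives the error of each estimate as an absolute difference. The sketch below is illustrative: the dictionary layout is an assumption, and the sample proportion is entered as 20/30 rather than as a rounded decimal.

```python
# (parameter value, point estimate) pairs taken from the summary table.
estimates = {
    "mean SAT score": (990, 997),
    "std. deviation of SAT score": (80, 75.2),
    "proportion wanting housing": (0.72, 20 / 30),
}

# Sampling error = |point estimate - population parameter|.
sampling_errors = {
    name: abs(estimate - parameter)
    for name, (parameter, estimate) in estimates.items()
}

for name, error in sampling_errors.items():
    print(f"{name}: {error:.3f}")
```

In practice the population parameters are unknown, so these errors cannot be computed; the example works only because the census values are assumed to be available.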

Sampling Distribution of $\bar{X}$

■ Process of Statistical Inference

1. Population with mean $\mu$ = ?
2. A simple random sample of n elements is selected
   from the population.
3. The sample data provide a value for the sample
   mean $\bar{X}$.
4. The value of $\bar{X}$ is used to make inferences about
   the value of $\mu$.

Sampling Distribution of $\bar{X}$

The sampling distribution of $\bar{X}$ is the probability
distribution of all possible values of the sample
mean $\bar{X}$.

Expected Value of $\bar{X}$

$E(\bar{X}) = \mu$

where:
$\mu$ = the population mean

Sampling Distribution of $\bar{X}$

Standard Deviation of $\bar{X}$

Finite Population:   $\sigma_{\bar{X}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right)$

Infinite Population: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

• A finite population is treated as being
  infinite if n/N < 0.05.
• $\sqrt{(N - n)/(N - 1)}$ is the finite population correction
  factor.
• $\sigma_{\bar{X}}$ is referred to as the standard error of
  the mean.

Form of the Sampling Distribution of $\bar{X}$

When the population has a normal distribution, the
sampling distribution of $\bar{X}$ is normal in shape for any
sample size.

In most applications, the sampling distribution of $\bar{X}$
can be approximated by a normal distribution
whenever the sample size is 30 or more.

In cases where the population is highly skewed or
outliers are present, samples of size 50 may be
needed.
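A quick simulation gives a feel for this approximation. The sketch below is an illustration under stated assumptions: the skewed population is taken to be exponential with mean 1 (so its standard deviation is also 1), and the number of repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(1)

def sample_mean(n):
    """Mean of n independent draws from an exponential population with mean 1."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

n = 30
means = [sample_mean(n) for _ in range(10_000)]

center = statistics.fmean(means)   # expected to be near the population mean, 1
spread = statistics.stdev(means)   # expected to be near 1 / sqrt(30)

# For a normal distribution, about 68% of values lie within one
# standard deviation of the centre.
within_one_se = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center = {center:.3f}, spread = {spread:.4f}, "
      f"theory = {1 / math.sqrt(n):.4f}, within one SE = {within_one_se:.3f}")
```

Even though individual exponential values are strongly skewed, the simulated sample means cluster roughly symmetrically around the population mean with a spread close to the theoretical standard error.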

Sampling Distribution of $\bar{X}$ for SAT Scores

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$

[Figure: sampling distribution of $\bar{X}$, centred at $E(\bar{X}) = 990$.]
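The standard error for the SAT example can be reproduced with a short helper. This is a sketch only: the function name `standard_error` and its optional `N` argument are hypothetical, and the rule for when to apply the finite population correction follows the n/N < 0.05 guideline stated in the slides.

```python
import math

def standard_error(sigma, n, N=None):
    """Standard error of the mean. The finite population correction is
    applied only when N is given and n/N is at least 0.05."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# St. Andrew's example: sigma = 80, n = 30, N = 900.
# n/N = 0.033 < 0.05, so the population is treated as infinite.
se_30 = standard_error(80, 30, N=900)
print(f"standard error = {se_30:.1f}")
```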

Sampling Distribution of $\bar{X}$ for SAT Scores

What is the probability that a simple random sample
of 30 applicants will provide an estimate of the
population mean SAT score that is within +/-10 of
the actual population mean $\mu$?
In other words, what is the probability that $\bar{X}$ will be
between 980 and 1000?

Sampling Distribution of $\bar{X}$ for SAT Scores

Step 1: Calculate the z-value at the upper endpoint of
the interval.
z = (1000 - 990)/14.6 = 0.68

Step 2: Find the area under the curve to the left of the
upper endpoint.
P(Z < 0.68) = 0.7517

Sampling Distribution of $\bar{X}$ for SAT Scores

Cumulative Probabilities for the Standard Normal Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
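The rows of this table can be regenerated with the standard library, which is a convenient way to look up values that are not printed. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the helper name `table_row` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def table_row(z_tenths):
    """Cumulative probabilities P(Z <= z) for z = z_tenths + 0.00 ... 0.09,
    rounded to four decimals as in a printed table."""
    return [round(std_normal.cdf(z_tenths + hundredths / 100), 4)
            for hundredths in range(10)]

for tenths in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{tenths:.1f}", *(f"{p:.4f}" for p in table_row(tenths)))
```

The entry in row .6 under column .08 is the value P(Z < 0.68) = 0.7517 used in Step 2.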

Sampling Distribution of $\bar{X}$ for SAT Scores

[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = 14.6$; the area to the left of 1000 is 0.7517.]

Sampling Distribution of $\bar{X}$ for SAT Scores

Step 3: Calculate the z-value at the lower endpoint of
the interval.
z = (980 - 990)/14.6 = -0.68

Step 4: Find the area under the curve to the left of the
lower endpoint.
P(Z < -0.68) = 0.2483

Sampling Distribution of $\bar{X}$ for SAT Scores

[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = 14.6$; the area to the left of 980 is 0.2483.]

Sampling Distribution of $\bar{X}$ for SAT Scores

Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.

P(-0.68 < Z < 0.68) = P(Z < 0.68) - P(Z < -0.68)
                    = 0.7517 - 0.2483
                    = 0.5034
The probability that the sample mean SAT score will
be between 980 and 1000 is:

P(980 < $\bar{X}$ < 1000) = 0.5034
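Steps 1 to 5 can be retraced in code. The sketch below first rounds z to two decimals, as one does when reading a printed table, and then repeats the calculation without rounding. It assumes the normal approximation and the infinite-population standard error used in the slides; the variable names are illustrative. The unrounded result is slightly larger (roughly 0.506) because z is about 0.685 rather than 0.68.

```python
import math
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
se = sigma / math.sqrt(n)  # standard error of the mean, about 14.6

# Steps 1-4 with z rounded to two decimals, as with a printed table.
z_upper = round((1000 - mu) / se, 2)
z_lower = round((980 - mu) / se, 2)
p_table = NormalDist().cdf(z_upper) - NormalDist().cdf(z_lower)

# Step 5 repeated without rounding z.
x_bar_dist = NormalDist(mu, se)
p_exact = x_bar_dist.cdf(1000) - x_bar_dist.cdf(980)

print(f"z = {z_lower} to {z_upper}")
print(f"with rounded z: {p_table:.4f}, without rounding: {p_exact:.4f}")
```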

Sampling Distribution of $\bar{X}$ for SAT Scores

[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = 14.6$; the area between 980 and 1000 is 0.5034.]

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{X}$

■ Suppose we select a simple random sample of 100
  applicants instead of the 30 originally considered.

■ $E(\bar{X}) = \mu$ regardless of the sample size. In our
  example, $E(\bar{X})$ remains at 990.

■ Whenever the sample size is increased, the standard
  error of the mean $\sigma_{\bar{X}}$ is decreased. With the increase
  in the sample size to n = 100, the standard error of the
  mean is decreased to:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{X}$

[Figure: two sampling distributions of $\bar{X}$, both centred at $E(\bar{X}) = 990$; with n = 100, $\sigma_{\bar{X}} = 8$; with n = 30, $\sigma_{\bar{X}} = 14.6$.]

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{X}$

■ Recall that when n = 30, P(980 < $\bar{X}$ < 1000) = 0.5034.

■ We follow the same steps to solve for P(980 < $\bar{X}$ < 1000)
  when n = 100 as we showed earlier when n = 30.

■ Now, with n = 100, P(980 < $\bar{X}$ < 1000) = 0.7888.

■ Because the sampling distribution with n = 100 has a
  smaller standard error, the values of $\bar{X}$ have less
  variability and tend to be closer to the population
  mean than the values of $\bar{X}$ with n = 30.
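The effect of the sample size can be seen by wrapping the earlier steps in a function of n. This is a sketch under the same assumptions as the slides (normal sampling distribution, standard error $\sigma/\sqrt{n}$ with no finite population correction); the function name `prob_within` is hypothetical. Results computed without rounding z can differ from table-based values in the last decimal place.

```python
import math
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|X_bar - mu| < margin) for a normal sampling distribution
    with standard error sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    z = margin / se
    return 2 * NormalDist().cdf(z) - 1

p_30 = prob_within(10, 80, 30)
p_100 = prob_within(10, 80, 100)

for n, p in ((30, p_30), (100, p_100)):
    print(f"n = {n:3d}: P(980 < X_bar < 1000) = {p:.4f}")
```

Because the standard error shrinks in proportion to the square root of n, the probability of landing within a fixed margin of the population mean rises as the sample grows.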

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{X}$

[Figure: sampling distribution of $\bar{X}$ with $\sigma_{\bar{X}} = 8$; the area between 980 and 1000 is 0.7888.]
