
BFC 34303

CIVIL ENGINEERING STATISTICS


Chapter 5
Sampling and Estimation
Faculty of Civil and Environmental Engineering
Universiti Tun Hussein Onn Malaysia

Sampling
If we are studying a large population, say to find the height of people in
that population, it may be difficult to look at every individual.
We therefore take a random sample. We might sample 100 people and
take the average height of the sample.

How good an estimate will this be?
• Is it likely to be biased upwards or downwards from the population average?
• How inaccurate is it likely to be?
• How big a sample do we need to be confident of getting a figure close to the true figure?

[Figure: a SAMPLE drawn from a POPULATION.]

Why is sampling needed?
In many cases, sampling is the only way to determine something about
the population. Some of the reasons for sampling are:

• The physical impossibility of checking all items in the population
• To save time, because contacting the entire population would often be time consuming
• To reduce the cost of studying the entire population
• Some tests are destructive (e.g. testing the strength of concrete requires the sample to fail under a certain load)
• To help minimise error due to the large number in the population

Sampling Distributions
Sets of data based on many samples drawn from a population are called
sampling distributions.
They are often used to describe the chance fluctuations of mean values
and standard deviations based on random sampling.
The sampling distribution of a statistic is the probability distribution of that
statistic.
In other words, it is the distribution of the statistic if we were to repeatedly
draw samples from the population.
For example, we want to know the average weight of 16 students in a
class.

Population (weights in kilograms, kg)

62 51 85 65
74 59 82 70
56 66 83 87
68 51 65 74

The population mean, $\mu$:
$$\mu = \frac{62 + 51 + 85 + \cdots + 74}{16} = 68.63\ \text{kg}$$

Sample A (51, 70, 56 and 68 kg, drawn from the population above)

The sample mean, $\bar{x}$:
$$\bar{x} = \frac{51 + 70 + 56 + 68}{4} = 61.25\ \text{kg}$$

Sample B (85, 74, 83 and 74 kg, drawn from the population above)

The sample mean, $\bar{x}$:
$$\bar{x} = \frac{85 + 74 + 83 + 74}{4} = 79.0\ \text{kg}$$

In repeated sampling, the value of the sample statistic (in this case, the
sample mean) would vary from sample to sample.
If we took many samples and plotted the sample means in a histogram,
the histogram of sample means will closely resemble the true sampling
distribution of the sample mean.
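
As a minimal sketch of this idea, the Python snippet below enumerates every possible sample of 4 students from the 16 weights listed above, instead of drawing random samples, and compares the mean of all the sample means with the population mean. Sampling without replacement and the helper name `all_sample_means` are assumptions made for illustration only.

```python
from itertools import combinations
from statistics import mean

# Weights (kg) of the 16 students in the population above.
weights = [62, 51, 85, 65, 74, 59, 82, 70, 56, 66, 83, 87, 68, 51, 65, 74]

def all_sample_means(population, n):
    """Return the mean of every possible sample of size n (without replacement)."""
    return [sum(sample) / n for sample in combinations(population, n)]

sample_means = all_sample_means(weights, 4)
population_mean = mean(weights)
mean_of_sample_means = mean(sample_means)

print("number of possible samples:", len(sample_means))
print("population mean:", population_mean)
print("mean of all sample means:", round(mean_of_sample_means, 3))
print("smallest and largest sample mean:", min(sample_means), max(sample_means))
```

A histogram of `sample_means` would play the role of the histogram described above.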

[Figure: histogram of sample means. Horizontal axis: mean weight, $\bar{X}$ (kg), from 60 to 80; vertical axis: number of samples.]

We can see that the sample mean will be distributed about the population
mean in some way.
The sample mean will have a distribution that is approximately normal.
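The sampling distribution of $\bar{X}$ is the distribution of $\bar{X}$ over all possible samples of size $n$ drawn from the population.

[Figure: the same histogram of sample means with the population mean $\mu$ marked. Horizontal axis: mean weight, $\bar{X}$ (kg), from 60 to 80; vertical axis: number of samples.]
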

In practice, we do not repeatedly sample many times from the population.


We typically draw only one sample from the population.
The value of the statistic computed from that sample is a single random draw from the statistic’s sampling distribution.
We use mathematical arguments based on the statistic’s sampling
distribution to make statements about population parameters. This will
play an important role in inferential statistics.

For example, we may make the following statement or inference:

We are 95% confident that the sample mean is within 22.1 of the population mean.

Sampling Distribution of the Sample Mean
If we were to organise the means of all possible samples into a probability
distribution, we would obtain the sampling distribution of the sample
mean.

Example 5.1
The table below gives the number of projects that each engineer in an engineering firm was involved in during the previous year.

Name of engineer   No. of projects
Jack               8
Nelson             4
Susan              5
Paul               7

a) What is the sampling distribution of the sample mean for samples of size 2?
b) What is the mean of the sampling distribution?

(a) The population mean, $\mu = \frac{8 + 4 + 5 + 7}{4} = 6.0$

The number of all possible samples of size 2 can be determined as follows:

$${}^{N}C_{n}$$ where $N$ = population size and $n$ = sample size
$${}^{4}C_{2} = 6$$

Sample means for all possible samples of 2 engineers:

Sample   Engineers        No. of projects   Sum   Mean
1        Jack, Nelson     8, 4              12    6.0
2        Jack, Susan      8, 5              13    6.5
3        Jack, Paul       8, 7              15    7.5
4        Nelson, Susan    4, 5              9     4.5
5        Nelson, Paul     4, 7              11    5.5
6        Susan, Paul      5, 7              12    6.0

Sampling distribution of the sample mean for $n = 2$

Sample mean   Number of means   Probability
4.5           1                 0.1667
5.5           1                 0.1667
6.0           2                 0.3333
6.5           1                 0.1667
7.5           1                 0.1667
Total         6                 1.0000

(b) Mean of the sampling distribution, $\mu_{\bar{X}}$:
$$\mu_{\bar{X}} = \frac{4.5 + 5.5 + 6.0 + 6.0 + 6.5 + 7.5}{6} = 6.0$$

Note that $\mu_{\bar{X}} = \mu$.
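
The enumeration in Example 5.1 can be reproduced with a short Python sketch. The function name `sampling_distribution` and the use of exact fractions are illustrative choices, not part of the original example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Number of projects per engineer, as given in Example 5.1.
projects = {"Jack": 8, "Nelson": 4, "Susan": 5, "Paul": 7}

def sampling_distribution(values, n):
    """Map each possible sample mean to its probability.

    Every sample of size n (without replacement) is taken as equally likely.
    """
    means = [Fraction(sum(sample), n) for sample in combinations(values, n)]
    counts = Counter(means)
    return {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}

dist = sampling_distribution(list(projects.values()), 2)
mean_of_means = sum(m * p for m, p in dist.items())

for m, p in dist.items():
    print(float(m), round(float(p), 4))
print("mean of the sampling distribution:", float(mean_of_means))
```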

Example 5.2
A wall paint supplier sells large numbers of 2-liter and 5-liter cans of paint, which are sold in the ratio of 3:2. A random sample of 3 cans is taken from the storeroom. Determine the sampling distribution of the sample mean $\bar{X}$.

Probability of getting a 2-liter can = 3/5 = 0.6
Probability of getting a 5-liter can = 2/5 = 0.4

x          2     5
P(X = x)   0.6   0.4

Sample   Cans      Sum   Mean   Probability
1        2, 2, 2   6     2      0.6 × 0.6 × 0.6 = 0.216
2        2, 2, 5   9     3      0.6 × 0.6 × 0.4 = 0.144
3        2, 5, 2   9     3      0.6 × 0.4 × 0.6 = 0.144
4        2, 5, 5   12    4      0.6 × 0.4 × 0.4 = 0.096
5        5, 5, 5   15    5      0.4 × 0.4 × 0.4 = 0.064
6        5, 2, 2   9     3      0.4 × 0.6 × 0.6 = 0.144
7        5, 2, 5   12    4      0.4 × 0.6 × 0.4 = 0.096
8        5, 5, 2   12    4      0.4 × 0.4 × 0.6 = 0.096
Total                           1.000

Sampling distribution:

Sample mean, $\bar{x}$   Probability, $P(\bar{X} = \bar{x})$
2                        0.216
3                        0.432   (= 0.144 + 0.144 + 0.144)
4                        0.288   (= 0.096 + 0.096 + 0.096)
5                        0.064

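
The table for Example 5.2 can be generated by listing every ordered outcome of three cans and adding up the probabilities that share a sample mean. The sketch below assumes the three draws are independent, as in the worked table; the function name `sample_mean_distribution` is hypothetical.

```python
from collections import defaultdict
from itertools import product

# Can size in liters -> probability of drawing that size (ratio 3:2).
can_probs = {2: 0.6, 5: 0.4}

def sample_mean_distribution(probs, n):
    """Return {sample mean: probability} for n independent draws."""
    dist = defaultdict(float)
    for outcome in product(probs, repeat=n):
        p = 1.0
        for size in outcome:
            p *= probs[size]          # independent draws: multiply probabilities
        dist[sum(outcome) / n] += p   # pool outcomes with the same mean
    return dict(sorted(dist.items()))

dist = sample_mean_distribution(can_probs, 3)
for sample_mean, prob in dist.items():
    print(sample_mean, round(prob, 3))
```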
Central Limit Theorem
If all samples of a particular size are selected from any population, the
sampling distribution of the sample mean is approximately a normal
distribution. This approximation improves with larger samples.
Suppose we are sampling from a population with mean $\mu$ and standard deviation $\sigma$. Let $\bar{X}$ be the random variable representing the sample mean of $n$ independently drawn observations.
• The mean of the sampling distribution of $\bar{X}$ is given by
$$\mu_{\bar{X}} = \mu$$
• The standard deviation of the sampling distribution of $\bar{X}$, or standard error, is given by
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

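
These two results can be checked numerically against Example 5.2, where the three cans are drawn independently. The sketch below computes the mean and standard deviation of the population distribution and of the tabulated sampling distribution; `mean_and_sd` is an illustrative helper name.

```python
from math import isclose, sqrt

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

# Distributions taken from Example 5.2.
population = {2: 0.6, 5: 0.4}
sample_means = {2: 0.216, 3: 0.432, 4: 0.288, 5: 0.064}
n = 3

mu, sigma = mean_and_sd(population)
mu_xbar, sigma_xbar = mean_and_sd(sample_means)

print("population mean vs mean of sample means:", mu, mu_xbar)
print("sigma / sqrt(n) vs standard error:", sigma / sqrt(n), sigma_xbar)
```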

If the population is normally distributed, then $\bar{X}$ is also normally distributed.
What if the population is not normally distributed? The central limit
theorem addresses this question.
The distribution of $\bar{X}$ tends toward the normal distribution as the sample size increases, regardless of the distribution from which we are sampling.

[Figure: panels for n = 1, 4, 16, 32, 64, 128, 256 and 512.]

Many statistics have distributions that are approximately normal for large
sample sizes, even when we are sampling from a distribution that is not
normal.
Thus, we can use inferential statistics that are based on a normal
distribution even if we are sampling from a population that is not normally
distributed, provided we have a large sample size.

The central limit theorem tells us that the z-value tends, in distribution, to the standard normal distribution as the sample size tends to infinity:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \to N(0, 1) \quad \text{as } n \to \infty$$

provided that $\mu$ and $\sigma$ are finite.

Example 5.3
Salaries at a large corporation have a mean of $62,000 and a standard deviation of $32,000. If 100 employees are randomly selected, what is the probability their average salary exceeds $66,000?

$\bar{X}$ is the average salary of a sample of $n = 100$ employees.
The population has $\mu$ = $62,000 and $\sigma$ = $32,000.

Using the central limit theorem, we can say that the sampling distribution of $\bar{X}$ is approximately normal.

$$P(\bar{X} > 66{,}000) = P(Z > z)$$

$$z = \frac{66{,}000 - 62{,}000}{32{,}000 / \sqrt{100}} = 1.25$$

$$P(\bar{X} > 66{,}000) = P(Z > 1.25)$$

[Figure: standard normal curve with z = 1.25 marked.]

$$P(Z > 1.25) = 0.5 - P(0 < Z < 1.25) = 0.5 - 0.3944 = \mathbf{0.1056}$$

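
The calculation in Example 5.3 can be sketched in Python using `statistics.NormalDist` from the standard library in place of the printed z-table. The function name `prob_sample_mean_exceeds` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """Return (z, P(sample mean > threshold)) under the normal approximation."""
    standard_error = sigma / sqrt(n)
    z = (threshold - mu) / standard_error
    return z, 1.0 - NormalDist().cdf(z)

z, p = prob_sample_mean_exceeds(66_000, 62_000, 32_000, 100)
print("z =", round(z, 2))
print("P(sample mean > 66,000) =", round(p, 4))
```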
Areas under the Standard Normal Curve (z-Table) showing values for P(0 ≤ Z ≤ z)

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817


Example 5.4
The random variable, 𝑋 represents the time (in hours) spent by an
undergraduate for weekly revisions. It has the following probability
distribution:

x 4 5 6 7
P(X=x) 0.2 0.4 0.3 0.1

Calculate the probability that the average time spent for weekly revisions
by a class of 36 students will be less than 5.5 hours.

Let $\bar{X}$ be the average time spent on weekly revision by $n = 36$ students.
The population mean $\mu$ and standard deviation $\sigma$ are first computed from the probability distribution:

x             4      5      6      7      Σ
P(X = x)      0.2    0.4    0.3    0.1    1.0
x·P(X = x)    0.8    2.0    1.8    0.7    5.3
x²            16     25     36     49
x²·P(X = x)   3.2    10.0   10.8   4.9    28.9

$$\mu = E(X) = \sum x \, P(X = x) = 5.3$$

$$E(X^2) = \sum x^2 \, P(X = x) = 28.9$$

$$\sigma = \sqrt{E(X^2) - [E(X)]^2} = \sqrt{28.9 - 5.3^2} = \sqrt{0.81} = 0.9$$

Using the central limit theorem, we can say that the sampling distribution of $\bar{X}$ is approximately normal.

$$P(\bar{X} < 5.5) = P(Z < z)$$

$$z = \frac{5.5 - 5.3}{0.9 / \sqrt{36}} = 1.33$$

[Figure: standard normal curve with z = 1.33 marked.]

$$P(\bar{X} < 5.5) = P(Z < 1.33) = 0.5 + P(0 < Z < 1.33) = 0.5 + 0.4082 = \mathbf{0.9082}$$

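
As a cross-check on Example 5.4, the sketch below recomputes $\mu$, $E(X^2)$ and $\sigma$ directly from the probability table and then applies the central limit theorem. It uses `statistics.NormalDist` instead of the printed z-table, so the last decimal place may differ slightly from a table lookup; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Probability distribution of weekly revision hours from Example 5.4.
hours = {4: 0.2, 5: 0.4, 6: 0.3, 7: 0.1}
n = 36

mu = sum(x * p for x, p in hours.items())                 # E(X)
second_moment = sum(x ** 2 * p for x, p in hours.items()) # E(X^2)
sigma = sqrt(second_moment - mu ** 2)                     # sqrt(E(X^2) - E(X)^2)

z = (5.5 - mu) / (sigma / sqrt(n))
p_less = NormalDist().cdf(z)

print("mu =", round(mu, 2), " E(X^2) =", round(second_moment, 2),
      " sigma =", round(sigma, 2))
print("z =", round(z, 2), " P(sample mean < 5.5) =", round(p_less, 4))
```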

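Sampling Distribution of the Difference between Two Means
The sampling distribution of the difference between two means describes how the difference between the means of two samples, drawn from two independent populations, is distributed. It allows the difference between the population means to be estimated from the difference between the sample means.

[Figure: two independent populations, $X$ with mean $\mu_X$ and standard deviation $\sigma_X$, and $Y$ with mean $\mu_Y$ and standard deviation $\sigma_Y$.]

Sampling distribution of $\bar{X}$ with sample size $n$:
$$\mu_{\bar{X}} = \mu_X, \qquad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$

Sampling distribution of $\bar{Y}$ with sample size $m$:
$$\mu_{\bar{Y}} = \mu_Y, \qquad \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{m}}$$

For the difference $Z = \bar{X} - \bar{Y}$:
$$\mu_Z = \mu_{\bar{X}} - \mu_{\bar{Y}}, \qquad \sigma_Z^2 = \sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2$$

$$\sigma_{\bar{X} - \bar{Y}}^2 = \sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2 = \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}$$

$$\sigma_{\bar{X} - \bar{Y}} = \sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}$$
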
Example 5.5
The heights of male students are approximately normally distributed with
a mean of 177.7 cm and a standard deviation of 5.6 cm. The heights of
female students are also approximately normally distributed with a mean
of 163.0 cm and a standard deviation of 5.1 cm. If 20 males and 15
females are randomly selected, determine the probability that the average
height of males is at least 10 cm greater than the average height of females.

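Let $\bar{X}$ be the mean height of the 20 males and $\bar{Y}$ be the mean height of the 15 females.

$$\bar{X} \sim N\left(177.7, \frac{5.6^2}{20}\right), \qquad \bar{Y} \sim N\left(163.0, \frac{5.1^2}{15}\right)$$

$$\bar{X} - \bar{Y} \sim N\left(177.7 - 163.0, \ \frac{5.6^2}{20} + \frac{5.1^2}{15}\right)$$

$$\bar{X} - \bar{Y} \sim N(14.7, \ 3.30)$$

Here 3.30 is the variance of $\bar{X} - \bar{Y}$, so its standard deviation is $\sqrt{3.30}$.

We want to find $P(\bar{X} - \bar{Y} \ge 10)$:

$$P(\bar{X} - \bar{Y} \ge 10) = P\left(Z \ge \frac{10 - 14.7}{\sqrt{3.30}}\right) = P(Z \ge -2.59)$$

[Figure: standard normal curve with z = -2.59 marked.]

$$P(Z \ge -2.59) = 0.5 + P(0 \le Z \le 2.59) = 0.5 + 0.4952 = \mathbf{0.9952}$$
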
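
A minimal Python sketch of the calculation in Example 5.5 is given below. It combines the two standard errors as described in this section and uses `statistics.NormalDist` instead of the printed z-table; the function name `prob_difference_at_least` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_difference_at_least(d, mu_x, sigma_x, n, mu_y, sigma_y, m):
    """Return (z, P(Xbar - Ybar >= d)) for two independent samples."""
    mean_diff = mu_x - mu_y
    se_diff = sqrt(sigma_x ** 2 / n + sigma_y ** 2 / m)  # variances add
    z = (d - mean_diff) / se_diff
    return z, 1.0 - NormalDist().cdf(z)

z, p = prob_difference_at_least(10, 177.7, 5.6, 20, 163.0, 5.1, 15)
print("z =", round(z, 2))
print("P(difference >= 10 cm) =", round(p, 3))
```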
Areas under the Standard Normal Curve (z-Table) showing values for P(0 ≤ Z ≤ z)

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952


Point Estimates and Confidence Intervals


A point estimate is a single statistic used to estimate a population
parameter. For example, the sample mean 𝑋 is a point estimate of the
population mean 𝜇.
While we expect the point estimate to be close to the population
parameter, we would like to measure how close it really is. A confidence
interval serves this purpose.
A confidence interval is a range of values constructed from sample data so that the parameter lies within that range with a specified probability.
The specified probability is called the level of confidence.

For reasonably large samples, the results of the central limit theorem
allow us to state the following:
• 95% of the sample means selected from a population will be within 1.96
standard deviations of the population mean 𝜇.
• 99% of the sample means selected from a population will be within 2.58
standard deviations of the population mean 𝜇.
The standard deviation discussed here is the standard deviation of the sampling distribution of the sample mean, usually called the standard error.
Intervals computed in this manner are called the 95% confidence interval and the 99% confidence interval.


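[Figure: two standard normal curves. Left: central area 0.95 (95%) with 0.025 (2.5%) in each tail. Right: central area 0.99 (99%) with 0.005 (0.5%) in each tail.]

95% confidence level: the area between 0 and z is 0.5 − 0.025 = 0.4750. From the z-table, z = 1.96.
99% confidence level: the area between 0 and z is 0.5 − 0.005 = 0.4950. From the z-table, z = 2.58.

95% confidence interval:
$$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$$

99% confidence interval:
$$\bar{X} \pm 2.58 \frac{s}{\sqrt{n}}$$
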

In general, a confidence interval for the population mean is computed by:

$$\bar{X} \pm z \frac{s}{\sqrt{n}}$$

where $\bar{X}$ = sample mean
$z$ = z-value corresponding to the level of confidence
$s$ = sample standard deviation (so that $s / \sqrt{n}$ is the standard error)
$n$ = sample size ($n \ge 30$)

Confidence level   70%    75%    80%    85%    90%    92%    95%    96%    98%    99%
z                  1.04   1.15   1.28   1.44   1.65   1.75   1.96   2.05   2.33   2.58

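
The z-values in the table can be reproduced from the inverse of the standard normal distribution: for a confidence level $c$, $z$ is the point with cumulative probability $0.5 + c/2$. The sketch below uses `statistics.NormalDist().inv_cdf`; the function name `z_for_confidence` is an assumption. The computed values agree with the table to within rounding in the second decimal place.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """z-value leaving (1 - confidence) / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

# z-values as printed in the table above.
table = {0.70: 1.04, 0.75: 1.15, 0.80: 1.28, 0.85: 1.44, 0.90: 1.65,
         0.92: 1.75, 0.95: 1.96, 0.96: 2.05, 0.98: 2.33, 0.99: 2.58}

for level, z_table in table.items():
    print(f"{level:.0%}: computed {z_for_confidence(level):.3f}, table {z_table}")
```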
Example 5.6
A job search company is studying the salary of civil engineers in Malaysia. A random sample of 256 civil engineers reveals a mean monthly salary of RM 4,520 and a standard deviation of RM 205. Determine the 95% confidence interval.

$$\bar{X} \pm z \frac{s}{\sqrt{n}} = \text{RM } 4{,}520 \pm 1.96 \cdot \frac{205}{\sqrt{256}} = \text{RM } 4{,}520 \pm \text{RM } 25.11$$

Therefore, the 95% CI is RM 4,494.89 to RM 4,545.11.

We can be 95% confident that the population mean lies between RM 4,494.89 and RM 4,545.11.

Note: RM 4,494.89 and RM 4,545.11 are called the confidence limits.

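
The interval in Example 5.6 can be sketched in Python as follows. The function name `z_confidence_interval` is hypothetical, and the z-value is computed with `statistics.NormalDist` instead of being read from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, s, n, confidence):
    """Large-sample confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * s / sqrt(n)   # z times the standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(4520, 205, 256, 0.95)
print("95% CI: RM", round(low, 2), "to RM", round(high, 2))
```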

What does the confidence interval tell us?

If we were to select many more samples of 256 civil engineers and construct the confidence interval for each sample, about 95% of these confidence intervals would contain the population mean.

[Figure: normal curve with RM 4,494.89, RM 4,520.00 and RM 4,545.11 marked on the horizontal axis.]

Example 5.7
The standard deviation of the time to failure of an electronic component is estimated as 100 hours. Determine how large a sample of these components must be in order to be 90% confident that the error in the estimated time to failure will not exceed 20 hours.

The confidence limits for the population mean are $\bar{X} \pm z \frac{s}{\sqrt{n}}$.

We know $s = 100$ and, for 90% confidence, $z = 1.645$ (rounded to 1.65 in the table of z-values).

The error not exceeding 20 hours means that the limits are $\bar{X} \pm 20$, so

$$z \frac{s}{\sqrt{n}} = 20$$

$$1.645 \cdot \frac{100}{\sqrt{n}} = 20$$

$$\sqrt{n} = 8.225$$

$$n = 67.7 \approx \mathbf{68}$$

At least 68 components are required to be 90% confident that the error will not exceed 20 hours.

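
Solving $z \, s / \sqrt{n} = E$ for $n$ gives $n = (z s / E)^2$, rounded up to the next whole number. The sketch below applies this to Example 5.7; the function name `required_sample_size` is an assumption, and the z-value is computed with `statistics.NormalDist` (about 1.645 for 90% confidence).

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(s, max_error, confidence):
    """Smallest n for which z * s / sqrt(n) does not exceed max_error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * s / max_error) ** 2)

n_required = required_sample_size(100, 20, 0.90)
print("components required:", n_required)
```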
Confidence Interval when Standard Deviation is Unknown using the t-distribution
What do we do if the sample size is less than 30 and the population standard deviation is unknown?
Under these conditions, the correct statistical procedure is to replace the standard normal distribution with the t distribution, sometimes called the Student’s t distribution.
William Gosset published his work under the pen name of ‘Student’, hence the name Student’s t distribution.
Gosset was concerned with the behaviour of the following term:

$$t = \frac{\bar{X} - \mu}{s / \sqrt{n}}$$

where $s$ is an estimate of $\sigma$.


He was concerned about the discrepancy between $s$ and $\sigma$ when $s$ was calculated from a very small sample.

[Figure: the z distribution and the t distribution plotted on the same axes.]

Note that the t distribution is flatter and more spread out because the standard deviation of the t distribution is larger than that of the standard normal distribution.

Characteristics of the t distribution
• It is a continuous distribution.
• It is bell-shaped and symmetrical.
• There is not one t distribution, but a family of t distributions.
• All t distributions have a mean of 0.
• Their standard deviations differ according to sample size $n$, i.e. the larger the $n$, the smaller the standard deviation.
• The t distribution is more spread out and flatter than the standard normal distribution. However, as the sample size increases, the t distribution approaches the standard normal distribution.

The value of t for a given level of confidence is larger in magnitude than the corresponding z value.

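
To illustrate how the spread of the t distribution depends on sample size, the sketch below uses a standard result that is not stated in these notes: for more than 2 degrees of freedom, the standard deviation of the t distribution is $\sqrt{df / (df - 2)}$, with $df = n - 1$. The function name `t_standard_deviation` is hypothetical.

```python
from math import sqrt

def t_standard_deviation(df):
    """Standard deviation of the t distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is finite only for df > 2")
    return sqrt(df / (df - 2))

for n in (4, 10, 30, 100):
    df = n - 1
    print(f"n = {n:3d}, df = {df:2d}, sd = {t_standard_deviation(df):.3f}")
```

The printed values decrease toward 1, the standard deviation of the standard normal distribution, as $n$ grows.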

The confidence interval for the population mean using the t distribution (with an unknown population standard deviation) is given by:

$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$

Therefore, we
1. Assume the samples are from a normal population.
2. Estimate the population standard deviation 𝜎 with the sample standard
deviation 𝑠.
3. Use the t distribution rather than the z distribution.

When do we use the z distribution or the t distribution?

Is the population normal?
• No: is $n \ge 30$?
  - No: use a nonparametric test.
  - Yes: use the z distribution.
• Yes: is the population standard deviation known?
  - No: use the t distribution.
  - Yes: use the z distribution.

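
The decision chart can be written as a small function. This is only a sketch of the chart as reconstructed here; the function name `choose_procedure` and its return strings are illustrative.

```python
def choose_procedure(population_is_normal, n, sigma_known):
    """Follow the decision chart: which distribution or test to use."""
    if population_is_normal:
        # Normal population: the choice depends on whether sigma is known.
        return "z distribution" if sigma_known else "t distribution"
    # Non-normal population: the choice depends on the sample size.
    return "z distribution" if n >= 30 else "nonparametric test"

print(choose_procedure(population_is_normal=True, n=10, sigma_known=False))
print(choose_procedure(population_is_normal=False, n=256, sigma_known=False))
print(choose_procedure(population_is_normal=False, n=12, sigma_known=True))
```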

Example 5.8
A tire manufacturer wishes to investigate the tread life of its tires. A
sample of 10 tires driven 50,000 miles revealed a sample mean of 0.32
inch of tread remaining with a standard deviation of 0.09 inch.
(a) Construct a 95% confidence interval for the population mean.
(b) Would it be reasonable for the manufacturer to conclude that after
50,000 miles the population mean amount of tread remaining is 0.30
inches?

We assume the population is normally distributed. The population standard deviation is not known, but the sample standard deviation is known (0.09 inch).

(a)
Since $n = 10$ (< 30) and the population standard deviation is unknown, we cannot use the z distribution. Hence, we use the t distribution:

$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$

t is determined using the t-distribution table. We first need the degrees of freedom, df, and the level of significance, $\alpha$.
The degrees of freedom, df = $n - 1$ = 10 − 1 = 9
Since the confidence level is 95%, the level of significance, $\alpha$ = 0.05 (5%).
From the table (two-tailed test), t = 2.262

$$95\%\ \text{CI} = \bar{X} \pm t \frac{s}{\sqrt{n}} = 0.32 \pm 2.262 \cdot \frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064\ \text{inch}$$

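
Part (a) can be sketched in Python as below. The Python standard library has no t-distribution quantile function, so the critical values are hard-coded from the two-tailed 0.05 column of the t-table; the names `T_TWO_TAILED_05` and `t_confidence_interval_95` are hypothetical.

```python
from math import sqrt

# Critical t values for a two-tailed level of significance of 0.05,
# keyed by degrees of freedom (copied from the t-distribution table).
T_TWO_TAILED_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
}

def t_confidence_interval_95(sample_mean, s, n):
    """95% confidence interval for the mean using the t distribution."""
    t = T_TWO_TAILED_05[n - 1]      # degrees of freedom = n - 1
    margin = t * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = t_confidence_interval_95(0.32, 0.09, 10)
print("95% CI:", round(low, 3), "to", round(high, 3), "inch")
```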

Student’s t-distribution table

The table gives the values of $t_{\alpha, df}$, where $P(T_{df} > t_{\alpha, df}) = \alpha$, and of $t_{\alpha/2, df}$, where $P(T_{df} > t_{\alpha/2, df}) = \alpha/2$.

Degree of     Level of significance for One-Tailed Test, α/2
freedom, df   0.10    0.05    0.025    0.01     0.005    0.001     0.0005
              Level of significance for Two-Tailed Test, α
              0.20    0.10    0.05     0.02     0.01     0.002     0.001
1             3.078   6.314   12.706   31.821   63.657   318.310   636.620
2             1.886   2.920   4.303    6.965    9.925    22.326    31.598
3             1.638   2.353   3.182    4.541    5.841    10.213    12.924
4             1.533   2.132   2.776    3.747    4.604    7.173     8.610
5             1.476   2.015   2.571    3.365    4.032    5.893     6.869
6             1.440   1.943   2.447    3.143    3.707    5.208     5.959
7             1.415   1.895   2.365    2.998    3.499    4.785     5.408
8             1.397   1.860   2.306    2.896    3.355    4.501     5.041
9             1.383   1.833   2.262    2.821    3.250    4.297     4.781
10            1.372   1.812   2.228    2.764    3.169    4.144     4.587
11            1.363   1.796   2.201    2.718    3.106    4.025     4.437
12            1.356   1.782   2.179    2.681    3.055    3.930     4.318
13            1.350   1.771   2.160    2.650    3.012    3.852     4.221
14            1.345   1.761   2.145    2.624    2.977    3.787     4.140
15            1.341   1.753   2.131    2.602    2.947    3.733     4.073

(b)
The confidence limits are 0.256 in and 0.384 in, so the manufacturer can be reasonably sure, with 95% confidence, that the mean remaining tread depth is between 0.256 in and 0.384 in. Since the value of 0.30 in is within this interval, it is possible that the mean of the population is 0.30 in.


Confidence Interval for a Proportion


So far we have used variables measured on the ratio scale, such as income, weight, distance and age. What if the variable is measured on the nominal scale, so that we work with proportions such as percentages?
A proportion is the percentage, fraction or ratio indicating the part of the sample or the population having a particular trait of interest.
For instance, a recent survey indicated that 72 out of 100 people favoured
the new government. The sample proportion is therefore 72%.
If we let $p$ represent the sample proportion, $X$ the number of successes and $n$ the number of items sampled, we can determine the sample proportion as follows:

$$p = \frac{X}{n}$$

To develop a confidence interval for a proportion, we need to meet the
following assumptions:
1. The binomial conditions have been met.
2. The values $n\pi > 5$ and $n(1 - \pi) > 5$, where $\pi$ is the proportion of successes in the population. This allows us to invoke the central limit theorem and employ the z distribution as a part of the confidence interval.
The confidence interval for a population proportion can then be determined using the following expression:

$$p \pm z \sqrt{\frac{p(1 - p)}{n}}$$


Example 5.9
Two engineering societies propose a merger. A random sample of 2,000 members is asked whether they support the merger, and 1,600 of them plan to support it. According to the bylaws, mergers must obtain at least three-fourths of the members’ votes.
(a) What is the estimate of the population proportion?
(b) Construct a 95% confidence interval for the population proportion.
(c) Can it be concluded that the merger proposal will pass?

(a)
$$p = \frac{X}{n} = \frac{1{,}600}{2{,}000} = 0.8 \text{ or } 80\%$$

(b)
$$p \pm z \sqrt{\frac{p(1 - p)}{n}} = 0.80 \pm 1.96 \sqrt{\frac{0.80(1 - 0.80)}{2{,}000}} = 0.80 \pm 0.018$$

(c) The confidence limits are 0.782 and 0.818. Since the lower confidence limit is greater than 0.75 (the proportion required by the bylaws), it can be concluded that the merger proposal will pass.

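
Example 5.9 can be sketched in Python as follows. The function name `proportion_confidence_interval` is an assumption, and the check of the $n\pi > 5$ conditions uses the sample proportion $p$ in place of the unknown $\pi$.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Confidence interval for a population proportion (normal approximation)."""
    p = successes / n
    # Conditions for the normal approximation, with p standing in for pi.
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation conditions are not met")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin

p, low, high = proportion_confidence_interval(1600, 2000, 0.95)
print("p =", p, " 95% CI:", round(low, 3), "to", round(high, 3))
print("lower limit above 0.75:", low > 0.75)
```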

Finite Population Correction Factor (FPC)


The populations we have sampled so far have been very large or infinite.
What if the sampled population is not very large?
A population that has a fixed upper bound is finite. A finite population can
be rather small (e.g. students in a class), and can also be very large (e.g.
engineering students in Johor).
For a finite population, where the total number of objects is 𝑁 and the size
of the sample is 𝑛, the following adjustment is made to the standard errors
of the sample means and proportions. This adjustment is called the finite
population correction factor.

Standard error of the sample mean using the correction factor:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

Standard error of the sample proportion using the correction factor:

$$\sigma_{p} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$$

It should be noted that the correction factor is significant only when we sample without replacement from more than 5% of a finite population, i.e. $n/N > 0.05$.
It is needed because, under these circumstances, the sampled observations are not independent, so the independence assumption behind the Central Limit Theorem does not hold and the uncorrected standard error of the estimate (the mean or proportion) will be too big.
In basic terms, the correction factor captures the difference between sampling with replacement and sampling without replacement.

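
The effect of the correction factor can be checked against Example 5.1, where 2 of the 4 engineers are sampled without replacement ($n/N = 0.5$). The sketch below compares the standard deviation of the six enumerated sample means with $\sigma / \sqrt{n}$, with and without the correction; the variable names are illustrative.

```python
from itertools import combinations
from math import isclose, sqrt
from statistics import pstdev

projects = [8, 4, 5, 7]          # population from Example 5.1
N, n = len(projects), 2

# Exact standard error: standard deviation of all possible sample means.
sample_means = [sum(s) / n for s in combinations(projects, n)]
se_enumerated = pstdev(sample_means)

se_uncorrected = pstdev(projects) / sqrt(n)
se_corrected = se_uncorrected * sqrt((N - n) / (N - 1))

print("enumerated :", round(se_enumerated, 4))
print("uncorrected:", round(se_uncorrected, 4))
print("corrected  :", round(se_corrected, 4))
```

In this small case the corrected value matches the enumerated one, while the uncorrected value is larger.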

Example 5.10
A university requested donations from 250 international alumni members.
A survey of 40 international alumni reveals the mean donation is $450
with a standard deviation of $75. Construct a 90% confidence interval for
the mean donation.

The population is finite because it has an upper bound of 250.
Also, $n/N = 40/250 = 0.16$, so $n/N > 0.05$.
Hence, we use the finite population correction factor.
The z-value corresponding to a 90% confidence level is 1.65.

$$\bar{X} \pm z \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

$$\$450 \pm 1.65 \cdot \frac{\$75}{\sqrt{40}} \sqrt{\frac{250 - 40}{250 - 1}} = \$450 \pm \$17.97$$

The confidence limits are $432.03 and $467.97. It can be concluded, with 90% confidence, that the population mean donation falls within this interval.

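
Example 5.10 can be sketched in Python as follows. The function name `fpc_confidence_interval` is hypothetical, and the z-value 1.65 is passed in as read from the table so that the result matches the worked figures.

```python
from math import sqrt

def fpc_confidence_interval(sample_mean, s, n, N, z):
    """Confidence interval for the mean with the finite population correction."""
    fpc = sqrt((N - n) / (N - 1))
    margin = z * (s / sqrt(n)) * fpc
    return margin, sample_mean - margin, sample_mean + margin

margin, low, high = fpc_confidence_interval(450, 75, n=40, N=250, z=1.65)
print("sampling fraction n/N =", 40 / 250)
print("margin = $", round(margin, 2))
print("90% CI: $", round(low, 2), "to $", round(high, 2))
```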
