Lecture 4 - Ch4 - s2-5

IE 228
Engineering Statistics
Lecture 4
Binomial and Normal Distributions

(Sections 4.2 & 4.5)
4-2
Topics to learn
1. Errors in measuring process, uncertainty, bias
2. Binomial distribution
3. Sample proportion and success probability
4. Normal distribution
5. Standard units and standard normal distribution
McGraw-Hill ©2014 by The McGraw-Hill Companies, Inc. All rights reserved.

4-3
Errors in measuring process, uncertainty, bias

4-4
Introduction
• Any measuring procedure contains error.
• Thus, measured values generally differ
somewhat from the true values that are being
measured.
• The errors in the measurements produce errors
in calculated values (like the mean).
Definition: When error in measurement
produces error in calculated values, we say
that error is propagated from the
measurements to the calculated value.
4-5
Measurement Error
• A geologist weighs a rock on a scale and gets
the following measurements:
251.3 252.5 250.8 251.1 250.4
• These measurements differ from one another,
and it is unlikely that any of them is equal to
the true mass of the rock.
• The error in the measured value is the
difference between a measured value and the
true value.

4-6
Parts of Error
• We think of the error of the measurement as being
composed of two parts:
– Systematic error (bias)
– Random error
• Bias is the part of the error that is the same for every
measurement.
For example, a scale that always gives you a reading that is too low.
• Random error is error that varies from measurement to
measurement and averages out to zero in the long run.

4-7
Parts of Error
• Any measurement can be considered to be the
sum of the true value plus contributions from
each of the components of error:
Measured value = true value + bias + random error

4-8
Two Aspects of the Measuring Process:
Accuracy and Precision

• We are interested in accuracy.
– Accuracy is determined by bias.
– The smaller the bias, the more accurate the measuring
process.
– If the bias is zero, the measuring process is said to be
unbiased.
• We are also interested in precision.
– Precision refers to the degree to which repeated
measurements of the same quantity tend to agree with
each other.
– If repeated measurements come out nearly the same
every time, the precision is high.

4-9
More on Error
• A measured value is a random variable with mean $\mu$
and standard deviation $\sigma$.
• The bias in the measuring process is the difference
between the mean measurement and the true value:
Bias = $\mu$ − true value
• The uncertainty in the measuring process is the
standard deviation $\sigma$:
Uncertainty = $\sigma$
• The smaller the bias, the more accurate the measuring
process.
• The smaller the uncertainty, the more precise the
measuring process.
4-10
More on Error
FIGURE 3.1
(a) Both bias and uncertainty are small.
(b) Bias is large; uncertainty is small.
(c) Bias is small; uncertainty is large.
(d) Both bias and uncertainty are large.
Note: We can estimate the uncertainty from the set of repeated

measurements, but without knowing the true value, we cannot
estimate the bias.
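
The note above can be illustrated with the five rock measurements from the Measurement Error slide. This is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative, and the sample standard deviation is used as the estimate of the uncertainty.

```python
import statistics

# Repeated measurements of the rock's mass, copied from the Measurement Error slide.
measurements = [251.3, 252.5, 250.8, 251.1, 250.4]

# Average of the repeated measurements.
mean_measurement = statistics.mean(measurements)

# Sample standard deviation (divides by n - 1): an estimate of the uncertainty sigma.
uncertainty_estimate = statistics.stdev(measurements)

print(f"mean measurement      = {mean_measurement:.2f}")
print(f"estimated uncertainty = {uncertainty_estimate:.2f}")

# The bias (mu - true value) cannot be computed from these data alone,
# because the true mass of the rock is unknown.
```

The mean comes out near 251.22 and the estimated uncertainty near 0.79, in the same units as the measurements.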
4-11
Binomial Distribution and Normal Distribution

4-12
Section 4.2:
The Binomial Distribution
If a total of n Bernoulli trials are conducted, and
• The trials are independent.
• Each trial has the same success probability p.
• X is the number of successes in the n trials,
then X has the binomial distribution with
parameters n and p, denoted X ~ Bin(n, p).

4-13
In-class exercise
Example 4
A fair coin is tossed 10 times. Let X be the
number of heads that appear. What is the
distribution of X?
Solution:

4-14

4-15
Another Use of the Binomial

• Assume that a finite population contains items
of two types, successes and failures, and that a
simple random sample is drawn from the
population.
• Then if the sample size is no more than 5% of
the population, the binomial distribution may
be used to model the number of successes. (*)
(*) Recall from the discussion of independence that, when
drawing a sample from a finite, tangible population, the sample
items may be treated as independent if the population is very
large compared to the size of the sample.
Example 5
A lot contains several thousand components,
10% of which are defective. Seven components
are sampled from the lot. Let X represent the
number of defective components in the sample.
What is the distribution of X?
Solution:

4-17

4-18
Binomial R.V.:
pmf
• If X ~ Bin(n, p), the probability mass
function (pmf) of X is
$$p(x) = P(X = x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
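
The pmf can be evaluated directly. The sketch below is one possible implementation, not the textbook's; the function name `binomial_pmf` is an illustrative choice. It is applied to Example 5, where the sample of 7 is far less than 5% of a lot of several thousand, so X ~ Bin(7, 0.1).

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p); returns 0 outside x = 0, 1, ..., n."""
    if x < 0 or x > n:
        return 0.0
    # comb(n, x) equals n! / (x! (n - x)!)
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 5: number of defectives among 7 sampled components, X ~ Bin(7, 0.1).
n, p = 7, 0.1
pmf_values = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf_values):
    print(f"P(X = {x}) = {prob:.4f}")

# The probabilities over x = 0, ..., n should add up to 1.
print("sum of pmf =", round(sum(pmf_values), 10))
```

For instance, P(X = 0) is 0.9 raised to the 7th power, about 0.4783.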


4-19

Solution:
We use the pmf Equation with n = 10 and p = 0.4.
The pmf is,

4-21
Solution (cont.):

Solution:

4-23
Solution (cont.):
Using the probability mass function,

4-30
Binomial R.V.:
mean, and variance
If X ~ Bin(n, p)
• Mean: $\mu_X = np$
• Variance: $\sigma_X^2 = np(1-p)$

4-31
More on the Binomial

• Assume n independent Bernoulli trials are conducted.
• Each trial has probability of success p.
• Let Y1, …, Yn be defined as follows: Yi = 1 if the ith
trial results in success, and Yi = 0 otherwise. (Each of
the Yi has the Bernoulli(p) distribution.)
• Now, let X represent the number of successes among
the n trials. So, X = Y1 + … + Yn.
• This shows that a binomial random variable can be
expressed as a sum of Bernoulli random variables.
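
The sum representation gives one way to see where the mean and variance of Bin(n, p) come from. The short derivation below relies on two standard facts that this slide does not state: each Bernoulli(p) variable has mean p and variance p(1 − p), and variances of independent random variables add.

```latex
\begin{align*}
\mu_X &= \mu_{Y_1} + \cdots + \mu_{Y_n} = \underbrace{p + \cdots + p}_{n \text{ terms}} = np, \\
\sigma_X^2 &= \sigma_{Y_1}^2 + \cdots + \sigma_{Y_n}^2 = np(1-p)
  \quad \text{(using independence of the } Y_i\text{)}.
\end{align*}
```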

4-32
Using a Sample Proportion to Estimate a Success Probability
sample proportion: $\hat{p} = X/n$, where X is the number of successes in n trials.

Solution:

4-34

4-35
Uncertainty in the Sample Proportion
• It is important to realize that the sample
proportion $\hat{p}$ is just an estimate of the success
probability p, and in general, is not equal to p.
• If another sample were taken, the value of $\hat{p}$
would probably come out differently. In other
words, there is uncertainty in $\hat{p}$.
• For $\hat{p}$ to be useful, we must compute its bias
and its uncertainty.
Let n denote the sample size, and let X denote
the number of successes, where X ∼ Bin(n, p).
4-36
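• The bias is the difference $\mu_{\hat{p}} - p$. Since
$\hat{p} = X/n$, it follows (from Equation (2.41) in Section 2.5)
that
$$\mu_{\hat{p}} = \mu_{X/n} = \frac{\mu_X}{n} = \frac{np}{n} = p$$
• Since $\mu_{\hat{p}} = p$, $\hat{p}$ is unbiased; in other words,
its bias is 0.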
4-37
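• The uncertainty is the standard deviation $\sigma_{\hat{p}}$.
• From Equation (4.6), the standard deviation of X is
$$\sigma_X = \sqrt{np(1-p)}$$
• Since $\hat{p} = X/n$, it follows (from Equation (2.43) in Section
2.5) that
$$\sigma_{\hat{p}} = \sigma_{X/n} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
• In practice, when computing the uncertainty in $\hat{p}$, we
don’t know the success probability p, so we approximate
it with $\hat{p}$.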

4-38
Estimate of p
If X ~ Bin(n, p), then the sample proportion $\hat{p} = X/n$ is
used to estimate the success probability p.
Note:
• Bias is the difference $\mu_{\hat{p}} - p$.
• $\hat{p}$ is unbiased.
• The uncertainty in $\hat{p}$ is $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.
In practice, when computing $\sigma_{\hat{p}}$, we substitute $\hat{p}$ for p,
since p is unknown.
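
As a rough sketch of the summary above, the sample proportion and its estimated uncertainty can be computed as follows. The function name and the counts (12 successes in 100 trials) are hypothetical and do not come from the slides.

```python
from math import sqrt


def sample_proportion(successes: int, n: int):
    """Return (p_hat, estimated uncertainty in p_hat) for X ~ Bin(n, p)."""
    p_hat = successes / n
    # sigma_p_hat = sqrt(p(1 - p)/n), with the unknown p replaced by p_hat.
    sigma_p_hat = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, sigma_p_hat


# Hypothetical data (not from the slides): 12 successes observed in 100 trials.
p_hat, sigma_p_hat = sample_proportion(12, 100)
print(f"p_hat = {p_hat:.3f}, estimated uncertainty = {sigma_p_hat:.4f}")
```

With these made-up counts the estimate is 0.12 with an uncertainty of roughly 0.032.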
4-44
Section 4.5:
The Normal Distribution

• The normal distribution (also called the
Gaussian distribution) is by far the most
commonly used distribution in statistics.
• This distribution provides a good model for
many, although not all, continuous
populations.
• The normal distribution is continuous rather
than discrete.
• The mean of a normal population may have
any value, and the variance may have any
positive value.
4-45
Normal R.V.:
pdf, mean, and variance

The probability density function of a normal
population with mean $\mu$ and variance $\sigma^2$ is given
by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$
If $X \sim N(\mu, \sigma^2)$, then the mean and variance of X
are given by
$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$

4-46
68-95-99.7% Rule
This figure represents a plot of the normal probability density function
with mean $\mu$ and standard deviation $\sigma$. Note that the curve is
symmetric about $\mu$, so that $\mu$ is the median as well as the mean. It is
also the case that, for any normal population:
• About 68% of the population is in the interval $\mu \pm \sigma$.
• About 95% of the population is in the interval $\mu \pm 2\sigma$.
• About 99.7% of the population is in the interval $\mu \pm 3\sigma$.
4-47
Standard Units
• The proportion of a normal population that is
within a given number of standard deviations
of the mean is the same for any normal
population.
• For this reason, when dealing with normal
populations, we often convert from the units in
which the population items were originally
measured to standard units.
• Standard units tell how many standard
deviations an observation is from the population mean.
Solution:

4-49

4-50
Standard Normal Distribution
• In general, we convert to standard units by subtracting the
mean and dividing by the standard deviation.
• Thus, if x is an item sampled from a normal population
with mean $\mu$ and variance $\sigma^2$, the standard unit
equivalent of x is the number z, where
$$z = \frac{x - \mu}{\sigma}$$
• The number z is sometimes called the “z-score” of x.
• The z-score is an item sampled from a normal population
with mean 0 and standard deviation 1.
• This normal population is called the standard normal
population.

Example 13
Aluminum sheets used to make beverage cans have thicknesses (in
thousandths of an inch) that are normally distributed with mean 10
and standard deviation 1.3.
A particular sheet is 10.8 thousandths of an inch thick. Find the z-score.
Solution:

4-52

Example 13 cont.
The thickness of a certain sheet has a z-score of
1.7. Find the thickness of the sheet in the
original units of thousandths of inches.
Solution:
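
Both parts of Example 13 use the same conversion, once in each direction. A minimal sketch, with illustrative function names:

```python
MEAN = 10.0  # mean thickness in thousandths of an inch (Example 13)
SD = 1.3     # standard deviation, same units


def to_standard_units(x: float, mu: float, sigma: float) -> float:
    """z-score of x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


def from_standard_units(z: float, mu: float, sigma: float) -> float:
    """Inverse conversion: original-unit value that has z-score z."""
    return mu + z * sigma


z = to_standard_units(10.8, MEAN, SD)
thickness = from_standard_units(1.7, MEAN, SD)

print(f"z-score of a 10.8 sheet: {z:.2f}")
print(f"thickness at z = 1.7:    {thickness:.2f} thousandths of an inch")
```

The first conversion gives z = 0.8/1.3, about 0.62; the second gives 10 + 1.7(1.3) = 12.21 thousandths of an inch.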

4-54

4-55
Finding Areas Under the Normal Curve
• The proportion of a normal population that lies

within a given interval is equal to the area
under the normal probability density above
that interval.
• This would suggest integrating the normal pdf,
but this integral does not have a closed form
solution.

4-56
Finding Areas Under the Normal Curve

• So, the areas under the standard normal curve
(mean 0, variance 1) are approximated
numerically and are available in a standard
normal table or z table, given in Table A.2.
• We can convert any normal into a standard

normal so that we can compute areas under the
curve.
• The table gives the area in the left-hand tail of

the curve. Other areas can be calculated by
subtraction or by using the fact that the total
area under the curve is 1.
4-57

Example 14
1) Find the area under normal curve to the left of z = 0.47.
Solution:

Example 14
2) Find the area under normal curve to the right of z = 1.38.
Solution:

Example 15
1) Find the area under the normal curve between
z = 0.71 and z = 1.28.
Solution:

Example 15
2) What z-score corresponds to the 75th percentile of a normal
curve?
Solution:
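
The table lookups in Examples 14 and 15 can be cross-checked numerically. This sketch uses `statistics.NormalDist` in place of Table A.2, so the results may differ from table values in the last digit because the table rounds z to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Example 14 (1): area to the left of z = 0.47.
area_left = Z.cdf(0.47)

# Example 14 (2): area to the right of z = 1.38 is 1 minus the left-tail area.
area_right = 1 - Z.cdf(1.38)

# Example 15 (1): area between z = 0.71 and z = 1.28, by subtraction.
area_between = Z.cdf(1.28) - Z.cdf(0.71)

# Example 15 (2): z-score with 75% of the area to its left.
z_75 = Z.inv_cdf(0.75)

print(f"left of 0.47:          {area_left:.4f}")
print(f"right of 1.38:         {area_right:.4f}")
print(f"between 0.71 and 1.28: {area_between:.4f}")
print(f"75th percentile z:     {z_75:.2f}")
```

The areas come out near 0.6808, 0.0838, and 0.1386, and the 75th percentile is at z ≈ 0.67.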

4-62

4-63

Solution:

4-65

4-66
Solution (cont.):
The following figure presents the probability density function of
the $N(50, 5^2)$ distribution.
The shaded area represents P(42 < X < 52), the probability that a
randomly chosen battery has a lifetime between 42 and 52 hours.

Solution:

4-68

4-69
Solution (cont.):
Therefore, the 40th percentile of battery lifetimes is 48.75 hours.
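
The battery calculations can be reproduced as a check. The sketch assumes the lifetime distribution is normal with mean 50 hours and standard deviation 5 hours, which is the reading consistent with the 48.75 result above.

```python
from statistics import NormalDist

# Assumed model: battery lifetimes in hours, normal with mean 50 and sd 5.
lifetime = NormalDist(mu=50, sigma=5)

# P(42 < X < 52): area under the density between 42 and 52 hours.
prob_42_to_52 = lifetime.cdf(52) - lifetime.cdf(42)

# 40th percentile: the lifetime with 40% of the population below it.
pct_40 = lifetime.inv_cdf(0.40)

print(f"P(42 < X < 52)  = {prob_42_to_52:.4f}")
print(f"40th percentile = {pct_40:.2f} hours")
```

The probability is about 0.6006. The percentile comes out near 48.73 rather than 48.75 because the z table rounds the z-score to −0.25, whereas the software uses about −0.2533.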

4-70

4-71

4-72

4-73

4-74

4-75
Using 2.51 and its z-score 1.96:

4-76
Estimating the Parameters

• If $X_1, \ldots, X_n$ are a random sample from a $N(\mu, \sigma^2)$
distribution,
– $\mu$ is estimated with the sample mean $\bar{X}$, and
– $\sigma^2$ is estimated with the sample variance $s^2$.
• As with any sample mean, the uncertainty in $\bar{X}$ is $\sigma/\sqrt{n}$,
which we replace with $s/\sqrt{n}$ if $\sigma$ is
unknown.
• The mean is an unbiased estimator of $\mu$.
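
These estimates are straightforward to compute. The sample below is hypothetical (it does not come from the slides) and is simply assumed to be drawn from a normal population.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical sample (not from the slides), assumed to come from a normal population.
sample = [9.8, 10.4, 10.1, 9.6, 10.6, 10.1]
n = len(sample)

x_bar = mean(sample)          # sample mean, estimates mu
s_squared = variance(sample)  # sample variance (n - 1 divisor), estimates sigma^2
s = stdev(sample)             # sample standard deviation

# Estimated uncertainty in the sample mean: s / sqrt(n), used when sigma is unknown.
uncertainty_in_mean = s / sqrt(n)

print(f"x_bar = {x_bar:.2f}, s^2 = {s_squared:.3f}, s/sqrt(n) = {uncertainty_in_mean:.3f}")
```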
4-77
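Linear Functions of Normal Random
Variables
Let $X \sim N(\mu, \sigma^2)$ and let a ≠ 0 and b be constants.
Then
$$aX + b \sim N(a\mu + b,\; a^2\sigma^2)$$
Linear Combinations
Let X1, X2, …, Xn be independent and normally distributed with
means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$.
Let c1, c2, …, cn be constants, and c1 X1 + c2 X2 + … + cn Xn be a
linear combination. Then
$$c_1X_1 + c_2X_2 + \cdots + c_nX_n \sim N\!\left(c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2\right)$$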

4-78
Example 16
A chemist measures the temperature of a
solution in °C. The measurement is denoted C,
and is normally distributed with mean 40 °C and
standard deviation 1 °C. The measurement is
converted to °F by the equation F = 1.8C + 32.
What is the distribution of F?
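
Example 16 is a direct application of the rule for aX + b, with a = 1.8 and b = 32. The helper `linear_function` below is an illustrative name rather than a library function; it applies the rule to a `statistics.NormalDist` object.

```python
from statistics import NormalDist


def linear_function(dist: NormalDist, a: float, b: float) -> NormalDist:
    """Distribution of aX + b when X ~ N(mu, sigma^2) and a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    # Mean is a*mu + b; variance is a^2 * sigma^2, so the sd is |a| * sigma.
    return NormalDist(mu=a * dist.mean + b, sigma=abs(a) * dist.stdev)


celsius = NormalDist(mu=40, sigma=1)                # C ~ N(40, 1^2)
fahrenheit = linear_function(celsius, a=1.8, b=32)  # F = 1.8C + 32

print(f"mean of F     = {fahrenheit.mean:.2f}")
print(f"variance of F = {fahrenheit.variance:.2f}")
```

This gives F ~ N(104, 3.24), that is, mean 104 °F and standard deviation 1.8 °F.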

4-79

4-80

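4-81
Distributions of Functions of
Normals
Let X1, X2, …, Xn be independent and normally distributed with
mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\!\left(\mu,\; \frac{\sigma^2}{n}\right)$$
Let X and Y be independent, with $X \sim N(\mu_X, \sigma_X^2)$ and
$Y \sim N(\mu_Y, \sigma_Y^2)$. Then
$$X + Y \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2), \qquad X - Y \sim N(\mu_X - \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$$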

4-82
How Can I Tell Whether My Data Come from
a Normal Population?
• In practice, we often have a sample from some population, and
we must use the sample to decide whether the population
distribution is approximately normal.
• If the sample is reasonably large:
– the sample histogram may give a good indication
– histograms look something like the normal density function
(peaked in the center, and decreasing more or less
symmetrically on either side)
– Probability plots provide another good way of determining
whether a reasonably large sample comes from a
population that is approximately normal.

4-83
How Can I Tell Whether My Data Come from
a Normal Population?
• For small samples, it can be difficult to tell whether the normal
distribution is appropriate.
• One important fact is this: Samples from normal populations
rarely contain outliers. Therefore the normal distribution
should generally not be used for data sets that contain outliers.
• This is especially true when the sample size is small.
– Unfortunately, for small data sets that do not contain
outliers, it is difficult to determine whether the population
is approximately normal.
– In general, some knowledge of the process that generated
the data is needed.

End of Lecture
