
IE 228

Engineering Statistics
Lecture 4

Binomial and Normal Distributions


(Sections 4.2 & 4.5)

Topics to learn
1. Errors in measuring process, uncertainty, bias
2. Binomial distribution
3. Sample proportion and success probability
4. Normal distribution
5. Standard units and standard normal distribution

McGraw-Hill ©2014 by The McGraw-Hill Companies, Inc. All rights reserved.



Errors in measuring process, uncertainty, bias


Introduction
• Any measuring procedure contains error.
• Thus, measured values generally differ
somewhat from the true values that are being
measured.
• The errors in the measurements produce errors
in calculated values (like the mean).

Definition: When error in measurement produces error in calculated values, we say that error is propagated from the measurements to the calculated values.
Measurement Error
• A geologist weighs a rock on a scale and gets
the following measurements:
251.3 252.5 250.8 251.1 250.4
• These measurements differ from one another,
and it is unlikely that any of them is equal to
the true mass of the rock.
• The error in the measured value is the
difference between a measured value and the
true value.

Parts of Error
• We think of the error of the measurement as being
composed of two parts:
– Systematic error (bias)
– Random error
• Bias is the part of the error that is the same for every
measurement.
For example, a scale that always gives you a reading that is too low.

• Random error is error that varies from measurement to measurement and averages out to zero in the long run.

Parts of Error

• Any measurement can be considered to be the sum of the true value plus contributions from each of the components of error:

Measured value = true value + bias + random error
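The decomposition can be made concrete with a small simulation. The sketch below assumes a hypothetical scale with true value 250.0, a constant bias of +0.5, and normally distributed random error with standard deviation 0.8 (all made-up numbers). Averaging many readings cancels the random error but not the bias, so the average settles near true value + bias rather than the true value.

```python
import random
from statistics import mean, stdev

TRUE_VALUE = 250.0  # hypothetical true mass
BIAS = 0.5          # hypothetical systematic error, the same for every reading
SIGMA = 0.8         # hypothetical spread of the random error (assumed normal)

rng = random.Random(7)  # fixed seed so the run is reproducible


def measure() -> float:
    """One reading: true value + bias + random error (mean-zero normal noise)."""
    return TRUE_VALUE + BIAS + rng.gauss(0.0, SIGMA)


readings = [measure() for _ in range(100_000)]
print(round(mean(readings), 2))   # near TRUE_VALUE + BIAS = 250.5, not 250.0
print(round(stdev(readings), 2))  # near SIGMA = 0.8
```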

Two Aspects of the Measuring Process: Accuracy and Precision


• We are interested in accuracy.
– Accuracy is determined by bias.
– The smaller the bias, the more accurate the measuring
process.
– If the bias is zero, the measuring process is said to be
unbiased.
• We are also interested in precision.
– Precision refers to the degree to which repeated
measurements of the same quantity tend to agree with
each other.
– If repeated measurements come out nearly the same
every time, the precision is high.

More on Error
• A measured value is a random variable with mean $\mu$ and standard deviation $\sigma$.
• The bias in the measuring process is the difference between the mean measurement and the true value:
Bias = $\mu - \text{true value}$
• The uncertainty in the measuring process is the standard deviation $\sigma$:
Uncertainty = standard deviation $\sigma$
• The smaller the bias, the more accurate the measuring process.
• The smaller the uncertainty, the more precise the measuring process.
More on Error

FIGURE 3.1
(a) Both bias and uncertainty are small.
(b) Bias is large; uncertainty is small.
(c) Bias is small; uncertainty is large.
(d) Both bias and uncertainty are large.

Note: We can estimate the uncertainty from the set of repeated measurements, but without knowing the true value, we cannot estimate the bias.
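For example, the geologist's five weighings of the rock give an estimate of the uncertainty (the sample standard deviation), while the bias stays unknown because the rock's true mass is not available. A short sketch using Python's standard `statistics` module:

```python
from statistics import mean, stdev

# The geologist's five repeated weighings of the same rock (units as recorded)
weights = [251.3, 252.5, 250.8, 251.1, 250.4]

print(round(mean(weights), 2))   # 251.22
print(round(stdev(weights), 2))  # 0.79, the estimated uncertainty
# No bias estimate is possible: it would require the rock's true mass.
```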

Binomial Distribution and Normal Distribution

Section 4.2:
The Binomial Distribution
If a total of n Bernoulli trials are conducted, and
• the trials are independent,
• each trial has the same success probability p, and
• X is the number of successes in the n trials,
then X has the binomial distribution with parameters n and p, denoted X ~ Bin(n, p).

In-class exercise
Example 4
A fair coin is tossed 10 times. Let X be the number of heads that appear. What is the distribution of X?
Solution:


Another Use of the Binomial


• Assume that a finite population contains items of two types, successes and failures, and that a simple random sample is drawn from the population.
• Then if the sample size is no more than 5% of the population, the binomial distribution may be used to model the number of successes. (*)

(*) Recall from the discussion of independence that, when drawing a sample from a finite, tangible population, the sample items may be treated as independent if the population is very large compared to the size of the sample.
In-class exercise

Example 5
A lot contains several thousand components,
10% of which are defective. Seven components
are sampled from the lot. Let X represent the
number of defective components in the sample.
What is the distribution of X?
Solution:

Binomial R.V.: pmf
• If X ~ Bin(n, p), the probability mass function (pmf) of X is
$$p(x) = P(X = x) = \begin{cases} \dfrac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, & x = 0, 1, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
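As a sketch, the pmf above translates directly into Python; the function name `binom_pmf` is our own, not from the text. `math.comb(n, x)` computes the binomial coefficient $n!/(x!\,(n-x)!)$, and values of x outside 0, 1, …, n return 0 as in the second case of the formula.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p); zero outside x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 4: a fair coin tossed 10 times, X = number of heads ~ Bin(10, 0.5)
print(binom_pmf(5, 10, 0.5))  # 252/1024 = 0.24609375
# The probabilities over x = 0..n sum to 1 (up to rounding)
print(sum(binom_pmf(x, 10, 0.4) for x in range(11)))
```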

In-class exercise

Solution:
We use the binomial pmf with n = 10 and p = 0.4.

The pmf is,

Solution (cont.):

In-class exercise

Solution:


Solution (cont.):
Using the probability mass function,

In-class exercise

Solution:

In-class exercise

Solution:

Binomial R.V.: mean and variance
If X ~ Bin(n, p), then
• Mean: $\mu_X = np$
• Variance: $\sigma_X^2 = np(1-p)$
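A quick numerical check of these formulas, as a sketch: it computes $\sum x\,p(x)$ and $\sum (x-\mu_X)^2\,p(x)$ directly from the pmf (the helper name `binom_moments` is hypothetical) and compares them with $np$ and $np(1-p)$ for n = 10, p = 0.4.

```python
from math import comb


def binom_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of Bin(n, p), computed directly from the pmf."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, var


n, p = 10, 0.4
mean, var = binom_moments(n, p)
print(mean, n * p)            # both approximately 4.0
print(var, n * p * (1 - p))   # both approximately 2.4
```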


More on the Binomial


• Assume n independent Bernoulli trials are conducted.
• Each trial has probability of success p.
• Let Y1, …, Yn be defined as follows: Yi = 1 if the ith
trial results in success, and Yi = 0 otherwise. (Each of
the Yi has the Bernoulli(p) distribution.)
• Now, let X represent the number of successes among
the n trials. So, X = Y1 + …+ Yn .

• This shows that a binomial random variable can be expressed as a sum of Bernoulli random variables.
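The representation X = Y1 + … + Yn suggests a simple simulation, sketched below with our own arbitrary choices of n = 10, p = 0.4, and 100,000 replications: each replication adds up n independent Bernoulli(p) indicators, and the average of the simulated X values should be close to np.

```python
import random


def bernoulli(p: float, rng: random.Random) -> int:
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if rng.random() < p else 0


def binomial_draw(n: int, p: float, rng: random.Random) -> int:
    """One draw of X = Y1 + ... + Yn, a sum of n independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n))


rng = random.Random(228)  # fixed seed so the run is reproducible
n, p, reps = 10, 0.4, 100_000
draws = [binomial_draw(n, p, rng) for _ in range(reps)]
print(sum(draws) / reps)  # close to n * p = 4
```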


Using a Sample Proportion to Estimate a Success Probability

sample proportion: $\hat{p} = \dfrac{X}{n}$

In-class exercise

Solution:

Uncertainty in the Sample Proportion
• It is important to realize that the sample proportion $\hat{p}$ is just an estimate of the success probability p, and in general, is not equal to p.
• If another sample were taken, the value of $\hat{p}$ would probably come out differently. In other words, there is uncertainty in $\hat{p}$.
• For $\hat{p}$ to be useful, we must compute its bias and its uncertainty.
Let n denote the sample size, and let X denote the number of successes, where X ∼ Bin(n, p).
Uncertainty in the Sample Proportion
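• The bias is the difference $\mu_{\hat{p}} - p$. Since $\hat{p} = X/n$, it follows from Equation (2.41) (in Section 2.5) that
$$\mu_{\hat{p}} = \mu_{X/n} = \frac{\mu_X}{n} = \frac{np}{n} = p$$
• Since $\mu_{\hat{p}} = p$, $\hat{p}$ is unbiased; in other words, its bias is 0.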
Uncertainty in the Sample Proportion
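• The uncertainty is the standard deviation $\sigma_{\hat{p}}$.
• From Equation (4.6), the standard deviation of X is
$$\sigma_X = \sqrt{np(1-p)}$$
• Since $\hat{p} = X/n$, it follows from Equation (2.43) (in Section 2.5) that
$$\sigma_{\hat{p}} = \sigma_{X/n} = \frac{\sigma_X}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$$
• In practice, when computing the uncertainty in $\hat{p}$, we don't know the success probability p, so we approximate it with $\hat{p}$.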

Estimate of p
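If X ~ Bin(n, p), then the sample proportion $\hat{p} = X/n$ is used to estimate the success probability p.
Note:
• Bias is the difference $\mu_{\hat{p}} - p$.
• $\hat{p}$ is unbiased.
• The uncertainty in $\hat{p}$ is
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$
In practice, when computing $\sigma_{\hat{p}}$, we substitute $\hat{p}$ for p, since p is unknown.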
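A minimal sketch of the estimate and its plug-in uncertainty. The function name and the counts (20 successes in 50 trials) are made-up illustration values, not data from the text.

```python
from math import sqrt


def estimate_proportion(successes: int, n: int) -> tuple[float, float]:
    """Return (p_hat, estimated uncertainty), substituting p_hat for the unknown p."""
    p_hat = successes / n
    sigma_hat = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, sigma_hat


p_hat, sigma_hat = estimate_proportion(20, 50)  # hypothetical data
print(p_hat, round(sigma_hat, 4))  # 0.4 0.0693
```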
In-class exercise

Solution:

Solution (cont.):

In-class exercise

Solution:

Section 4.5:
The Normal Distribution


• The normal distribution (also called the Gaussian distribution) is by far the most commonly used distribution in statistics.
• This distribution provides a good model for many, although not all, continuous populations.
• The normal distribution is continuous rather than discrete.
• The mean of a normal population may have any value, and the variance may have any positive value.
Normal R.V.: pdf, mean, and variance

The probability density function of a normal population with mean $\mu$ and variance $\sigma^2$ is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$
If $X \sim N(\mu, \sigma^2)$, then the mean and variance of X are given by
$$\mu_X = \mu, \qquad \sigma_X^2 = \sigma^2$$
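The density can be coded directly from the formula. As a cross-check, the sketch compares it with `statistics.NormalDist.pdf` from Python's standard library (Python 3.8+); the parameters 10 and 1.3 are borrowed from Example 13 purely for illustration, and `normal_pdf` is our own name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma (sigma > 0)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


mu, sigma = 10.0, 1.3
for x in (8.0, 10.0, 10.8):
    # Hand-coded formula and library value should agree to rounding error
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```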



68-95-99.7% Rule

This figure represents a plot of the normal probability density function with mean $\mu$ and standard deviation $\sigma$. Note that the curve is symmetric about $\mu$, so that $\mu$ is the median as well as the mean. The same holds for the normal population: its median equals its mean.
• About 68% of the population is in the interval $\mu \pm \sigma$.
• About 95% of the population is in the interval $\mu \pm 2\sigma$.
• About 99.7% of the population is in the interval $\mu \pm 3\sigma$.
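The three percentages can be checked numerically with the standard normal CDF from Python's `statistics.NormalDist`. The proportion within k standard deviations of the mean is $\Phi(k) - \Phi(-k)$, which does not depend on $\mu$ or $\sigma$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    # Proportion of a normal population within k standard deviations of the mean
    within = Z.cdf(k) - Z.cdf(-k)
    print(f"within {k} sd: {within:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```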
Standard Units

• The proportion of a normal population that is within a given number of standard deviations of the mean is the same for any normal population.

• For this reason, when dealing with normal populations, we often convert from the units in which the population items were originally measured to standard units.

• Standard units tell how many standard deviations an item is above or below its population mean.

In-class exercise

Solution:

Standard Normal Distribution
• In general, we convert to standard units by subtracting the mean and dividing by the standard deviation.
• Thus, if x is an item sampled from a normal population with mean $\mu$ and variance $\sigma^2$, the standard unit equivalent of x is the number z, where
$$z = \frac{x - \mu}{\sigma}$$
• The number z is sometimes called the “z-score” of x.
• The z-score can be regarded as an item sampled from a normal population with mean 0 and standard deviation 1.
• This normal population is called the standard normal population.
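Converting to and from standard units is one line each way. The sketch below (function names `z_score` and `from_z` are our own) applies both directions to the numbers in Example 13, so it can serve as a check on an answer worked by hand.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


def from_z(z: float, mu: float, sigma: float) -> float:
    """Invert the standardization: x = mu + z * sigma."""
    return mu + z * sigma


mu, sigma = 10.0, 1.3  # Example 13: thickness in thousandths of an inch
print(round(z_score(10.8, mu, sigma), 2))  # 0.62
print(round(from_z(1.7, mu, sigma), 2))    # 12.21
```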

In-class exercise
Example 13
Aluminum sheets used to make beverage cans have thicknesses (in
thousandths of an inch) that are normally distributed with mean 10
and standard deviation 1.3.
A particular sheet is 10.8 thousandths of an inch thick. Find the z-score.
Solution:

In-class exercise
Example 13 cont.
The thickness of a certain sheet has a z-score of 1.7. Find the thickness of the sheet in the original units of thousandths of an inch.
Solution:


Finding Areas Under the Normal Curve

• The proportion of a normal population that lies within a given interval is equal to the area under the normal probability density above that interval.
• This would suggest integrating the normal pdf, but this integral does not have a closed-form solution in terms of elementary functions.
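Because no elementary antiderivative exists, such areas are computed numerically. The sketch below approximates the area under the standard normal density with Simpson's rule (our choice of quadrature method; others would work) and compares it with the value obtained from `math.erf`, using the identity $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The interval 0.71 to 1.28 is arbitrary.

```python
from math import erf, exp, pi, sqrt


def std_normal_pdf(z: float) -> float:
    """Standard normal density (mean 0, standard deviation 1)."""
    return exp(-z * z / 2) / sqrt(2 * pi)


def simpson_area(a: float, b: float, n: int = 1000) -> float:
    """Simpson's rule for the area under the standard normal pdf on [a, b]; n must be even."""
    h = (b - a) / n
    total = std_normal_pdf(a) + std_normal_pdf(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # odd interior points get 4, even get 2
        total += weight * std_normal_pdf(a + i * h)
    return total * h / 3


def phi(z: float) -> float:
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


print(simpson_area(0.71, 1.28), phi(1.28) - phi(0.71))  # agree to many decimals
```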


Finding Areas Under the Normal Curve


• So, the areas under the standard normal curve (mean 0, variance 1) are approximated numerically and are available in a standard normal table or z table, given in Table A.2.

• We can convert any normal into a standard normal so that we can compute areas under the curve.

• The table gives the area in the left-hand tail of the curve. Other areas can be calculated by subtraction or by using the fact that the total area under the curve is 1.
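The table operations (read a left-tail area, subtract from 1 for a right tail, subtract two left tails for an area between) can be sketched with `statistics.NormalDist`, here applied to the z values of Examples 14 and 15. `inv_cdf` goes the other way, from a left-tail area to a z value, as when finding a percentile. Library results carry more digits than a four-place z table, so the last digit can differ from a table lookup.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

left = Z.cdf(0.47)                    # area to the left of z = 0.47
right = 1 - Z.cdf(1.38)               # area to the right of z = 1.38 (total area is 1)
between = Z.cdf(1.28) - Z.cdf(0.71)   # area between z = 0.71 and z = 1.28
z75 = Z.inv_cdf(0.75)                 # z with 75% of the area to its left

print(f"{left:.4f} {right:.4f} {between:.4f} {z75:.4f}")
```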
In-class exercise

Example 14
1) Find the area under the normal curve to the left of z = 0.47.
Solution:

In-class exercise

Example 14
2) Find the area under the normal curve to the right of z = 1.38.
Solution:

In-class exercise
Example 15
1) Find the area under the normal curve between
z = 0.71 and z = 1.28.
Solution:

In-class exercise
Example 15
2) What z-score corresponds to the 75th percentile of a normal
curve?
Solution:

In-class exercise

Solution:

Solution (cont.):
The following figure presents the probability density function of the N(50, $5^2$) distribution.

The shaded area represents P(42 < X < 52), the probability that a
randomly chosen battery has a lifetime between 42 and 52 hours.
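The shaded probability can be computed by standardizing both endpoints, as in the table method, or directly from the N(50, $5^2$) CDF. The sketch below does both with `statistics.NormalDist`; note that `NormalDist` takes the standard deviation (5), not the variance.

```python
from statistics import NormalDist

mu, sigma = 50.0, 5.0              # battery lifetime X ~ N(50, 5^2), in hours
lifetime = NormalDist(mu, sigma)
Z = NormalDist()

# Route 1: standardize the endpoints, then use the standard normal CDF
z_lo, z_hi = (42 - mu) / sigma, (52 - mu) / sigma  # -1.6 and 0.4
prob_via_z = Z.cdf(z_hi) - Z.cdf(z_lo)

# Route 2: use the N(50, 5^2) CDF directly
prob_direct = lifetime.cdf(52) - lifetime.cdf(42)

print(round(prob_via_z, 4), round(prob_direct, 4))
```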

In-class exercise

Solution:

Solution (cont.):
Therefore, the 40th percentile of battery lifetimes is 48.75 hours.
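For comparison, `NormalDist.inv_cdf` finds the 40th percentile without a table. It uses the unrounded z value (about −0.2533) rather than a table value of −0.25, so its answer (about 48.73 hours) differs slightly from 48.75; the gap comes only from rounding z.

```python
from statistics import NormalDist

lifetime = NormalDist(mu=50, sigma=5)  # battery lifetime in hours
z40 = NormalDist().inv_cdf(0.40)       # z with 40% of the area to its left
x40 = lifetime.inv_cdf(0.40)           # same percentile in hours
print(round(z40, 4), round(x40, 2))    # -0.2533 48.73
```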


Using 2.51 and its z-score 1.96:


Estimating the Parameters


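• If $X_1, \ldots, X_n$ are a random sample from a $N(\mu, \sigma^2)$ distribution,
– $\mu$ is estimated with the sample mean $\bar{X}$, and
– $\sigma^2$ is estimated with the sample variance $s^2$ (equivalently, $\sigma$ is estimated with the sample standard deviation $s$).
• As with any sample mean, the uncertainty in $\bar{X}$ is $\sigma/\sqrt{n}$, which we replace with $s/\sqrt{n}$ if $\sigma$ is unknown.
• The sample mean $\bar{X}$ is an unbiased estimator of $\mu$.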
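A sketch of these estimates on simulated data: the 25 observations are drawn from a N(10, $1.3^2$) population chosen for illustration (not data from the text), so the estimates can be compared with known true values. `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the sample variance $s^2$.

```python
import random
from math import sqrt
from statistics import mean, stdev, variance

rng = random.Random(4)  # fixed seed for reproducibility
mu_true, sigma_true, n = 10.0, 1.3, 25  # hypothetical population and sample size
sample = [rng.gauss(mu_true, sigma_true) for _ in range(n)]

x_bar = mean(sample)        # estimates mu
s2 = variance(sample)       # estimates sigma^2 (divisor n - 1)
s = stdev(sample)           # estimates sigma
uncertainty = s / sqrt(n)   # estimated uncertainty in x_bar, treating sigma as unknown

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.3f}, s = {s:.3f}, s/sqrt(n) = {uncertainty:.3f}")
```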
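Linear Functions of Normal Random Variables
Let $X \sim N(\mu, \sigma^2)$ and let $a \neq 0$ and $b$ be constants. Then
$$aX + b \sim N(a\mu + b,\ a^2\sigma^2)$$
Linear Combinations
Let $X_1, X_2, \ldots, X_n$ be independent and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$. Let $c_1, c_2, \ldots, c_n$ be constants, and let $c_1X_1 + c_2X_2 + \cdots + c_nX_n$ be a linear combination. Then
$$c_1X_1 + c_2X_2 + \cdots + c_nX_n \sim N(c_1\mu_1 + c_2\mu_2 + \cdots + c_n\mu_n,\ c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_n^2\sigma_n^2)$$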

Example 16
A chemist measures the temperature of a solution in °C. The measurement is denoted C, and is normally distributed with mean 40 °C and standard deviation 1 °C. The measurement is converted to °F by the equation F = 1.8C + 32. What is the distribution of F?
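The linear-function rule gives the parameters of F directly: mean $1.8(40) + 32$ and variance $1.8^2 \cdot 1^2$. The sketch below computes them and, as an informal check, simulates many converted readings (the seed and sample size are arbitrary choices) to see that the sample mean and variance land close to those values.

```python
import random
from statistics import mean, variance

a, b = 1.8, 32.0            # F = a*C + b
mu_c, sigma_c = 40.0, 1.0   # C ~ N(40, 1^2), in degrees Celsius

# Parameters from the rule aX + b ~ N(a*mu + b, a^2 * sigma^2)
mu_f = a * mu_c + b
var_f = a**2 * sigma_c**2
print(mu_f, var_f)  # approximately 104 and 3.24

# Simulation check
rng = random.Random(16)
f_values = [a * rng.gauss(mu_c, sigma_c) + b for _ in range(100_000)]
print(round(mean(f_values), 2), round(variance(f_values), 2))
```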



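Distributions of Functions of Normals
Let $X_1, X_2, \ldots, X_n$ be independent and normally distributed with mean $\mu$ and variance $\sigma^2$. Then
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
Let X and Y be independent, with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$. Then
$$X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$$
$$X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$$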
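Note that the variances add even for X − Y, because the coefficient −1 enters the linear-combination rule squared. A quick simulation sketch with made-up parameters, $X \sim N(5, 2^2)$ and $Y \sim N(3, 1.5^2)$, illustrates this: the simulated difference should have mean near 2 and variance near $4 + 2.25 = 6.25$, not $4 - 2.25$.

```python
import random
from statistics import mean, variance

rng = random.Random(81)
mu_x, sd_x = 5.0, 2.0   # hypothetical X ~ N(5, 2^2)
mu_y, sd_y = 3.0, 1.5   # hypothetical Y ~ N(3, 1.5^2), independent of X

diffs = [rng.gauss(mu_x, sd_x) - rng.gauss(mu_y, sd_y) for _ in range(100_000)]

print(round(mean(diffs), 2), mu_x - mu_y)            # simulated vs. theoretical mean
print(round(variance(diffs), 2), sd_x**2 + sd_y**2)  # simulated vs. theoretical variance
```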

How Can I Tell Whether My Data Come from
a Normal Population?
• In practice, we often have a sample from some population, and
we must use the sample to decide whether the population
distribution is approximately normal.
• If the sample is reasonably large:
– the sample histogram may give a good indication: a histogram of data from a normal population should look something like the normal density function (peaked in the center, and decreasing more or less symmetrically on either side)
– Probability plots provide another good way of determining
whether a reasonably large sample comes from a
population that is approximately normal.
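One informal way to summarize a normal probability plot in a single number, sketched below, is to pair the sorted data with standard normal quantiles at plotting positions $(i - 0.5)/n$ and compute their correlation: points near a straight line give a correlation close to 1. The plotting-position formula, the function names, and both data sets are our own illustrative choices; this is a rough check, not a formal test.

```python
from math import sqrt
from statistics import NormalDist


def probability_plot_points(data: list[float]) -> list[tuple[float, float]]:
    """(normal quantile, sorted value) pairs for a normal probability plot."""
    n = len(data)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(sorted(data), start=1)]


def correlation(pairs: list[tuple[float, float]]) -> float:
    """Pearson correlation of the (quantile, value) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


roughly_normal = [9.1, 9.6, 9.8, 10.0, 10.1, 10.2, 10.4, 10.9]  # made-up data
skewed = [1.0, 1.1, 1.2, 1.3, 1.5, 2.0, 3.5, 9.0]               # made-up data with an outlier
print(round(correlation(probability_plot_points(roughly_normal)), 3))
print(round(correlation(probability_plot_points(skewed)), 3))
```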

How Can I Tell Whether My Data Come from
a Normal Population?
• For small samples, it can be difficult to tell whether the normal
distribution is appropriate.
• One important fact is this: Samples from normal populations
rarely contain outliers. Therefore the normal distribution
should generally not be used for data sets that contain outliers.
• This is especially true when the sample size is small.
– Unfortunately, for small data sets that do not contain
outliers, it is difficult to determine whether the population
is approximately normal.
– In general, some knowledge of the process that generated
the data is needed.



End of Lecture
