
The Normal Distribution

• Using Statistics
• Properties of the Normal Distribution
• The Standard Normal Distribution
• The Transformation of Normal Random Variables
• The Inverse Transformation
• The Normal Approximation of Binomial Distributions
LEARNING OBJECTIVES

After studying this chapter, you should be able to:
• Identify when a random variable will be normally distributed
• Use the properties of normal distributions
• Explain the significance of the standard normal distribution
• Compute probabilities using normal distribution tables
• Transform a normal distribution into a standard normal distribution
• Convert a binomial distribution into an approximate normal distribution
Introduction
As n increases, the binomial distribution approaches a ...
[Figure: Binomial distributions with p = .5 for n = 6, n = 10, and n = 14; P(x) against x]

[Figure: Normal probability density function, $\mu = 0$, $\sigma = 1$; f(x) against x from -5 to 5]
The Normal Probability Distribution

The normal probability density function:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$

where $e = 2.7182818\ldots$ and $\pi = 3.14159265\ldots$

[Figure: Normal distribution, $\mu = 0$, $\sigma = 1$; f(x) against x from -5 to 5]
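The density formula can be evaluated directly. The following is a minimal sketch in Python using only the standard library; the function name `normal_pdf` and its default arguments are illustrative choices, not part of the slides.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, written term by term from the formula."""
    variance = sigma ** 2
    exponent = -((x - mu) ** 2) / (2 * variance)
    return math.exp(exponent) / math.sqrt(2 * math.pi * variance)

peak = normal_pdf(0.0)   # height of the standard normal curve at its mean
print(round(peak, 4))    # 0.3989
```

The peak of about 0.4 matches the top of the vertical axis in the standard normal plots.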
Properties of the Normal Distribution

• The normal is a family of bell-shaped and symmetric distributions.
• Because the distribution is symmetric, one-half (.50 or 50%) lies on either side of the mean.
• Each is characterized by a different pair of mean, $\mu$, and variance, $\sigma^2$. That is: $X \sim N(\mu, \sigma^2)$.
• Each is asymptotic to the horizontal axis.
• The area under any normal probability density function within $k\sigma$ of $\mu$ is the same for any normal distribution, regardless of the mean and variance.
Properties of the Normal Distribution (continued)

• If several independent random variables are normally distributed, then their sum will also be normally distributed.
• The mean of the sum will be the sum of all the individual means.
• The variance of the sum will be the sum of all the individual variances (by virtue of the independence).

Properties of the Normal Distribution (continued)

• If $X_1, X_2, \ldots, X_n$ are independent normal random variables, then their sum $S$ will also be normally distributed with
• $E(S) = E(X_1) + E(X_2) + \cdots + E(X_n)$
• $V(S) = V(X_1) + V(X_2) + \cdots + V(X_n)$
• Note: It is the variances that can be added above and not the standard deviations.
Properties of the Normal Distribution – Example

Example 4.1: Let $X_1$, $X_2$, and $X_3$ be independent random variables that are normally distributed with means and variances as shown.

      Mean  Variance
X1    10    1
X2    20    2
X3    30    3

Let $S = X_1 + X_2 + X_3$. Then $E(S) = 10 + 20 + 30 = 60$ and $V(S) = 1 + 2 + 3 = 6$. The standard deviation of $S$ is $\sqrt{6} \approx 2.45$.
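The arithmetic of Example 4.1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
import math

# (mean, variance) pairs for X1, X2, X3 as given in Example 4.1
variables = [(10, 1), (20, 2), (30, 3)]

mean_s = sum(mean for mean, _ in variables)
var_s = sum(var for _, var in variables)   # valid only because the X's are independent
sd_s = math.sqrt(var_s)                    # take the root after adding, never before

print(mean_s, var_s, round(sd_s, 2))  # 60 6 2.45
```

Adding the three standard deviations instead (1 + 1.41 + 1.73) would give about 4.15, which is why the note insists on adding variances.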
Properties of the Normal Distribution (continued)

• If $X_1, X_2, \ldots, X_n$ are independent normal random variables, then the random variable $Q$ defined as $Q = a_1X_1 + a_2X_2 + \cdots + a_nX_n + b$ will also be normally distributed with
• $E(Q) = a_1E(X_1) + a_2E(X_2) + \cdots + a_nE(X_n) + b$
• $V(Q) = a_1^2 V(X_1) + a_2^2 V(X_2) + \cdots + a_n^2 V(X_n)$
• Note: It is the variances that can be added above and not the standard deviations.
Properties of the Normal Distribution – Example

Example 4.3: Let $X_1$, $X_2$, $X_3$, and $X_4$ be independent random variables that are normally distributed with means and variances as shown. Find the mean and variance of $Q = X_1 - 2X_2 + 3X_3 - 4X_4 + 5$.

      Mean  Variance
X1    12    4
X2    -5    2
X3    8     5
X4    10    1

$E(Q) = 12 - 2(-5) + 3(8) - 4(10) + 5 = 11$
$V(Q) = 4 + (-2)^2(2) + 3^2(5) + (-4)^2(1) = 73$
$SD(Q) = \sqrt{73} \approx 8.544$
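The same rules can be applied mechanically. The sketch below assumes independence of the $X_i$, as the example does; the helper name `linear_combination` is hypothetical.

```python
import math

def linear_combination(coeffs, means, variances, b=0):
    """Mean and variance of Q = sum(a_i * X_i) + b for independent X_i."""
    mean_q = sum(a * m for a, m in zip(coeffs, means)) + b
    # Coefficients enter squared; the constant b shifts Q but adds no variance.
    var_q = sum(a ** 2 * v for a, v in zip(coeffs, variances))
    return mean_q, var_q

mean_q, var_q = linear_combination([1, -2, 3, -4], [12, -5, 8, 10], [4, 2, 5, 1], b=5)
print(mean_q, var_q, round(math.sqrt(var_q), 3))  # 11 73 8.544
```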
Normal Probability Distributions

All of these are normal probability density functions, though each has a different mean and variance.

[Figure: four normal density curves: $W \sim N(40, 1)$ with $\mu = 40$, $\sigma = 1$; $X \sim N(30, 25)$ with $\mu = 30$, $\sigma = 5$; $Y \sim N(50, 9)$ with $\mu = 50$, $\sigma = 3$; $Z \sim N(0, 1)$ with $\mu = 0$, $\sigma = 1$]

Consider:
$P(39 \le W \le 41)$
$P(25 \le X \le 35)$
$P(47 \le Y \le 53)$
$P(-1 \le Z \le 1)$

The probability in each case is an area under a normal probability density function.
The Standard Normal Distribution

The standard normal random variable, $Z$, is the normal random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$: $Z \sim N(0, 1^2)$.

[Figure: Standard normal distribution; f(z) against z from -5 to 5, centered at $\mu = 0$ with $\sigma = 1$]
Finding Probabilities of the Standard Normal Distribution: $P(0 \le Z \le 1.56)$

Standard Normal Probabilities

z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0  0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1  0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2  0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3  0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4  0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5  0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6  0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7  0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8  0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9  0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0  0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4  0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5  0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6  0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7  0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8  0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9  0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0  0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1  0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2  0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3  0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4  0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7  0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8  0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9  0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0  0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Look in the row labeled 1.5 and the column labeled .06 to find $P(0 \le Z \le 1.56) = 0.4406$.

[Figure: Standard normal distribution; the area between 0 and 1.56 is shaded]
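The table lookup can be reproduced in software. A minimal sketch, assuming Python 3.8 or later for the standard library's `statistics.NormalDist`; the helper name `table_area` is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def table_area(z: float) -> float:
    """P(0 <= Z <= z), the quantity tabulated in the slides, for z >= 0."""
    # The cumulative probability includes the left half (0.5), so subtract it.
    return STANDARD_NORMAL.cdf(z) - 0.5

print(round(table_area(1.56), 4))  # 0.4406
```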
Finding Probabilities of the Standard Normal Distribution: P(Z < -2.47)

To find P(Z < -2.47):
Find the table area for 2.47: P(0 < Z < 2.47) = .4932
By symmetry, P(Z < -2.47) = .5 - P(0 < Z < 2.47) = .5 - .4932 = 0.0068

z    ...  .06     .07     .08
.         .       .       .
2.3  ...  0.4909  0.4911  0.4913
2.4  ...  0.4931  0.4932  0.4934
2.5  ...  0.4948  0.4949  0.4951
.         .       .       .

[Figure: Standard normal distribution; the area to the left of -2.47 is P(Z < -2.47) = .5 - 0.4932 = 0.0068, and the table area for 2.47 is P(0 < Z < 2.47) = 0.4932]
Finding Probabilities of the Standard Normal Distribution: $P(1 \le Z \le 2)$

To find $P(1 \le Z \le 2)$:
1. Find the table area for 2.00: $F(2) = P(Z \le 2.00) = .5 + .4772 = .9772$
2. Find the table area for 1.00: $F(1) = P(Z \le 1.00) = .5 + .3413 = .8413$
3. $P(1 \le Z \le 2.00) = P(Z \le 2.00) - P(Z \le 1.00) = .9772 - .8413 = 0.1359$

z    .00     ...
.    .
0.9  0.3159  ...
1.0  0.3413  ...
1.1  0.3643  ...
.    .
1.9  0.4713  ...
2.0  0.4772  ...
2.1  0.4821  ...
.    .

[Figure: Standard normal distribution; the area between 1 and 2 is $P(1 \le Z \le 2) = .9772 - .8413 = 0.1359$]
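Both table-based results (the left tail below -2.47 and the area between 1 and 2) can be cross-checked with the cumulative distribution function. A sketch using Python's standard `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

left_tail = Z.cdf(-2.47)            # P(Z < -2.47)
between = Z.cdf(2.0) - Z.cdf(1.0)   # P(1 <= Z <= 2) = F(2) - F(1)

print(round(left_tail, 4))  # 0.0068
print(round(between, 4))    # 0.1359
```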
Finding Values of the Standard Normal Random Variable: $P(0 \le Z \le z) = 0.40$

To find $z$ such that $P(0 \le Z \le z) = .40$:
1. Find a probability as close as possible to .40 in the table of standard normal probabilities.
2. Then determine the value of z from the corresponding row and column.

z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
.    .       .       .       .       .       .       .       .       .       .
1.1  0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2  0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3  0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
.    .       .       .       .       .       .       .       .       .       .

The closest entry is .3997, in row 1.2 and column .08, so $P(0 \le Z \le 1.28) \approx .40$.

Also, since $P(Z \le 0) = .50$, $P(Z \le 1.28) \approx .90$.

[Figure: Standard normal distribution; area to the left of 0 = .50, area between 0 and z = 1.28 is .40 (.3997)]
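The reverse lookup corresponds to the inverse cumulative distribution function. A minimal sketch with the standard library (`NormalDist.inv_cdf` is available from Python 3.8):

```python
from statistics import NormalDist

Z = NormalDist()

# P(0 <= Z <= z) = 0.40 is the same statement as P(Z <= z) = 0.50 + 0.40
z = Z.inv_cdf(0.50 + 0.40)
print(round(z, 2))  # 1.28
```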
99% Interval around the Mean

To have .99 in the center of the distribution, there should be (1/2)(1-.99) = (1/2)(.01) = .005 in each tail of the distribution, and (1/2)(.99) = .495 in each half of the .99 interval. That is:

$P(0 \le Z \le z_{.005}) = .495$

z    ...  .04     .05     .06     .07     .08     .09
.         .       .       .       .       .       .
2.4  ...  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5  ...  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6  ...  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
.         .       .       .       .       .       .

Look to the table of standard normal probabilities to find that .495 lies midway between the entries for 2.57 (.4949) and 2.58 (.4951), so:

$z_{.005} \approx 2.575$ and $-z_{.005} \approx -2.575$

$P(-2.575 \le Z \le 2.575) = .99$

[Figure: Standard normal distribution; total area in center = .99, area in center left = .495, area in center right = .495, area in left tail = .005, area in right tail = .005, with $-z_{.005} = -2.575$ and $z_{.005} = 2.575$ marked]
The Transformation of Normal Random Variables

The area within $k\sigma$ of the mean is the same for all normal random variables. So an area under any normal distribution is equivalent to an area under the standard normal. In this example: $P(40 \le X \le 60) = P(-1 \le Z \le 1)$, since $\mu = 50$ and $\sigma = 10$.

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

Transformation:
(1) Subtraction: $(X - \mu_x)$
(2) Division by $\sigma_x$

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

[Figure: Normal distribution with $\mu = 50$, $\sigma = 10$ (X from 0 to 100) mapped onto the standard normal distribution with $\sigma = 1.0$ (Z from -5 to 5)]
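Example: Using the Normal Transformation

$X \sim N(160, 30^2)$

$$
\begin{aligned}
P(100 \le X \le 180) &= P\left(\frac{100 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{180 - \mu}{\sigma}\right) \\
&= P\left(\frac{100 - 160}{30} \le Z \le \frac{180 - 160}{30}\right) \\
&= P(-2 \le Z \le .6666) \\
&= 0.4772 + 0.2475 = 0.7247
\end{aligned}
$$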
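Using the Normal Transformation

Example

$X \sim N(127, 22^2)$

$$
\begin{aligned}
P(X < 150) &= P\left(\frac{X - \mu}{\sigma} < \frac{150 - \mu}{\sigma}\right) \\
&= P\left(Z < \frac{150 - 127}{22}\right) \\
&= P(Z < 1.045) \\
&= 0.5 + 0.3520 = 0.8520
\end{aligned}
$$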
The Transformation of Normal Random Variables

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

The transformation of X to Z, where a and b are numbers:

$$P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right)$$

$$P(X > b) = P\left(Z > \frac{b - \mu}{\sigma}\right)$$

$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$
Normal Probabilities (Empirical Rule)

• The probability that a normal random variable will be within 1 standard deviation from its mean (on either side) is 0.6826, or approximately 0.68.
• The probability that a normal random variable will be within 2 standard deviations from its mean is 0.9544, or approximately 0.95.
• The probability that a normal random variable will be within 3 standard deviations from its mean is 0.9974.

[Figure: Standard normal distribution; f(z) against z from -5 to 5]
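The three probabilities can be recomputed from the standard normal CDF. A minimal sketch with the standard library; `within_k_sigma` is a hypothetical helper name. The computed values differ from the slide's figures in the fourth decimal place because the slide doubles rounded table entries (for example 2 × .3413).

```python
from statistics import NormalDist

Z = NormalDist()

def within_k_sigma(k: float) -> float:
    """P(-k <= Z <= k): probability of falling within k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k), 4))
```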
The Inverse Transformation

The area within $k\sigma$ of the mean is the same for all normal random variables. To find a probability associated with any interval of values for any normal random variable, all that is needed is to express the interval in terms of numbers of standard deviations from the mean. That is the purpose of the standard normal transformation. If $X \sim N(50, 10^2)$,

$$P(X > 70) = P\left(\frac{X - \mu}{\sigma} > \frac{70 - \mu}{\sigma}\right) = P\left(Z > \frac{70 - 50}{10}\right) = P(Z > 2)$$

That is, P(X > 70) can be found easily because 70 is 2 standard deviations above the mean of X: $70 = \mu + 2\sigma$. P(X > 70) is equivalent to P(Z > 2), an area under the standard normal distribution.

Example 4-12: $X \sim N(124, 12^2)$

P(X > x) = 0.10 and $P(Z > 1.28) \approx 0.10$
$x = \mu + z\sigma = 124 + (1.28)(12) = 139.36$

z    ...  .07     .08     .09
.         .       .       .
1.1  ...  0.3790  0.3810  0.3830
1.2  ...  0.3980  0.3997  0.4015
1.3  ...  0.4147  0.4162  0.4177
.         .       .       .

[Figure: Normal distribution, $\mu = 124$, $\sigma = 12$; f(x) against X from 80 to 180, with x = 139.36 marked]
The Inverse Transformation (Continued)

Example: $X \sim N(5.7, 0.5^2)$

P(X > x) = 0.01 and $P(Z > 2.33) \approx 0.01$
$x = \mu + z\sigma = 5.7 + (2.33)(0.5) = 6.865$

z    ...  .02     .03     .04
.         .       .       .
2.2  ...  0.4868  0.4871  0.4875
2.3  ...  0.4898  0.4901  0.4904
2.4  ...  0.4922  0.4925  0.4927
.         .       .       .

[Figure: Normal distribution, $\mu = 5.7$, $\sigma = 0.5$; area = 0.49 between the mean and $x_{.01} = \mu + z\sigma = 5.7 + (2.33)(0.5) = 6.865$, area = 0.01 in the right tail, with $z_{.01} = 2.33$ on the z scale]

Example: $X \sim N(2450, 400^2)$

P(a < X < b) = 0.95 and $P(-1.96 < Z < 1.96) \approx 0.95$
$x = \mu \pm z\sigma = 2450 \pm (1.96)(400) = 2450 \pm 784 = (1666, 3234)$
P(1666 < X < 3234) = 0.95

z    ...  .05     .06     .07
.         .       .       .
1.8  ...  0.4678  0.4686  0.4693
1.9  ...  0.4744  0.4750  0.4756
2.0  ...  0.4798  0.4803  0.4808
.         .       .       .

[Figure: Normal distribution, $\mu = 2450$, $\sigma = 400$; areas of .4750 on each side of the mean and .0250 in each tail, with -1.96 and 1.96 on the z scale]
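The inverse-transformation examples (Example 4-12 and the two above) follow one pattern: find z from the probability, then apply $x = \mu + z\sigma$. A sketch using the standard library; the helper names `upper_tail_cutoff` and `central_interval` are hypothetical. Because the slides round z to two decimals (1.28, 2.33, 1.96), the computed values match them only approximately.

```python
from statistics import NormalDist

def upper_tail_cutoff(mu: float, sigma: float, tail_area: float) -> float:
    """x such that P(X > x) = tail_area for X ~ N(mu, sigma^2)."""
    z = NormalDist().inv_cdf(1 - tail_area)
    return mu + z * sigma                      # inverse transformation

def central_interval(mu: float, sigma: float, coverage: float):
    """(a, b) symmetric about mu with P(a < X < b) = coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mu - z * sigma, mu + z * sigma

print(upper_tail_cutoff(124, 12, 0.10))     # close to 139.36
print(upper_tail_cutoff(5.7, 0.5, 0.01))    # close to 6.865
print(central_interval(2450, 400, 0.95))    # close to (1666, 3234)
```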
Finding Values of a Normal Random Variable, Given a Probability

1. Draw pictures of the normal distribution in question and of the standard normal distribution.
2. Shade the area corresponding to the desired probability.
3. From the table of the standard normal distribution, find the z value or values.
4. Use the transformation from z to x to get value(s) of the original random variable.

z    ...  .05     .06     .07
.         .       .       .
1.8  ...  0.4678  0.4686  0.4693
1.9  ...  0.4744  0.4750  0.4756
2.0  ...  0.4798  0.4803  0.4808
.         .       .       .

$x = \mu \pm z\sigma = 2450 \pm (1.96)(400) = 2450 \pm 784 = (1666, 3234)$

[Figure: Normal distribution, $\mu = 2450$, $\sigma = 400$, with a shaded central area of .9500 (.4750 on each side of the mean), drawn above the standard normal distribution with the same area shaded between -1.96 and 1.96]
Finding Values of a Normal Random Variable, Given a Probability

The normal distribution with $\mu = 3.5$ and $\sigma = 1.323$ is a close approximation to the binomial with n = 7 and p = 0.50.

Normal Distribution, $\mu = 3.5$, $\sigma = 1.323$: P(x < 4.5) = 0.7749
Binomial Distribution, n = 7, p = 0.50: $P(x \le 4) = 0.7734$

[Figure: normal density f(x) with $\mu = 3.5$, $\sigma = 1.323$ beside the binomial probabilities P(x) for n = 7, p = 0.50]

MTB > cdf 4.5;
SUBC> normal 3.5 1.323.
Cumulative Distribution Function
Normal with mean = 3.50000 and standard deviation = 1.32300
     x  P( X <= x)
4.5000      0.7751

MTB > cdf 4;
SUBC> binomial 7,.5.
Cumulative Distribution Function
Binomial with n = 7 and p = 0.500000
   x  P( X <= x)
4.00      0.7734
The Normal Approximation of Binomial Distribution

The normal distribution with $\mu = 5.5$ and $\sigma = 1.6583$ is a closer approximation to the binomial with n = 11 and p = 0.50.

Normal Distribution, $\mu = 5.5$, $\sigma = 1.6583$: P(x < 4.5) = 0.2732
Binomial Distribution, n = 11, p = 0.50: $P(x \le 4) = 0.2744$

[Figure: normal density f(x) with $\mu = 5.5$, $\sigma = 1.6583$ beside the binomial probabilities P(x) for n = 11, p = 0.50]
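The comparison between the exact binomial probability and its normal approximation can be reproduced as follows. This is a sketch using only the standard library; the function names are hypothetical, and the normal curve is evaluated at k + 0.5, as the slides do with 4.5.

```python
import math
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k), using mu = np and sigma = sqrt(np(1-p))."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

for n in (7, 11):
    print(n, round(binomial_cdf(4, n, 0.5), 4), round(normal_approx_cdf(4, n, 0.5), 4))
```

For n = 7 the normal value agrees with the Minitab output (0.7751) rather than the table-based 0.7749, the difference being rounding of z in the table lookup.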
Approximating a Binomial Probability Using the Normal Distribution

$$P(a \le X \le b) \approx P\left(\frac{a - np}{\sqrt{np(1-p)}} \le Z \le \frac{b - np}{\sqrt{np(1-p)}}\right)$$

for n large ($n \ge 50$) and p not too close to 0 or 1.00

or:

$$P(a \le X \le b) \approx P\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}} \le Z \le \frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right)$$

for n moderately large ($20 \le n < 50$).

NOTE: If p is either small (close to 0) or large (close to 1), use the Poisson approximation.
Confidence Interval
Using Statistics

• Consider the following statements:
$\bar{x} = 550$
• A single-valued estimate that conveys little information about the actual value of the population mean.
We are 99% confident that $\mu$ is in the interval [449,551]
• An interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
We are 90% confident that $\mu$ is in the interval [400,700]
• An interval estimate which locates the population mean within a broader interval, with a lower level of confidence.
Confidence Interval or Interval Estimate

A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter. Associated with the interval is a measure of the confidence we have that the interval does indeed contain the parameter of interest.

• A confidence interval or interval estimate has two components:
A range or interval of values
An associated level of confidence
Confidence Interval for $\mu$ When $\sigma$ Is Known

• If the population distribution is normal, the sampling distribution of the mean is normal.
• If the sample is sufficiently large, regardless of the shape of the population distribution, the sampling distribution is approximately normal (Central Limit Theorem).

In either case:

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

or

$$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

[Figure: Standard Normal Distribution: 95% Interval; f(z) against z from -4 to 4]
Confidence Interval for $\mu$ when $\sigma$ is Known (Continued)

Before sampling, there is a 0.95 probability that the interval

$$\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$$

will include the sample mean (and 5% that it will not).

Conversely, after sampling, approximately 95% of such intervals

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

will include the population mean (and 5% of them will not).

That is, $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is a 95% confidence interval for $\mu$.
A 95% Interval around the Population Mean

Approximately 95% of sample means can be expected to fall within the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.

Conversely, about 2.5% can be expected to be above $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ and 2.5% can be expected to be below $\mu - 1.96\frac{\sigma}{\sqrt{n}}$.

So 5% can be expected to fall outside the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.

[Figure: Sampling distribution of the mean, with 95% of the area between $\mu - 1.96\frac{\sigma}{\sqrt{n}}$ and $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ and 2.5% in each tail; of the sample means $\bar{x}$ plotted beneath the curve, 95% fall within the interval, 2.5% fall below the interval, and 2.5% fall above the interval]
The 95% Confidence Interval for $\mu$

A 95% confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population, or a large sample is used, is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

The quantity $1.96\frac{\sigma}{\sqrt{n}}$ is often called the margin of error or the sampling error.

For example, if n = 25, $\sigma = 20$, and $\bar{x} = 122$, a 95% confidence interval is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{25}} = 122 \pm (1.96)(4) = 122 \pm 7.84 = [114.16, 129.84]$$
A $(1-\alpha)100\%$ Confidence Interval for $\mu$

We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of $\alpha/2$ under the standard normal curve. $(1-\alpha)$ is called the confidence coefficient. $\alpha$ is called the error probability, and $(1-\alpha)100\%$ is called the confidence level.

$$P\left(z > z_{\alpha/2}\right) = \alpha/2$$

$$P\left(z < -z_{\alpha/2}\right) = \alpha/2$$

$$P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = (1 - \alpha)$$

$(1-\alpha)100\%$ Confidence Interval:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

[Figure: Standard normal distribution; central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, with area $\alpha/2$ in each tail]
Critical Values of z and Levels of Confidence

$(1-\alpha)$   $\alpha/2$   $z_{\alpha/2}$
0.99    0.005   2.576
0.98    0.010   2.326
0.95    0.025   1.960
0.90    0.050   1.645
0.80    0.100   1.282

[Figure: Standard normal distribution; central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, with area $\alpha/2$ in each tail]
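The critical values in the table can be regenerated from the inverse CDF. A sketch with the standard library; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """z value cutting off a right-tail area of alpha/2, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.99, 0.98, 0.95, 0.90, 0.80):
    print(level, round(critical_z(level), 3))
```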


The Level of Confidence and the Width of the Confidence Interval

When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.

80% Confidence Interval: $\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}$

95% Confidence Interval: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

[Figure: two standard normal distributions, one shaded for the 80% interval and one for the 95% interval]
The Sample Size and the Width of the Confidence Interval

When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.

[Figure: two sampling distributions of the mean, f(x) against $\bar{x}$: 95% Confidence Interval with n = 20, and 95% Confidence Interval with n = 40]
Example

• Shrimmy, the shrimp hatchery, is planning to invest heavily in the black tiger breed. As part of the decision, the company wants to estimate the average amount of black tiger shrimp a family of four would need per month. A random sample of n = 100 families is obtained, and in this sample the average amount of shrimp in pounds per month is 6.5 and the population standard deviation is known to be 3.2. Construct a 95% confidence interval for the average amount of shrimp consumed by the entire population of families of 4.
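One way to work this example is to apply the known-$\sigma$ interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ directly. The sketch below is not taken from the slides, which leave the example unanswered; it uses the table value z = 1.96 and illustrative variable names.

```python
import math

x_bar, sigma, n = 6.5, 3.2, 100   # values stated in the example
z = 1.96                          # table value for 95% confidence

margin = z * sigma / math.sqrt(n)          # 1.96 * 0.32
low, high = x_bar - margin, x_bar + margin
print(round(low, 2), round(high, 2))  # 5.87 7.13
```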
Confidence Interval or Interval Estimate for $\mu$ When $\sigma$ Is Unknown - The t Distribution

If the population standard deviation, $\sigma$, is not known, replace $\sigma$ with the sample standard deviation, s. If the population is normal, the resulting statistic:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a t distribution with (n - 1) degrees of freedom.

• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.

[Figure: the standard normal curve compared with t, df = 20 and t, df = 10]
Confidence Intervals for $\mu$ when $\sigma$ is Unknown - The t Distribution

A $(1-\alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is not known (assuming a normally distributed population) is given by:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of $\alpha/2$ to its right.
The t Distribution

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   ------  ------  ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
2     1.886   2.920   4.303   6.965   9.925
3     1.638   2.353   3.182   4.541   5.841
4     1.533   2.132   2.776   3.747   4.604
5     1.476   2.015   2.571   3.365   4.032
6     1.440   1.943   2.447   3.143   3.707
7     1.415   1.895   2.365   2.998   3.499
8     1.397   1.860   2.306   2.896   3.355
9     1.383   1.833   2.262   2.821   3.250
10    1.372   1.812   2.228   2.764   3.169
11    1.363   1.796   2.201   2.718   3.106
12    1.356   1.782   2.179   2.681   3.055
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
26    1.315   1.706   2.056   2.479   2.779
27    1.314   1.703   2.052   2.473   2.771
28    1.313   1.701   2.048   2.467   2.763
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

[Figure: t distribution with df = 10; area = 0.10 in each tail beyond -1.372 and 1.372, and area = 0.025 in each tail beyond -2.228 and 2.228]

Whenever $\sigma$ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Example

A blood analyst wants to estimate the average AFP index of the Vietnamese people. A random blood sample of size 15 yields an average of $\bar{x} = 10.37$ ng/ml and a standard deviation of s = 3.5 ng/ml. Assuming a normal population of the AFP values, give a 95% confidence interval for the average AFP value of the Vietnamese population. (AFP = alpha-fetoprotein)

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   ------  ------  ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
.     .       .       .       .       .
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
.     .       .       .       .       .

The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is:

$t_{0.025} = 2.145$

The corresponding confidence interval or interval estimate is:

$$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43, 12.31]$$
Large Sample Confidence Intervals for the Population Mean

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   ------  ------  ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
.     .       .       .       .       .
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

Whenever $\sigma$ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Large Sample Confidence Intervals for the Population Mean

A large-sample $(1-\alpha)100\%$ confidence interval for $\mu$:

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Example: An environmental scientist wants to estimate the average amount of NOx in a given region. A random sample of 100 data points gives $\bar{x}$ = 357.60 ppm and s = 140.00 ppm. Give a 95% confidence interval for $\mu$, the average amount of NOx in any sample taken.

$$\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 357.60 \pm 1.96\frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16, 385.04]$$
Exercise 1
Exercise 2
Large-Sample Confidence Intervals for the Population Proportion, p

The estimator of the population proportion, $p$, is the sample proportion, $\hat{p}$. If the sample size is large, $\hat{p}$ has an approximately normal distribution, with $E(\hat{p}) = p$ and $V(\hat{p}) = \frac{pq}{n}$, where $q = (1 - p)$. When the population proportion is unknown, use the estimated value, $\hat{p}$, to estimate the standard deviation of $\hat{p}$.

For estimating $p$, a sample is considered large enough when both $n \cdot p$ and $n \cdot q$ are greater than 5.
Large-Sample Confidence Intervals for the Population Proportion, p

A large-sample $(1-\alpha)100\%$ confidence interval for the population proportion, $p$:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where the sample proportion, $\hat{p}$, is equal to the number of successes in the sample, x, divided by the number of trials (the sample size), n, and $\hat{q} = 1 - \hat{p}$.
Example

A marketing research firm wants to estimate the share that foreign companies have in the American market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472, 0.4328]$$

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
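The proportion interval can be sketched in a few lines. The function name `proportion_interval` is illustrative; z defaults to the table value 1.96, and the large-sample check uses the estimated proportions because the true p is unknown.

```python
import math

def proportion_interval(successes: int, n: int, z: float = 1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb from the slides, applied to the estimates of p and q
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(successes=34, n=100)
print(round(low, 4), round(high, 4))  # 0.2472 0.4328
```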
Exercise 3
Confidence Intervals for the Population Variance: The Chi-Square ($\chi^2$) Distribution

• The sample variance, $s^2$, is an unbiased estimator of the population variance, $\sigma^2$.
• Confidence intervals for the population variance are based on the chi-square ($\chi^2$) distribution.
• The chi-square distribution is the probability distribution of the sum of several independent, squared standard normal random variables.
• The mean of the chi-square distribution is equal to the degrees of freedom parameter ($E[\chi^2] = df$). The variance of a chi-square is equal to twice the number of degrees of freedom ($V[\chi^2] = 2\,df$).
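The definition above can be checked numerically. The following is a minimal simulation sketch, assuming 10 degrees of freedom, 20,000 draws and a fixed seed (all arbitrary choices for illustration): each draw sums the squares of `df` independent standard normal values, and the sample mean and variance of the draws should land near `df` and `2 * df`.

```python
import random
import statistics

def simulate_chi_square(df, draws, seed=0):
    """Draw chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]

samples = simulate_chi_square(df=10, draws=20000)
sample_mean = statistics.fmean(samples)     # should be close to df = 10
sample_var = statistics.variance(samples)   # should be close to 2 * df = 20
print(round(sample_mean, 2), round(sample_var, 2))
```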
The Chi-Square ($\chi^2$) Distribution

• The chi-square random variable cannot be negative, so it is bound by zero on the left.
• The chi-square distribution is skewed to the right.
• The chi-square distribution approaches a normal as the degrees of freedom increase.

[Figure: Chi-Square Distribution: df = 10, df = 30, df = 50. Horizontal axis: $\chi^2$ (0 to 100); vertical axis: $f(\chi^2)$ (0.00 to 0.10).]

In sampling from a normal population, the random variable:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

has a chi-square distribution with $(n - 1)$ degrees of freedom.


Confidence Interval for the Population Variance

A $(1-\alpha)100\%$ confidence interval for the population variance* (where the population is assumed normal) is:

$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]$$

where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with $n - 1$ degrees of freedom that cuts off an area of $\alpha/2$ to its right, and $\chi^2_{1-\alpha/2}$ is the value of the distribution that cuts off an area of $\alpha/2$ to its left (equivalently, an area of $1 - \alpha/2$ to its right).

* Note: Because the chi-square distribution is skewed, the confidence interval for the population variance is not symmetric.
Example

In an automated process, a machine fills cans of coffee. If the average amount filled is different from what it should be, the machine may be adjusted to correct the mean. If the variance of the filling process is too high, however, the machine is out of control and needs to be repaired. Therefore, from time to time regular checks of the variance of the filling process are made. This is done by randomly sampling filled cans, measuring their amounts, and computing the sample variance. A random sample of 30 cans gives an estimate $s^2 = 18{,}540$. Give a 95% confidence interval for the population variance, $\sigma^2$.

$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = \left[\frac{(30-1)18540}{45.7},\ \frac{(30-1)18540}{16.0}\right] = [11765,\ 33604]$$
Example (continued)

Chi-square critical values, by area in right tail:

df   .995   .990   .975   .950   .900   .100   .050   .025   .010   .005
 .     .      .      .      .      .      .      .      .      .      .
28  12.46  13.56  15.31  16.93  18.94  37.92  41.34  44.46  48.28  50.99
29  13.12  14.26  16.05  17.71  19.77  39.09  42.56  45.72  49.59  52.34
30  13.79  14.95  16.79  18.49  20.60  40.26  43.77  46.98  50.89  53.67
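The interval calculation can be sketched in Python using the df = 29 row of the table above. The function name `variance_ci` and its parameter names are illustrative, and the critical values are passed in by hand rather than computed, so the sketch needs only the standard library.

```python
def variance_ci(n, s2, chi2_right_tail, chi2_left_tail):
    """Confidence interval for a population variance (normal population assumed).

    n:               sample size
    s2:              sample variance
    chi2_right_tail: chi-square value cutting off alpha/2 to its right (the larger value)
    chi2_left_tail:  chi-square value cutting off alpha/2 to its left (the smaller value)
    """
    numerator = (n - 1) * s2
    # Dividing by the larger critical value gives the lower bound.
    return numerator / chi2_right_tail, numerator / chi2_left_tail

# Coffee-can example: n = 30, s^2 = 18,540, table values for df = 29.
low, high = variance_ci(30, 18540, chi2_right_tail=45.72, chi2_left_tail=16.05)
print(round(low), round(high))
```

With the unrounded table values 45.72 and 16.05, the bounds come out slightly different from the 11765 and 33604 shown in the example, which correspond to critical values rounded to 45.7 and 16.0.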

[Figure: Chi-Square Distribution: df = 29. Horizontal axis: $\chi^2$ (0 to 70); vertical axis: $f(\chi^2)$ (0.00 to 0.06). The central area of 0.95 lies between $\chi^2_{0.975} = 16.05$ and $\chi^2_{0.025} = 45.72$, with an area of 0.025 in each tail.]
Sample-Size Determination

Before determining the necessary sample size, three questions must be answered:
• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, $B$?)
• What do you want the desired confidence level $(1-\alpha)$ to be so that the distance between your estimate and the parameter is less than or equal to $B$?
• What is your estimate of the variance (or standard deviation) of the population in question?

For example: a $(1-\alpha)$ confidence interval for $\mu$ is $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where the term $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is the bound, $B$.
Exercise 4
Sample Size and Standard Error

The sample size determines the bound of a statistic, since the standard
error of a statistic shrinks as the sample size increases:

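[Figure: sampling distributions of a statistic for sample size $n$ and sample size $2n$, each marked with the standard error of the statistic; the distribution for sample size $2n$ is narrower.]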
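As a worked illustration for one specific statistic, take the sample mean (an assumption made here; the slide speaks of a statistic in general), whose standard error is $\sigma/\sqrt{n}$. Doubling the sample size then shrinks the standard error by a factor of $1/\sqrt{2}$, or to roughly 71% of its former value:

```latex
\mathrm{SE}_{2n}(\bar{x}) = \frac{\sigma}{\sqrt{2n}}
  = \frac{1}{\sqrt{2}} \cdot \frac{\sigma}{\sqrt{n}}
  \approx 0.707 \, \mathrm{SE}_{n}(\bar{x})
```

Halving the standard error therefore takes four times the sample size, not twice.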


Minimum Sample Size: Mean and Proportion
Minimum required sample size in estimating the population mean, $\mu$:

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$$

Bound of estimate:

$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Minimum required sample size in estimating the population proportion, $p$:

$$n = \frac{z_{\alpha/2}^2\,pq}{B^2}$$
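The two formulas above can be sketched in Python as follows. The function names and the `confidence` parameter are illustrative, not from the slides; the critical value is taken from the standard library's `statistics.NormalDist`, and the result is rounded up because a sample size must be a whole number that still meets the bound.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def min_sample_size_mean(sigma, bound, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    z = z_critical(confidence)
    return math.ceil((z * sigma / bound) ** 2)

def min_sample_size_proportion(p, bound, confidence=0.95):
    """Smallest n with z * sqrt(p * q / n) <= bound, where q = 1 - p."""
    z = z_critical(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / bound ** 2)

# Illustrative inputs: sigma = 400, B = 120 for the mean;
# p = 0.5 (the most conservative guess), B = 0.05 for the proportion.
n_mean = min_sample_size_mean(sigma=400, bound=120)
n_prop = min_sample_size_proportion(p=0.5, bound=0.05)
print(n_mean, n_prop)
```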
Example

A microbiologist wants to conduct an experiment to estimate the average amount of micro-organisms in the water of a popular river. He plans to determine the average amount of micro-organisms to within 120 µg/ml, with 95% confidence. From past records, an estimate of the population standard deviation is $\sigma = 400$ µg/ml. What is the minimum required sample size?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2(400)^2}{120^2} = 42.684 \approx 43$$

The result is rounded up to 43, since the sample size must be a whole number that keeps the bound within 120 µg/ml.
