Probability distributions: changing σ increases or decreases the spread.
[Figure: normal curve centered at μ with spread σ; horizontal axis X]
The Normal Distribution as a mathematical function (pdf):
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$
This is a bell-shaped curve with different centers and spreads depending on μ and σ.
Note constants: π = 3.14159, e = 2.71828
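A minimal Python sketch of the pdf formula above, using only the standard library. The function name `normal_pdf` is a hypothetical helper, not something from the course materials. It evaluates the curve at its center for two values of σ, which shows that a larger σ gives a lower, wider bell.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma  # distance from the mean, in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Height of the curve at its center (x = mu) for sigma = 1 and sigma = 2.
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
print(round(normal_pdf(0, 0, 2), 4))  # 0.1995
# The curve is symmetric about mu.
print(normal_pdf(-1, 0, 1) == normal_pdf(1, 0, 1))  # True
```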
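The Normal PDF
It's a probability function, so no matter what the values of μ and σ, it must integrate to 1!
$$ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = 1 $$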
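A normal distribution is defined by its mean and standard deviation.
$$ E(X) = \mu = \int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx $$
$$ \mathrm{Var}(X) = \sigma^2 = \left(\int_{-\infty}^{\infty} x^2\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx\right) - \mu^2 $$
Standard Deviation(X) = σ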
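The three integrals above (total area, E(X), Var(X)) can be checked numerically. This is a sketch under stated assumptions: it uses a simple midpoint rule and cuts the integration off at μ ± 10σ, where the tail area is far below 1e-20. The helpers `normal_pdf`, `integrate`, and `check_moments` are hypothetical names chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def check_moments(mu, sigma):
    """Numerically recover total area, E(X), and Var(X) for Normal(mu, sigma)."""
    # Truncate at mu +/- 10 sigma; the probability outside is negligible.
    a, b = mu - 10 * sigma, mu + 10 * sigma
    f = lambda x: normal_pdf(x, mu, sigma)
    area = integrate(f, a, b)
    mean = integrate(lambda x: x * f(x), a, b)
    var = integrate(lambda x: x * x * f(x), a, b) - mean ** 2  # E(X^2) - mu^2
    return area, mean, var


area, mean, var = check_moments(500, 50)
print(f"area={area:.6f} mean={mean:.3f} sd={math.sqrt(var):.3f}")
```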
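The beauty of the normal curve:
68% of the data lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
$$ \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx \approx .68 $$
$$ \int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx \approx .95 $$
$$ \int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx \approx .997 $$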
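The 68/95/99.7 areas can be computed rather than memorized. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`. The helper name `area_within` is hypothetical. Because the area within k standard deviations does not depend on μ or σ, the same three numbers come out for any normal curve.

```python
from statistics import NormalDist


def area_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```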
How good is the rule for real data?
[Figure: histogram of runners' weights; x-axis POUNDS (80 to 160), y-axis Percent (0 to 25)]
95% of 120 = .95 × 120 ≈ 114 runners
In fact, 115 runners fall within 2 SDs of the mean.
99.7% of 120 = .997 × 120 ≈ 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
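Checking the rule on a data set amounts to counting how many observations fall within k standard deviations of the mean. A minimal sketch follows. The helper `fraction_within` is hypothetical, it assumes the sample SD (n − 1 denominator), and the 120 weights are simulated, not the runners' actual data.

```python
import random
import statistics


def fraction_within(data, k):
    """Share of observations within k sample standard deviations of the sample mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD (n - 1 denominator)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


# Hypothetical data: 120 simulated weights (NOT the runners' actual data).
random.seed(1)
weights = [random.gauss(125, 15) for _ in range(120)]
for k in (1, 2, 3):
    share = fraction_within(weights, k)
    print(f"within {k} SD: {share:.1%} ({round(share * 120)} of 120)")
```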
Example
Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50. Then:
68% of students will have scores between 450 and 550
95% will be between 400 and 600
99.7% will be between 350 and 650
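The three intervals above are just the mean ± 1, 2, and 3 standard deviations. A small sketch with a hypothetical helper `sd_interval`; note that it ignores the 200-800 restriction on the score range.

```python
def sd_interval(mean, sd, k):
    """The interval mean +/- k*sd."""
    return (mean - k * sd, mean + k * sd)


# SAT math example from the text: mean 500, SD 50.
for k, pct in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    low, high = sd_interval(500, 50, k)
    print(f"{pct} of students: between {low} and {high}")
# 68% of students: between 450 and 550
# 95% of students: between 400 and 600
# 99.7% of students: between 350 and 650
```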
Example
BUT…
What if you wanted to know the math SAT
$$ p(Z) = \frac{1}{1\cdot\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Z-0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^2} $$
The Standard Normal Distribution (Z)
All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:
$$ Z = \frac{X - \mu}{\sigma} $$
[Figure: standard normal curve, Z (μ = 0, σ = 1), with 0 and 2.0 marked on the Z axis]
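Standardizing and un-standardizing are one line each. The helper names `z_score` and `from_z` are hypothetical; Python 3.9+ also offers `statistics.NormalDist(mu, sigma).zscore(x)`, which computes the same Z.

```python
def z_score(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def from_z(z, mu, sigma):
    """Undo the standardization: X = mu + Z * sigma."""
    return mu + z * sigma


print(z_score(575, 500, 50))  # 1.5
print(from_z(1.5, 500, 50))   # 575.0
```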
Example
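For example: what’s the probability of getting a math SAT score of 575 or less, with μ = 500 and σ = 50?
$$ Z = \frac{575 - 500}{50} = 1.5 $$
i.e., a score of 575 is 1.5 standard deviations above the mean.
Yikes! (Computing P(X ≤ 575) directly would mean integrating the normal pdf from −∞ to 575.)
But looking up Z = 1.5 in the standard normal chart (or entering it into SAS) is no problem! P(Z ≤ 1.5) = .9332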
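In software the integral is handled by a normal CDF. A sketch assuming Python 3.8+ `statistics.NormalDist` (not SAS, which the slide mentions). It shows that standardizing first and using the original normal(500, 50) curve directly give the same answer.

```python
from statistics import NormalDist

mu, sigma, x = 500, 50, 575

# Route 1: standardize, then use the standard normal CDF (what the table gives).
z = (x - mu) / sigma
p_table = NormalDist().cdf(z)

# Route 2: use the normal(500, 50) CDF directly; no standardizing needed.
p_direct = NormalDist(mu, sigma).cdf(x)

print(round(p_table, 4), round(p_direct, 4))  # 0.9332 0.9332
```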
Looking up probabilities in the
standard normal table
What is the area to the
left of Z=1.50 in a
standard normal curve?
Area is 93.32%
Looking up probabilities in the
standard normal table
What is the area to the
left of Z=1.51 in a
standard normal curve?
Area is 93.45%
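A printed standard normal table is organized by row (Z to one decimal) and column (the second decimal). A minimal sketch that regenerates one row, assuming Python 3.9+ (for the `list[float]` hint) and `statistics.NormalDist`. The helper name `table_row` is hypothetical.

```python
from statistics import NormalDist


def table_row(row: float) -> list[float]:
    """One row of a standard normal table: P(Z <= row + c/100) for c = 0..9."""
    phi = NormalDist().cdf
    return [round(phi(row + c / 100), 4) for c in range(10)]


r = table_row(1.5)
print(r[0], r[1])  # 0.9332 0.9345  (Z = 1.50 and Z = 1.51)
```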
Probit function: the inverse
Φ⁻¹(area) = Z: gives the Z-value that goes with the probability you want.
For example, recall the SAT math scores example. What’s the score that corresponds to the 90th percentile?
In the table, find the Z-value that corresponds to an area of 90%...
90% area corresponds to a Z score of about 1.28, so the 90th-percentile score is about 500 + 1.28 × 50 = 564.
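In software the probit is an inverse CDF. A sketch assuming Python 3.8+ `statistics.NormalDist.inv_cdf`. It reproduces the Z of about 1.28 and converts it back to the SAT scale, both in two steps and in one.

```python
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.90)            # probit of 0.90
score90 = 500 + z90 * 50                    # back to the SAT scale
direct = NormalDist(500, 50).inv_cdf(0.90)  # same thing in one step

print(round(z90, 2), round(score90, 1), round(direct, 1))  # 1.28 564.1 564.1
```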
$$ Z = \frac{38 - 28}{10} = 1.0 \qquad Z = \frac{8 - 28}{10} = -2.0 $$
a. 100%
b. 50%
c. 10%
d. 0%
Review question 2
The probability that Z is between -2
and -1 is _____.
a. 50%
b. 34%
c. 25.5%
d. 13.5%
Review question 3
The probability that Z values are
larger than _____ is 0.6985.
a. Z=1
b. Z=0
c. Z=-.5
d. Z=+.5
Review question 4
27% of Z values are smaller than
____.
a. Z=0
b. Z=1
c. Z=-.6
d. Z=+.6
Are my data “normal”?
Not all continuous random variables are
normally distributed!!
It is important to evaluate how well the data are approximated by a normal distribution.
Are my data normally
distributed?
1. Look at the histogram! Does it appear bell
shaped?
2. Compute descriptive summary measures: are the mean, median, and mode similar?
3. Do 2/3 of observations lie within 1 std dev of the
mean? Do 95% of observations lie within 2 std
dev of the mean?
4. Look at a normal probability plot—is it
approximately linear?
5. Run tests of normality (such as Kolmogorov-Smirnov). But be cautious: these tests are highly influenced by sample size!
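Checks 2 and 3 in the list above can be computed directly. This is a rough numeric sketch, not a formal test. It assumes Python 3.8+ for `statistics.multimode`; the helper name `normality_checks` and the tiny data set are hypothetical.

```python
import statistics


def normality_checks(data):
    """Rough numeric versions of checks 2 and 3 (not a formal normality test)."""
    m = statistics.mean(data)
    s = statistics.stdev(data)  # sample SD
    n = len(data)
    return {
        "mean": m,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # all most-common values
        "sd": s,
        "within_1sd": sum(abs(x - m) <= s for x in data) / n,      # about 2/3 if normal
        "within_2sd": sum(abs(x - m) <= 2 * s for x in data) / n,  # about 95% if normal
    }


print(normality_checks([1, 2, 2, 3, 4]))
```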
Data from our class…
Median = 8
Mean = 8.8
Mode = 0
SD = 8.3
Range = 0 to 32 (≈ 4σ)
Data from our class…
Median = 45
Mean = 41
Mode = 6
SD = 23
Range = 0 to 83 (≈ 3.5σ)
Data from our class…
Median = 4
Mean = 3.7
Mode = 4
SD = 1.8
Range = 0.5 to 7 (≈ 3.5σ)
Data from our class…
Median = 18
Mean = 20
Mode = 20
SD = 16
Range = 2 to 70 (≈ 4σ)
Data from our class…
41 ± 23 = 18 – 64
Data from our class…
41 ± 2 × 23 = 0 – 87
Data from our class…
41 ± 3 × 23 = 0 – 100
Data from our class…
3.7 ± 2 × 1.8 = 0.1 – 7.3
Data from our class…
3.7 ± 3 × 1.8 = 0 – 9.1
Data from our class…
20 ± 16 = 4 – 36
Data from our class…
20 ± 2 × 16 = 0 – 52
Data from our class…
20 ± 3 × 16 = 0 – 68
Outlier!
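The band calculations above can be scripted. A minimal sketch with a hypothetical helper `sd_bands`. It assumes, as the slides appear to, that negative values are impossible, so lower limits are cut off at 0. It uses the summary statistics of the last data set (mean 20, SD 16, maximum 70) to flag the outlier.

```python
def sd_bands(mean, sd, k_values=(1, 2, 3), floor=0.0):
    """mean +/- k*sd for each k; lower limits below `floor` are cut off there
    (assumption: the variable cannot be negative)."""
    return {k: (max(floor, mean - k * sd), mean + k * sd) for k in k_values}


# Summary statistics from the last data set above: mean 20, SD 16, max 70.
bands = sd_bands(20, 16)
print(bands)  # {1: (4, 36), 2: (0.0, 52), 3: (0.0, 68)}
print("max beyond 3 SD:", 70 > bands[3][1])  # True -> the outlier
```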
The Normal Probability Plot
Order the data.
Find the corresponding standardized normal quantile values:
$$ i\text{th quantile} = \Phi^{-1}\left(\frac{i}{n+1}\right) $$
where Φ⁻¹ is the probit function, which gives the Z value that corresponds to a particular left-tail area.
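The two steps above produce the (quantile, observation) pairs that a normal probability plot draws. A minimal sketch, assuming Python 3.8+ `statistics.NormalDist`; the helper name `normal_plot_points` and the three data values are hypothetical. Plotting the pairs with any plotting tool should give a roughly straight line if the data are normal.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered observation with its normal quantile,
    Phi^{-1}(i / (n + 1)) for i = 1..n."""
    xs = sorted(data)
    n = len(xs)
    probit = NormalDist().inv_cdf
    return [(probit(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


for z, x in normal_plot_points([12, 3, 7]):
    print(f"z = {z:+.3f}  x = {x}")
# z = -0.674  x = 3
# z = +0.000  x = 7
# z = +0.674  x = 12
```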
Right-Skewed!
(concave up)
Normal probability plot: love of writing…
A wiggly line!
Normal probability plot: Exercise…
Right-Skewed!
(concave up)
Formal tests for normality
Results:
Coffee: Moderate evidence of non-normality
(p=.008 to p=.11)
Writing love: No evidence of non-normality
(all p>.15)
Exercise: No evidence of non-normality (all
p>.15)
Homework: Strong evidence of non-normality
(all p<.01)
Review question 5
Which of the following does NOT support
the conclusion that your data are normally
distributed:
$$ Z = \frac{120 - 125}{9.68} \approx -.52 \qquad P(Z < -.52) = .3015 $$
Review question 6
If you flip a coin 1600 times, what is
the approximate probability that you
will get fewer than 860 heads?
a. 25%
b. 2.5%
c. 0.5%
d. 0.005%
Review Problem 7
Which of the following about the normal
distribution is NOT true?
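For the number of successes X:
$$ \sigma_X = \sqrt{np(1-p)} $$
For a proportion (the mean and standard deviation differ from those of X by a factor of n):
$$ \mu_{\hat p} = p \qquad \sigma^2_{\hat p} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n} \qquad \sigma_{\hat p} = \sqrt{\frac{p(1-p)}{n}} $$
P-hat stands for "sample proportion."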
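The two standard deviations above differ by exactly a factor of n. A short check with hypothetical helper names and made-up values n = 100, p = 0.3:

```python
import math


def binomial_sd(n, p):
    """SD of the number of successes X: sqrt(n p (1 - p))."""
    return math.sqrt(n * p * (1 - p))


def proportion_se(n, p):
    """SD of the sample proportion p-hat = X / n: sqrt(p (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n, p = 100, 0.3  # hypothetical values for illustration
print(binomial_sd(n, p), proportion_se(n, p))
print(binomial_sd(n, p) / proportion_se(n, p))  # ratio is n (up to rounding)
```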
It all comes back to Z…
Statistics for proportions are based on a normal distribution, because the binomial can be approximated as normal if np > 5 (and, by the same rule of thumb, n(1 − p) > 5).
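A sketch comparing an exact binomial probability with its normal approximation, assuming Python 3.8+ (`math.comb`, `statistics.NormalDist`). The helper names and the example n = 100, p = 0.3, k = 35 are hypothetical. Adding 0.5 to k (a continuity correction) is a common refinement that the slides do not cover; it is left out here.

```python
import math
from statistics import NormalDist


def normal_ok(n, p, threshold=5):
    """Rule of thumb: np and n(1 - p) both exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with Normal(np, sqrt(np(1 - p)))."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k)


n, p, k = 100, 0.3, 35  # hypothetical example
print(normal_ok(n, p))
print(round(binom_cdf(k, n, p), 3), round(normal_approx_cdf(k, n, p), 3))
```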
Homework
Problem Set 3
Reading: Vickers 10-15
Journal article/article review sheet