Introduction to Probability and Statistics - Normal Distribution

BC0406 – Introduction to
Probability and Statistics
Lecture 6
Normal Distribution
Dr. Richard H.A.H. Jacobs

Universidade Federal do ABC
Agenda
Program for today
• Announcements
• Normal distribution
• Definition
• Properties
• Standard normal distribution
• Z-scores
• Overview: What we have seen, where we are, where we are going
• Review?
Announcements
• First list of exercises – Lista 1
• Template available for download on TIDIA-Ae
• Exam 1 (Prova 1)
• July 5th
• Balanced structure
• Multiple-choice questions
• Short-answer questions
• Open questions (e.g., draw a graph)
• Only the following are allowed on the desk:
• ID, pencil, eraser, pen, and calculator
• Use of mobile phones or tablets is prohibited
• Language
• English
• Chapters for today
• Larson & Farber: Chapter 5
Normal distribution
• Normal distribution
– Mathematical function that describes the relation between Y (the height of the curve) and any real number X
• Reasons for its widespread use
– The mathematics is (relatively) “simple”
– It is a good approximation of many empirical distributions
– It describes, at least approximately, the sampling distributions we are interested in
• Random processes (e.g., coin tosses)
• Means based on random samples
[Figure: original population (values 1 to 6, μ = 3.5, σ = 1.71) and the distributions of sample means for sample sizes N = 5 (σM = 0.76), N = 10 (σM = 0.54) and N = 50 (σM = 0.24), drawn on a common scale; a final panel plots the standard deviation of the sampling distribution against sample size N (0 to 60)]
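The σM values in the figure follow the rule σM = σ/√N. The sketch below checks this by simulation, assuming (this is not stated on the slide) that the original population is the six faces of a fair die, which reproduces μ = 3.5 and σ = 1.71. The helper `simulated_se`, the number of repetitions and the seed are illustrative choices.

```python
import math
import random
import statistics

# Assumption: the population is the six faces of a fair die (values 1..6),
# which reproduces the slide's mu = 3.5 and sigma = 1.71.
population = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population SD (divides by N)


def simulated_se(n, reps=20000, seed=1):
    """SD of `reps` sample means, each from n draws with replacement."""
    rng = random.Random(seed)
    means = [sum(rng.choices(population, k=n)) / n for _ in range(reps)]
    return statistics.pstdev(means)


for n in (5, 10, 50):
    theory = sigma / math.sqrt(n)  # standard error of the mean
    print(f"N = {n:2d}: theory = {theory:.2f}, simulated = {simulated_se(n):.2f}")
```

The theoretical values round to 0.76, 0.54 and 0.24, the same numbers shown in the figure; the simulated values should agree to about two decimals.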
Normal distribution
[Figure: a population histogram (frequency against value, 1 to 12) and the sampling distributions of the mean ("Sample distributions…") for N = 2, 3, 5, 10 and 50; a final panel plots the standard deviation of the sample means against N (0 to 60)]
Normal distribution
$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2 / (2\sigma^2)}$$
• π ≈ 3.14, e ≈ 2.7183
• μ = any real number (center)
• σ² = any positive real number (dispersion)
• X = any real number
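The formula can be evaluated directly. The sketch below is a minimal, hypothetical helper `normal_pdf` that transcribes it term by term and cross-checks the result against `statistics.NormalDist.pdf` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height Y of the normal curve at x, transcribed from the formula."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Cross-check against the standard library implementation.
for x in (-1.0, 0.0, 2.5):
    assert math.isclose(normal_pdf(x, 0, 1), NormalDist(0, 1).pdf(x))

print(round(normal_pdf(0, 0, 1), 4))       # 0.3989, peak of the standard normal
print(round(normal_pdf(100, 100, 15), 4))  # 0.0266, peak of the IQ curve
```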
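Normal distribution
• Symmetric
• Unimodal
• Bell-shaped, with the peak in the center (at μ)
• Inflection points at μ − 1σ and μ + 1σ
• The tails approach, but never reach, Y = 0
[Figure: normal curve of Y against X marking μ, −1σ, +1σ and the two inflection points]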
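Differentiating the formula twice gives f''(x) = f(x)·[(x − μ)² − σ²]/σ⁴, which is zero exactly at x = μ ± σ. As a numerical sanity check, the sketch below approximates f'' with a central finite difference on the standard normal (μ = 0, σ = 1); the helper names and the step size h are illustrative choices, not part of the lecture.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height Y of the normal curve at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def second_derivative(f, x, h=1e-3):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


def curve(x):
    return normal_pdf(x)  # standard normal: mu = 0, sigma = 1


# Concave (f'' < 0) between the inflection points, convex (f'' > 0) outside.
for x in (-1.1, -0.9, 0.0, 0.9, 1.1):
    print(f"x = {x:+.1f}: f'' = {second_derivative(curve, x):+.4f}")

# At +1 sigma the curvature is approximately zero.
print(abs(second_derivative(curve, 1.0)) < 1e-6)
```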
Normal distribution
[Figure: two approximately normal empirical distributions: "100 Coin Flips" (relative frequency of the number of heads, 30 to 70) and "IQ Scores" (n = 20000, M = 100, s = 15; probability per interval, 40 to 160)]
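The "100 Coin Flips" panel can be reproduced exactly: the number of heads in n fair flips is binomial with mean np = 50 and SD √(np(1 − p)) = 5. The sketch below compares the exact binomial probabilities with the heights of a normal curve with μ = 50 and σ = 5; `binom_pmf` and `normal_pdf` are illustrative helpers, and `math.comb` requires Python 3.8 or later.

```python
import math

n, p = 100, 0.5
mean = n * p                      # expected number of heads: 50
sd = math.sqrt(n * p * (1 - p))   # binomial SD: 5


def binom_pmf(k):
    """Exact probability of exactly k heads in n fair flips."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


for k in (40, 45, 50, 55, 60):
    print(f"{k} heads: exact = {binom_pmf(k):.4f}, normal = {normal_pdf(k, mean, sd):.4f}")
```

Both columns peak near 0.08 at 50 heads, matching the vertical scale of the panel.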
Normal distribution
[Figure: "Multiple Normal Distributions": three curves with the same σ = 15 and different means, μ = 50, 100 and 200 (Y against X, 0 to 250)]
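Changing μ while keeping σ fixed only slides the curve along the X axis. The sketch below, using a hypothetical helper `normal_pdf` written from the normal formula, shows that the three curves have the same peak height and the same height at any fixed distance d from their centers.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


sigma = 15
means = (50, 100, 200)

# Peak heights are identical: 1 / (sigma * sqrt(2 * pi)) does not involve mu.
print([round(normal_pdf(mu, mu, sigma), 4) for mu in means])  # [0.0266, 0.0266, 0.0266]

# The height d units from each center is also identical: the curves are translations.
d = 20
heights = [normal_pdf(mu + d, mu, sigma) for mu in means]
print(all(math.isclose(h, heights[0]) for h in heights))
```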
Normal distribution
[Figure: three curves with the same mean μ = 100 and different standard deviations, σ = 15, 20 and 25 (Y against X, 50 to 150)]
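A larger σ makes the curve lower and wider, but the total area under it stays 1. The sketch below illustrates this with a hypothetical helper `normal_pdf` and a simple Riemann sum over μ ± 8σ; the step size is an arbitrary choice.

```python
import math


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def area(mu, sigma, step=0.01):
    """Riemann-sum approximation of the total area over mu +/- 8 sigma."""
    start = mu - 8 * sigma
    n_steps = round(16 * sigma / step)
    return step * sum(normal_pdf(start + i * step, mu, sigma) for i in range(n_steps))


for sigma in (15, 20, 25):
    peak = normal_pdf(100, 100, sigma)
    print(f"sigma = {sigma}: peak = {peak:.4f}, area = {area(100, sigma):.4f}")
```

The peaks are 0.0266, 0.0199 and 0.0160, each equal to 1/(σ√(2π)), while every area prints as 1.0000.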
Normal distribution
[Figure: common score scales on one X axis (0 to 600): “T” (μ = 50, σ = 10), “IQ” (μ = 100, σ = 15) and an “achievement” test score (μ = 500, σ = 50)]
Standardized normal distribution
[Figure: the "IQ Scores" (n = 20000, M = 100, s = 15) and "100 Coin Flips" distributions]
Assume:
1. μ = 0
2. σ² = 1
• We need linear transformations to use the standard normal distribution
– We can transform any distribution to:
• Mean = 0
• Standard deviation = 1
– We can transform the standard normal distribution back to any mean and standard deviation, e.g.:
• Mean = 100
• Standard deviation = 15
• Two transformations suffice
– Addition
– Multiplication
• For a variable X and a constant c, consider the transformation:
– Y = X + c
– See example: X has M_X = 3, s²_X = 1.33, s_X = 1.15; Y = X + 5 has M_Y = 8, s²_Y = 1.33, s_Y = 1.15
• In general, for additive transformations, Y = X + c:
– M_Y = M_X + c
– s²_Y = s²_X
– s_Y = s_X
• Note that the additive transformation is a linear transformation
– Y = X + c increases linearly with X
– Therefore, it does not alter the shape
[Figure: dot plots of X (scale 1 to 5) and Y = X + 5 (scale 1 to 10), and a plot of Y against X showing a straight line]
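The rules M_Y = M_X + c and s_Y = s_X can be checked numerically. The slide does not list its raw values, so the sketch below uses hypothetical data [2, 2, 4, 4], chosen because with the sample (n − 1) variance it reproduces the slide's M_X = 3, s²_X = 1.33 and s_X = 1.15.

```python
import statistics

# Hypothetical data: the slide does not list its raw values. These four numbers
# were chosen because, with the sample (n - 1) variance, they reproduce its
# summary M = 3, s2 = 1.33, s = 1.15.
x = [2, 2, 4, 4]
c = 5
y = [value + c for value in x]  # additive transformation Y = X + c

for name, data in (("X", x), ("Y", y)):
    print(f"{name}: M = {statistics.mean(data):.2f}, "
          f"s2 = {statistics.variance(data):.2f}, s = {statistics.stdev(data):.2f}")
```

The output reproduces the example: M rises from 3.00 to 8.00 while s² (1.33) and s (1.15) are unchanged.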
• For a variable X and a constant c, consider the transformation:
– Y = X * c
– See example: X has M_X = 3, s²_X = 1.33, s_X = 1.15; Y = X * 2 has M_Y = 6, s²_Y = 5.33, s_Y = 2.30
• In general, for Y = X * c:
– M_Y = M_X * c
– s²_Y = s²_X * c² (NB: the variance is multiplied by c², not by c)
– s_Y = s_X * |c| (that is, s_X * c for c > 0)
• Again, we have a linear transformation
– Y = X * c increases linearly with X (for c > 0)
– Therefore, the shape does not change
[Figure: dot plots of X (scale 1 to 5) and Y = X * 2 (scale 1 to 10), and a plot of Y against X showing a straight line]
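A numerical check of the multiplicative rules, again on hypothetical data [2, 2, 4, 4] (the slide does not list its raw values; with the sample variance these numbers give M = 3, s² = 1.33, s = 1.15). Each line prints the transformed statistic next to the value predicted by the rule.

```python
import statistics

# Hypothetical data (the slide does not list its raw values): with the sample
# (n - 1) variance this gives M = 3, s2 = 1.33, s = 1.15, as on the slide.
x = [2, 2, 4, 4]
c = 2
y = [value * c for value in x]  # multiplicative transformation Y = X * c

print(f"M_Y  = {statistics.mean(y):.2f}   M_X * c    = {statistics.mean(x) * c:.2f}")
print(f"s2_Y = {statistics.variance(y):.2f}   s2_X * c^2 = {statistics.variance(x) * c ** 2:.2f}")
print(f"s_Y  = {statistics.stdev(y):.2f}   s_X * |c|  = {statistics.stdev(x) * abs(c):.2f}")
```

Unrounded, s_Y is 2.31; the example's 2.30 comes from doubling the already rounded s_X = 1.15.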
Z-transformation
• For any variable X with mean M_X and standard deviation s_X:
• Z = (X − M_X) / s_X
– Subtracting M_X sets the mean to 0 and leaves s as it is
– Dividing by s_X sets s_Z = 1 and leaves the mean (0) as it is
• Thus: M_Z = 0, s_Z = 1, and the shape does not change
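The two steps of the z-transformation, Z = (X − M_X)/s_X, can be sketched as below. The function `z_scores` and the data are hypothetical; the data are deliberately skewed so that one can check that the transformation keeps the ordering (and therefore the outlier) intact.

```python
import statistics


def z_scores(data):
    """Z = (X - M_X) / s_X for every observation (sample SD, n - 1)."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return [(value - m) / s for value in data]


raw = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical, skewed on purpose
z = z_scores(raw)

# Mean 0 and SD 1, up to floating-point rounding.
print(round(statistics.fmean(z), 10) == 0, round(statistics.stdev(z), 10) == 1)

# The ordering of the points is unchanged, so the outlier (10) is still the most extreme.
print(z.index(max(z)) == raw.index(max(raw)))
```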
Transformation back
• For the variable Z, with M_Z = 0 and s_Z = 1:
• T = (Z * s_desired) + M_desired
– Multiplying by s_desired sets s_T = 1 * s_desired = s_desired and leaves the mean (0) and the shape as they are
– Adding M_desired sets M_T = 0 + M_desired = M_desired and leaves s_T as it is
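The back-transformation T = (Z * s_desired) + M_desired is a one-line function. The sketch below uses a hypothetical helper `from_z` to express the same z-score on the T, IQ and achievement scales mentioned earlier in the lecture.

```python
def from_z(z, mean_desired, sd_desired):
    """T = (Z * s_desired) + M_desired: rescale first, then shift."""
    return z * sd_desired + mean_desired


# One z-score expressed on three common score scales.
scales = {"T": (50, 10), "IQ": (100, 15), "Achievement": (500, 50)}
for label, (m, s) in scales.items():
    print(label, from_z(1.0, m, s))  # T 60.0, IQ 115.0, Achievement 550.0
```

A score one standard deviation above the mean is 60 on the T scale, 115 on the IQ scale and 550 on the achievement scale.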
Z-transformation
• Summary:
• By applying additive and multiplicative transformations:
• We can change the mean of the distribution to any value we want
• We can change the standard deviation to any value we want
• We do not change the shape of the distribution (linear transformations)
• z-transformations can be used for:
• Individual points (observations) of interest
• Each point becomes a z-score
• The z-score expresses the position of the point relative to the mean, in units of standard deviation
• Whole distributions
• Each observation becomes a z-score
• M = 0, s = 1
Z-Transformation
• Example 1: finding z-values for raw data
You take an “anxiety test” based on a questionnaire. The test has μ = 50 and σ = 10.
• Your score was 65. What is your z-score?
• z = (65 − 50)/10 = 15/10 = 1.50
• Note: z = 1.50 means that your score was 1.50 standard deviations above the mean
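Example 1's arithmetic as a tiny reusable function; `z_score` is a hypothetical helper name, not something defined in the lecture.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, in standard-deviation units."""
    return (x - mu) / sigma


z = z_score(65, 50, 10)  # anxiety test: mu = 50, sigma = 10, score = 65
print(f"z = {z:.2f}")    # z = 1.50
```

A score below the mean gives a negative z: for example, a score of 35 on the same test gives z = −1.50.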
Z-Transformation
• Example 2: returning to raw data from z-scores
You take a “patience test” based on a questionnaire. The test has μ = 25 and σ = 5.
• The experimenter reports that your z-score was 2.0. What was your score?
• X = (z * σ) + μ = (2 * 5) + 25 = 35
Z-Transformation
• Example 3: calculating percentiles from z-scores
You take an “anxiety test” based on a questionnaire. The test has μ = 50 and σ = 10.
• Your score was 65. What percentage of the population has a lower score than yours?
• z = (65 − 50)/10 = 15/10 = 1.50
• Let’s look at the problem graphically…
Z-Transformation
[Figure: standard normal curve (n = 1000, M = 0, s = 1) divided at z = 1.5: area below = 0.93 (93%), area above = 0.07 (7%)]
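The shaded areas can be computed with the standard normal cumulative distribution function. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) and is only valid under the assumption that the test scores are, at least approximately, normally distributed.

```python
from statistics import NormalDist

mu, sigma, score = 50, 10, 65  # anxiety test from Example 3
z = (score - mu) / sigma       # 1.5

below = NormalDist().cdf(z)    # area left of z under N(0, 1)
print(f"below = {below:.4f}, above = {1 - below:.4f}")  # below = 0.9332, above = 0.0668

# The same answer without standardizing by hand:
print(f"{NormalDist(mu, sigma).cdf(score):.4f}")
```

About 93% of the population is expected to score below 65.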
Normal distribution
• Example 4: reaction time
An investigator created a test based on participants’ reaction times. Although reaction time is a widely used measure in many studies, there are no norms yet for this new test. The investigator converts the original reaction times to a T distribution with μ = 50 and σ = 10 and publishes a table giving the percentile for each score. He states that 7% of scores fall below T = 35. Evaluate the analysis.
z = (35 − 50)/10 = −15/10 = −1.5
[Figure: standard normal curve (n = 1000, M = 0, s = 1) with a tail area of 0.0668 below z = −1.5 and another of 0.0668 above z = +1.5]
Is this correct? Probably not…
Normal distribution
• No justification was given for using the normal approximation
– RT data are not symmetric (not normally distributed)
– (see example)
• Z and T are linear transformations
– They do not create a normal distribution from a distribution that is not normal
– Solutions based on the properties of the standard normal distribution are wrong in this case
– In fact, there are no scores T ≤ 35 (!)
[Figure: original RT data: histogram of raw scores (frequency against milliseconds, minimum ≈ 5 ms), histogram of Z scores (μ = 0, σ = 1, minimum ≈ −1) and histogram of T scores (μ = 50, σ = 10, minimum ≈ 38); all three have the same right-skewed shape]
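The point that Z and T cannot make skewed data normal can be illustrated by simulation. The reaction times below are hypothetical (a shifted exponential with minimum 5 ms), not the lecture's data; `skewness` is an illustrative moment-based measure. Skewness is identical for raw, Z and T scores, and no T score falls at or below 35, even though the normal table would place 6.68% there.

```python
import random
import statistics


def skewness(data):
    """Moment-based skewness: mean of cubed z-scores (population SD)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((value - m) / s) ** 3 for value in data)


# Hypothetical right-skewed "reaction times" in ms (a shifted exponential);
# these are simulated, not the lecture's data.
rng = random.Random(42)
rt = [5 + rng.expovariate(1 / 30) for _ in range(5000)]

m, s = statistics.fmean(rt), statistics.stdev(rt)
z = [(value - m) / s for value in rt]
t = [50 + 10 * value for value in z]

for name, data in (("raw", rt), ("Z", z), ("T", t)):
    print(f"{name}: skewness = {skewness(data):.3f}, minimum = {min(data):.1f}")

# The normal table puts 6.68% of T scores at or below 35; count the actual share.
print(sum(value <= 35 for value in t) / len(t))
```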
Normal distribution
When the data distribution is not normal
• Using the normal approximation to calculate areas (percentiles) is inappropriate
• What to do?
– Use empirical distributions (the ones actually observed)
• Frequency histogram
• Stem-and-leaf plot
• Ordered data
– Identify the value of interest and calculate the fraction of observations below (or above) this value
[Figure: the same histograms of the original RT data and of their Z and T scores]
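The "ordered data" approach amounts to sorting the observations and counting how many fall below the value of interest. The sketch below does this with `bisect.bisect_left` on hypothetical, simulated right-skewed reaction times (not the lecture's data) and compares the result with what a normal approximation with the same mean and SD would claim; `percentile_rank` is an illustrative helper.

```python
import random
from bisect import bisect_left
from statistics import NormalDist, fmean, stdev


def percentile_rank(sorted_data, value):
    """Fraction of observations strictly below `value`; data must be sorted."""
    return bisect_left(sorted_data, value) / len(sorted_data)


# Hypothetical right-skewed reaction times in ms (simulated, not the lecture's data).
rng = random.Random(7)
rt = sorted(5 + rng.expovariate(1 / 30) for _ in range(5000))

value = 20.0
empirical = percentile_rank(rt, value)
normal_approx = NormalDist(fmean(rt), stdev(rt)).cdf(value)
print(f"empirical = {empirical:.3f}, normal approximation = {normal_approx:.3f}")
```

For this simulated model, the true fraction below 20 ms is 1 − e^(−0.5) ≈ 0.39, which the empirical count recovers, whereas the normal approximation gives roughly 0.31.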
Back to the overview
[Diagram: Population → sampling method (preferably random) → Sample → allocation to control/experimental groups (preferably random) → Treatment A / Treatment B → Data (experimental and control procedures)
• “Descriptive statistics”: exploratory data analysis; graphic & numeric summaries; “detective work”; “systematic accumulation and exploration of evidence”
• “Inferential statistics”: inferences about the population & treatment effect; “reverse reasoning”; “hypothesis testing”; “model comparison”]
Review
Questions?
• Lectures 1 and 2: introduction, what is statistics, planning experiments
• Lectures 3 and 4: types of variables, data classification, graphic representation, measures of central tendency
• Lectures 5 and 6: dispersion measures, normal distribution, transformation to z-scores
