
BC0406 – Introduction to Probability and Statistics

Lecture 6
Normal Distribution

Dr. Richard H.A.H. Jacobs


Universidade Federal do ABC
Agenda
Program for today

• Announcements

• Normal distribution
• Definition
• Properties
• Standard normal distribution
• Z-scores

• Overview: What we have seen, where we are, where we are going

• Review?
Announcements
• First list of exercises – Lista 1
• Template available for download on TIDIA-Ae

• Prova 1
• July 5th
• Balanced structure
• Multiple choice questions
• Short answer questions
• Open questions (e.g. draw graph)
• On the table there should be only
• ID, pencil, eraser, pen, and calculator
• Mobile or tablet use prohibited
• Language
• English....

• Chapters for today
• Larson & Farber: Chapter 5
Normal distribution
• Normal distribution
– Mathematical function that describes the
relation between Y (height of the
function) and any real number (X)

• Reasons for its widespread use
– Mathematics is (relatively) “simple”
– Good approximation of many empirical distributions
– Closely describes the sampling distributions we are interested in
• Random processes (e.g., coin tosses)
• Means based on random samples
[Figure: Distribution of sample means (scale preserved). Original population with values 1 to 6, m = 3.5, s = 1.71. Panels show the distribution of means for sample size N = 5 (sM = 0.76), N = 10 (sM = 0.54) and N = 50 (sM = 0.24). A final panel plots the standard deviation (S) of the sampling distribution against sample size (N).]
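The sM values reported for the distributions of sample means can be reproduced with a small simulation. The sketch below is illustrative only: it assumes the original population is a fair six-sided die (consistent with m = 3.5 and s = 1.71, but not stated on the slide), and it compares the simulated values with the standard result sM = s / sqrt(N). The function name `sd_of_sample_means` is hypothetical.

```python
import random
import statistics


def sd_of_sample_means(sample_size, n_samples=5000, seed=0):
    """Simulate the standard deviation of the means of fair-die samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.randint(1, 6) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return statistics.pstdev(means)


# Assumed population: a fair die with faces 1..6 (s is about 1.71).
population_sd = statistics.pstdev([1, 2, 3, 4, 5, 6])

for n in (5, 10, 50):
    simulated = sd_of_sample_means(n)
    predicted = population_sd / n ** 0.5
    print(f"N = {n:2d}: simulated sM = {simulated:.2f}, s / sqrt(N) = {predicted:.2f}")
```

As N grows, the simulated spread of the sample means should shrink toward the predicted value, which is the pattern in the final panel of the figure.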
Normal distribution
[Figure: Frequency histogram of a population with values 1 to 12, followed by the sample distributions: distribution of means for sample sizes N = 2, N = 3, N = 5, N = 10 and N = 50. A final panel plots the standard deviation (S) of the sample means against N.]
Normal distribution

$$Y = \frac{1}{\sqrt{2 \pi s^2}} \, e^{-(X - m)^2 / (2 s^2)}$$

• π ≈ 3.14
• e ≈ 2.7183
• s² = positive real (dispersion)
• m = any real (center)
• X = any real
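As a minimal sketch, the density formula can be written as a short function. The name `normal_density` and its default arguments are illustrative choices, not part of the lecture.

```python
import math


def normal_density(x, m=0.0, s=1.0):
    """Height Y of the normal curve at x, for center m and standard deviation s > 0."""
    if s <= 0:
        raise ValueError("s must be positive")
    exponent = -((x - m) ** 2) / (2 * s ** 2)
    return math.exp(exponent) / math.sqrt(2 * math.pi * s ** 2)


print(round(normal_density(0.0), 4))            # peak of the curve with m = 0, s = 1
print(round(normal_density(100, 100, 15), 4))   # peak of a curve with m = 100, s = 15
```

The height at the center depends only on s: a larger s gives a lower, wider curve.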
Normal distribution
• Symmetric
• Unimodal
• Bell-shaped
• Peak in the center (at X = m)
• Inflection points at m - 1s and m + 1s
• Approaches, but never reaches, Y = 0
Normal distribution

[Figure: two examples of approximately normal distributions. 100 Coin Flips: relative frequency of the number of heads (# Heads from 30 to 70). IQ Scores (n = 20000, M = 100, s = 15): probability by interval (scores from 40 to 160).]
Normal distribution

Multiple Normal Distributions
[Figure: three normal curves with the same s = 15 and m = 50, m = 100 and m = 200. X (interval) from 0 to 250; Y = probability.]
Normal distribution

Multiple Normal Distributions
[Figure: three normal curves with the same m = 100 and s = 15, s = 20 and s = 25. X (interval) from 50 to 150; Y = probability.]
Normal distribution

Multiple Normal Distributions
[Figure: three normal curves on commonly used scales: “T” (m = 50, s = 10), “IQ” (m = 100, s = 15) and “Achievement” test score (m = 500, s = 50). X (interval) from 0 to 600; Y = probability.]
Standardized normal distribution

[Figure: the IQ Scores distribution (n = 20000, M = 100, s = 15; probability by interval) and the 100 Coin Flips distribution (relative frequency of # Heads), shown together.]
Standardized normal distribution

Assume:

1. m = 0
2. s² = 1
Standardized normal distribution
• We need linear transformations to use the standard normal distribution
– We can transform any distribution to:
• Mean = 0
• Standard deviation = 1
– We can transform the standard normal distribution back to any mean and standard deviation, e.g.:
• Mean = 100
• Standard deviation = 15
• Two transformations suffice
– Addition
– Multiplication
Standardized normal distribution
• For a variable X and constant c, consider the transformation:
– Y = X + c
– Example: X has Mx = 3; sx² = 1.33; sx = 1.15. For Y = X + 5: My = 8; sy² = 1.33; sy = 1.15
• In general, for additive transformations, Y = X + c
– My = Mx + c
– s²y = s²x
– sy = sx
• Note that the additive transformation is a linear transformation
– Y = X + c increases linearly with X
– Therefore, it does not alter the shape
[Figure: distribution of X (values 1 to 5), distribution of Y = X + 5 (values 6 to 10), and a plot of Y against X.]
Standardized normal distribution
• For a variable X and constant c, consider the transformation:
– Y = X * c
– Example: X has Mx = 3; sx² = 1.33; sx = 1.15. For Y = X * 2: My = 6; sy² = 5.33; sy = 2.31
• In general, for Y = X * c,
– My = Mx * c
– s²y = s²x * c² (NB: the variance is multiplied by c squared)
– sy = sx * c
• Again, we have a linear transformation
– Y = X * c increases linearly with X
– Therefore, the shape does not change
[Figure: distribution of X (values 1 to 5), distribution of Y = X * 2 (values 2 to 10), and a plot of Y against X.]
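The rules for additive and multiplicative transformations can be checked numerically. The sketch below is only an illustration: the slides give the summary values for X but not the raw data, so the list `x` is a hypothetical dataset chosen to reproduce Mx = 3 and sx² = 1.33 when the population formulas (dividing by n) are used.

```python
import statistics

# Hypothetical data chosen to match the slide's summary values
# (Mx = 3, sx^2 = 1.33, sx = 1.15); the actual data are not given.
x = [1, 2, 2, 3, 3, 3, 4, 4, 5]


def summarize(values):
    """Return (mean, population variance, population standard deviation)."""
    return (
        statistics.fmean(values),
        statistics.pvariance(values),
        statistics.pstdev(values),
    )


shifted = [v + 5 for v in x]  # additive transformation, Y = X + 5
scaled = [v * 2 for v in x]   # multiplicative transformation, Y = X * 2

for label, data in (("X", x), ("X + 5", shifted), ("X * 2", scaled)):
    mean, var, sd = summarize(data)
    print(f"{label:6s} M = {mean:.2f}  s^2 = {var:.2f}  s = {sd:.2f}")
```

Adding a constant moves only the mean, while multiplying by c scales the mean and standard deviation by c and the variance by c².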
Z-transformation

• For any variable X, with mean Mx and standard deviation sx,
• Z = (X - Mx)/sx
– Subtracting Mx sets the mean to 0 and leaves s as it is.
– Dividing by sx sets sz = 1 and leaves the mean at 0, as it is.
– Thus: Mz = 0, sz = 1, and the shape does not change.
Transformation back

• For the variable Z, with Mz = 0 and sz = 1,
• T = (Z * sdesired) + Mdesired
– Multiplying by sdesired sets sT = 1 * sdesired = sdesired and leaves the mean (0) and the shape as they are.
– Adding Mdesired sets MT = 0 + Mdesired and leaves sT as it is.
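A minimal sketch of both directions, applied to a whole distribution. The raw values are made up for illustration, the helper names `to_z` and `from_z` are hypothetical, and the population standard deviation (dividing by n) is assumed because the slides do not say which formula is intended.

```python
import statistics


def to_z(values):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)  # assumption: population formula
    return [(v - m) / s for v in values]


def from_z(z_values, desired_mean, desired_sd):
    """Transform back: multiply by the desired s, then add the desired mean."""
    return [z * desired_sd + desired_mean for z in z_values]


raw = [12.0, 15.0, 9.0, 20.0, 14.0]  # hypothetical raw scores
z = to_z(raw)
t = from_z(z, desired_mean=50, desired_sd=10)

print(f"Z: M = {statistics.fmean(z):.2f}, s = {statistics.pstdev(z):.2f}")
print(f"T: M = {statistics.fmean(t):.2f}, s = {statistics.pstdev(t):.2f}")
```

Because both steps are linear, the ordering and relative spacing of the observations are the same in raw, Z and T units.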
Z-transformation
• Summary:
• Applying additive and multiplicative transformations
• We change the distribution average to any value we want
• We change the standard deviation to any value we want
• We do not change the shape of the distribution (linear transf.)
• z-Transformations can be used for:
• Individual points (observations) of interest
• It becomes a z-score
• Expresses the position of the point with respect to the mean in units of standard deviation
• Whole distributions
• Each observation becomes a z-score
• M = 0, s = 1
Z-Transformation
• Example 1 – discovering z values for raw data

You participate in an “anxiety test” that uses a questionnaire. The test has
m = 50 and s = 10.
• Your score was 65. What is your z-score?

• z = (65 – 50)/10 = 15/10 = 1.50

• Note: z = 1.50 means that your score was 1.50 standard deviations above the mean
Z-Transformation
• Example 2 – returning to raw data from z-scores

You participate in a “patience test” that uses a questionnaire. The test has
m = 25 and s = 5.
• The experimenter reports that your z-score was 2.0. What was
your score?

• P = (z * s) + m = (2 * 5) + 25 = 35
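Examples 1 and 2 are the same calculation run in opposite directions. A small sketch, with hypothetical helper names `z_score` and `raw_score`:

```python
def z_score(x, mean, sd):
    """Position of x relative to the mean, in units of standard deviation."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Raw score that corresponds to a given z-score."""
    return z * sd + mean


print(z_score(65, 50, 10))    # Example 1 (anxiety test): 1.5
print(raw_score(2.0, 25, 5))  # Example 2 (patience test): 35.0
```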
Z-Transformation
• Example 3 – calculating percentiles from z-scores

You participate in an “anxiety test” that uses a questionnaire. The test has
m = 50 and s = 10.
• Your score was 65. What percentage of the population has a
lower score than yours?

• z = (65 – 50)/10 = 15/10 = 1.50

• Let’s look at the problem graphically…


Z-Transformation
[Figure: Standard Normal curve (n = 1000, M = 0, s = 1), Z from -4 to 4, with a cut at Z = 1.5. Area below Z = 1.5 is 0.93 (or 93%); area above is 0.07 (or 7%).]
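The areas for Example 3 can also be computed instead of read from a graph or table. The sketch below uses the error function from Python's standard library to evaluate the area under the standard normal curve; the function name `standard_normal_cdf` is an illustrative choice, and the calculation is only valid if the scores really are normally distributed.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = (65 - 50) / 10  # z-score from Example 3
below = standard_normal_cdf(z)
print(f"below: {below:.4f}, above: {1 - below:.4f}")  # below: 0.9332, above: 0.0668
```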
Normal distribution
• Example 4 – reaction time

An investigator created a test based on reaction time of participants. Despite being a widely used measure in many surveys, there are no standards yet for the use of this new test. The investigator converts the original reaction times to a distribution T, with m = 50 and s = 10, and publishes a table with the percentile for each score. He says that the percentile of values below T = 35 was 7%. Evaluate the analysis.

z = (35 – 50)/10 = -15/10 = -1.5

[Figure: Standard Normal curve (n = 1000, M = 0, s = 1), Z from -4 to 4, with cuts at Z = -1.5 and Z = 1.5. The area in each tail is 0.0668.]

Is this correct? Probably not…
Normal distribution
• No justification was given for using the approximation of the normal distribution
– RT data are not symmetric (and therefore not normally distributed)
– (see example)
• Z and T are linear transformations
– They do not create a normal distribution from a distribution that is not normal
– Solutions based on the standard normal distribution properties are wrong in this case
– In fact, there are no scores T <= 35 (!)
[Figure: Original RT data. Histogram of Raw Scores (frequency by milliseconds, 50 to 150; minimum ≈ 5 msec), Histogram of Z Scores (m = 0, s = 1; minimum ≈ -1) and Histogram of T Scores (m = 50, s = 10; minimum ≈ 38).]
Normal distribution
When the data distribution is not normal
• Use of the normal approximation for the calculation of areas (percentiles) is inappropriate
• What to do?
– Use empirical distributions (which have actually been observed)
• Frequency histogram
• Stem-and-leaf plot
• Ordered data
– Identify the value of interest and calculate the fraction of observations below (or above) this value
[Figure: Original RT data. Histogram of Raw Scores (frequency by milliseconds, 50 to 150; minimum ≈ 5 msec), Histogram of Z Scores (m = 0, s = 1; minimum ≈ -1) and Histogram of T Scores (m = 50, s = 10; minimum ≈ 38).]
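The difference between an empirical percentile and the normal approximation can be sketched with simulated data. Everything in this example is hypothetical: the "reaction times" are generated as a 40 ms floor plus an exponential tail to obtain a right-skewed shape, and they are not the data shown in the lecture's histograms.

```python
import math
import random
import statistics


def fraction_below(values, cutoff):
    """Empirical fraction of observations strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


rng = random.Random(1)
# Hypothetical right-skewed reaction times (ms): a 40 ms floor plus an exponential tail.
rts = [40 + rng.expovariate(1 / 20) for _ in range(2000)]

# Linear transformation to T scores (m = 50, s = 10); the skew is unchanged.
m = statistics.fmean(rts)
s = statistics.pstdev(rts)
t_scores = [50 + 10 * (v - m) / s for v in rts]

empirical = fraction_below(t_scores, 35)
normal_approx = standard_normal_cdf((35 - 50) / 10)
print(f"smallest T score: {min(t_scores):.1f}")
print(f"empirical fraction below T = 35: {empirical:.4f}")
print(f"normal approximation:            {normal_approx:.4f}")
```

With data of this shape the smallest T score lies well above 35, so the empirical fraction is essentially zero even though the normal curve predicts about 7%.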
Back to the overview
[Diagram: from population to data, and back]
• Population
– Sampling method (pref. random)
• Sample
– Allocation to groups contr/exp (pref. random)
• Treatment A, Treatment B (Experimental and Control Procedures)
• Data
• “Descriptive statistics”
– Exploratory data analysis.
– Graphic & numeric summaries.
– “Detective work”
– “Systematic accumulation and exploration of evidence”
• “Inferential statistics”
– Inferences about the population & treatment effect.
– “Reverse reasoning”
– “Hypothesis testing”
– “Model comparison”
Review
Questions?
• Lectures 1 and 2: Introduction, what is statistics, planning experiments
• Lectures 3 and 4: Types of variables, data classification, graphic representation, measures of central tendency
• Lectures 5 and 6: dispersion measures, normal distribution, transformation to z-scores
