Histograms
For large datasets and/or quantitative variables that take many values:
Divide the possible values into classes or intervals of equal widths.
Count how many observations fall into each interval. Instead of
counts, one may also use percents.
Draw a picture representing the distribution: each bar's height is
equal to the number (or percent) of observations in its interval.
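The three steps above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the helper name `histogram_counts` is hypothetical rather than anything from the slides.

```python
def histogram_counts(values, start, width, n_classes):
    """Count observations in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)  # which class the value falls into
        if 0 <= index < n_classes:         # values outside all classes are ignored
            counts[index] += 1
    return counts

# Made-up scores for illustration only (not data from the slides).
scores = [78, 84, 91, 97, 101, 104, 108, 110, 113, 118, 121, 127, 133, 140]
counts = histogram_counts(scores, start=75, width=10, n_classes=8)
percents = [100 * c / len(scores) for c in counts]

# Text "picture": one bar per class, bar length equal to the count.
for i, (c, p) in enumerate(zip(counts, percents)):
    low = 75 + 10 * i
    print(f"{low} <= score < {low + 10}: {'#' * c} ({c}, {p:.1f}%)")
```

Using percents instead of counts changes only the scale of the bars, not the shape of the picture.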
Distribution of IQ Scores
Class                    Count
75 ≤ IQ Score < 85         2
85 ≤ IQ Score < 95         3
95 ≤ IQ Score < 105       10
105 ≤ IQ Score < 115      16
115 ≤ IQ Score < 125      13
125 ≤ IQ Score < 135      10
135 ≤ IQ Score < 145       5
145 ≤ IQ Score < 155       1
(Histogram: IQ Score on the horizontal axis, Number of Students on the vertical axis.)
Outliers
An important kind of deviation is an outlier. Outliers are observations that lie outside
the overall pattern of a distribution. Always look for outliers and try to explain them.
The overall pattern here is fairly symmetrical except for two states that clearly
do not belong to the main pattern. Alaska and Florida have unusually small and
large percents, respectively, of elderly residents in their populations.
Density Curves
Here is a histogram of vocabulary scores of 947 seventh
graders.
The mean and standard deviation computed from
actual observations (data) are denoted by x̄ (“x-bar”) and s,
respectively.
The mean and standard deviation of the actual
distribution represented by the density curve are
denoted by µ (“mu”) and σ (“sigma”), respectively.
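As a small illustration of the first pair of symbols, x̄ and s can be computed from data with Python's standard `statistics` module. The sample below is made up; µ and σ, by contrast, belong to the density curve and are not computed from a sample.

```python
import statistics

# Made-up observations, for illustration only.
sample = [4, 8, 6, 5, 7]

x_bar = statistics.mean(sample)   # sample mean, x-bar
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)

print(f"x-bar = {x_bar}, s = {s:.3f}")
```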
Normal Distributions
One particularly important class of density curves is the class of Normal
curves, which describe Normal distributions.
All Normal curves are symmetric, single-peaked, and bell-shaped.
A specific Normal curve is described by giving its mean µ and
standard deviation σ.
The Normal Distribution
• The Normal distribution is a suitable model for many naturally
occurring variables that tend to be symmetrically distributed
about a central modal value, the mean.
• Examples include human heights, weights and IQs, and also the
output from many production processes.
• A Normal distribution is characterized by its mean and its
standard deviation.
• It gives the ability to put a particular score into perspective:
• How many standard deviations above/below the mean?
There is no single normal curve, but a family of curves, each
one defined by its mean, µ, and standard deviation, σ; µ and
σ are called the parameters of the distribution.
The curves may have different centres and/or different
spreads, but they all have certain characteristics in common:
The curve is bell-shaped,
it is symmetrical about the mean ( µ ),
the mean, mode and median coincide.
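These shared characteristics can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the three (µ, σ) pairs are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters chosen only for illustration.
curves = [
    NormalDist(mu=0, sigma=1),
    NormalDist(mu=50, sigma=10),
    NormalDist(mu=50, sigma=15),
]

for curve in curves:
    mu, sigma = curve.mean, curve.stdev
    left = curve.pdf(mu - 0.7 * sigma)    # height 0.7 sd below the mean
    right = curve.pdf(mu + 0.7 * sigma)   # height 0.7 sd above the mean
    median = curve.inv_cdf(0.5)           # value with half the area below it
    print(f"N({mu}, {sigma}): height at mean {curve.pdf(mu):.4f}, "
          f"left {left:.4f}, right {right:.4f}, median {median}")
```

For each curve the two heights match (symmetry), the height at the mean is the largest (the mode), and the median equals the mean.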
• Area beneath the Normal Distribution Curve
No matter what the values of µ and σ are for a normal probability
distribution, the total area under the curve is equal to one.
• We can therefore consider partial areas under the curve as
representing probabilities. The partial area between a stated number
of standard deviations below and above the mean is always the same:
about 68% within one, 95% within two, and 99.7% within three
standard deviations of the mean.
• N.B. The curve neither ends at nor meets the horizontal axis at
µ ± 3σ; it only approaches the axis and goes on indefinitely.
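A short check of these areas, written as a sketch with the standard-library `statistics.NormalDist`. The helper name `area_within` and the second parameter pair (50, 15) are illustrative choices.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    standard = area_within(k)                    # standard Normal curve
    other = area_within(k, mu=50.0, sigma=15.0)  # a different Normal curve
    print(f"within {k} sd: {standard:.4f} (standard), {other:.4f} (mu=50, sigma=15)")

# A little area remains beyond 3 sd, so the curve has not reached the axis there.
tail_beyond_3sd = 1.0 - area_within(3)
print(f"area outside 3 sd: {tail_beyond_3sd:.4f}")
```

The areas agree for both curves, which is why a single table of the standard Normal distribution is enough.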
(Figure: leptokurtic, mesokurtic, and platykurtic curves.)
Testing for Normality
(or approximate Normality)
• Visual inspection of Histograms, Box Plots, Q-Q Plots etc
• Values for Skewness & Kurtosis
• Sometimes transformations can correct for skew and kurtosis
• Square root of raw scores
• Inverse (1/x) of raw scores
• Base 10 Log of raw scores
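To see what these transformations do to skew, the sketch below applies each one to a made-up, right-skewed set of positive scores and reports a moment-based skewness. The data and the helper `skewness` are illustrative assumptions; the log and inverse transformations also assume every score is positive.

```python
import math

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed positive scores (illustration only).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]

transforms = {
    "raw": lambda v: v,
    "square root": math.sqrt,
    "inverse (1/x)": lambda v: 1 / v,
    "log10": math.log10,
}

for name, f in transforms.items():
    transformed = [f(v) for v in raw]
    print(f"{name:>14}: skewness = {skewness(transformed):.2f}")
```

Note that the inverse transformation reverses the order of the scores, so its result has to be read with that in mind.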
The data points are ranked and the percentile ranks are
converted to z-scores. The z-scores are then used for
the x axis against which the data are plotted on the y
axis of the Normal quantile plot.
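The construction just described can be sketched as follows. The plotting position (rank − 0.5) / n is one common convention and is an assumption here, as are the function name and the sample values.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs: theoretical z-score on x axis, observed value on y axis."""
    ordered = sorted(data)                 # rank the data points
    n = len(ordered)
    std_normal = NormalDist()              # N(0, 1)
    points = []
    for rank, x in enumerate(ordered, start=1):
        percentile = (rank - 0.5) / n      # assumed convention; others exist
        z = std_normal.inv_cdf(percentile) # percentile rank -> z-score
        points.append((z, x))
    return points

# Made-up sample, for illustration only.
sample = [12.1, 9.8, 10.4, 11.0, 10.1, 9.5, 10.9]
for z, x in normal_quantile_points(sample):
    print(f"z = {z:+.3f}   x = {x}")
```

If the points fall close to a straight line, the data are approximately Normal.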
Standardizing Observations
If a variable x has a distribution with mean µ and standard
deviation σ, then the standardized value of x, or its z-score, is
z = (x − µ) / σ
All Normal distributions are the same if we measure in units of size σ from
the mean µ as center.
The standard Normal distribution is the Normal distribution with mean 0 and
standard deviation 1. That is, the standard Normal distribution is N(0,1).
Z Scores: An example
• A dataset is normally distributed with a mean of 50 and standard
deviation of 10. Determine the z score for a value of 70.
z = (70 − 50) / 10 = 2.0
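The same calculation as a tiny function, as a sketch; the name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(70, mu=50, sigma=10)
print(z)  # 2.0: the value 70 is two standard deviations above the mean
```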
Normal Calculations
Find the proportion of observations from the standard Normal distribution
that are between –1.25 and 0.81.
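One way to check a table-based answer to this question is with the cumulative distribution function in Python's standard library. This is a sketch; the slides use a printed table rather than software.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

area_below_upper = std_normal.cdf(0.81)    # area to the left of z = 0.81
area_below_lower = std_normal.cdf(-1.25)   # area to the left of z = -1.25
proportion = area_below_upper - area_below_lower

print(f"{area_below_upper:.4f} - {area_below_lower:.4f} = {proportion:.4f}")
```

The area between two z values is the area to the left of the larger one minus the area to the left of the smaller one, which comes to about 0.685 here.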
How to Solve Problems Involving Normal Distributions
Perform calculations.
Standardize x to restate the problem in terms of a standard
Normal variable z.
Use the table and the fact that the total area under the
curve is 1 to find the required area under the standard
Normal curve.
How tall is a man who is taller than exactly 10% of men aged 18–24?
Distribution: N(70, 2.8)
(Figure: N(70, 2.8) curve centred at 70, with area .10 to the left of the unknown height “?”.)
Look up the probability closest to 0.10 in the table.
Find the corresponding standardized score.
The value you seek is that many standard deviations from the mean.
z       .07     .08     .09
–1.3    .0853   .0838   .0823
–1.2    .1020   .1003   .0985
–1.1    .1210   .1190   .1170
Z = –1.28
We need to “unstandardize” the z-score to find the observed value (x):
z = (x − µ) / σ  ⟹  x = µ + zσ
x = 70 + z(2.8)
= 70 + [(–1.28 ) (2.8)]
= 70 + (–3.58) = 66.42
A man would have to be approximately 66.42 inches tall or less to be in
the lower 10% of all men in the population.
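The table lookup and the unstandardizing step can be reproduced with the inverse cumulative distribution function. This is a sketch using the standard library. Because it does not round z to two decimals, it gives about 66.41 rather than the table-based 66.42.

```python
from statistics import NormalDist

mu, sigma = 70.0, 2.8      # N(70, 2.8) from the example
p = 0.10                   # proportion of men shorter than the height sought

z = NormalDist().inv_cdf(p)   # z with area 0.10 to its left (about -1.28)
x = mu + z * sigma            # unstandardize: x = mu + z * sigma

print(f"z = {z:.4f}, x = {x:.2f} inches")

# Same answer in one step, asking the N(70, 2.8) distribution directly.
x_direct = NormalDist(mu, sigma).inv_cdf(p)
```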
Example:
A variety of questions about the spending habits of supermarket shoppers can
be answered given the information that: μ = £50.00, σ = £15.00, n = 500.
For a spend of x = £80:
z = (x − μ) / σ = (80 − 50) / 15 = 30 / 15 = 2.00
• From tables:
P(x > £80) = 1.0 - 0.9772 = 0.0228
Probability that a shopper spends between £30 and £80,
i.e. P(£30 < x < £80)
μ = £50.00, σ = £15.00
P(x < £80): reading from tables, this is 0.9772 (see previous slide)
The table values are all positive, so when z is negative we invoke the symmetry of the
situation and use its absolute value.
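The supermarket calculations, including the symmetry argument for a negative z, can be sketched as follows. Treating n = 500 as the number of shoppers, and using it for an expected count, is an assumption made for illustration. Software values differ slightly from table values because z is not rounded to two decimals.

```python
from statistics import NormalDist

spend = NormalDist(mu=50.0, sigma=15.0)   # μ = £50.00, σ = £15.00
std_normal = NormalDist()                 # N(0, 1), what the tables list
n_shoppers = 500                          # assumed meaning of n = 500

# P(x > £80) = 1 - P(x < £80)
p_over_80 = 1.0 - spend.cdf(80.0)

# P(x < £30): z is negative, so use symmetry with its absolute value,
# P(Z < -|z|) = 1 - P(Z < |z|), as one would with a positive-only table.
z_30 = (30.0 - 50.0) / 15.0
p_below_30 = 1.0 - std_normal.cdf(abs(z_30))

# P(£30 < x < £80) = P(x < £80) - P(x < £30)
p_30_to_80 = spend.cdf(80.0) - p_below_30

print(f"P(x > 80)      = {p_over_80:.4f}")
print(f"P(x < 30)      = {p_below_30:.4f}")
print(f"P(30 < x < 80) = {p_30_to_80:.4f}")
print(f"Expected shoppers spending over £80: {n_shoppers * p_over_80:.1f}")
```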