Normal Approximation for Data Analysis

The average and standard deviation scale linearly with the transformation. Doubling all values doubles the average and SD. Adding a constant shifts the average but does not change the SD.

Uploaded by

Alper Mert

DESCRIPTIVE STATISTICS
Chapter 5
The Normal Approximation for Data
The Normal Curve
• Discovered in 1720 by A. de Moivre
• Around 1870, A. Quetelet had the idea of using the curve as an “ideal” histogram to which histograms of data could be compared!
$y = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}$
Use a table, not the formula
• The area under the normal curve
– between -1 and +1 is about 68%
– between -2 and +2 is about 95%
– between -3 and +3 is about 99.7%
– outside -3 and +3 is about 0.3%
– outside -4 and +4 is about 0.006%
• Many histograms for data are similar in shape to the normal curve, but not all!
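The table values above can be checked numerically. The following is a minimal sketch, assuming only Python's standard library; the helper name `area_within` is made up for this illustration and is not part of the slides.

```python
import math

def area_within(k):
    """Fraction of the area under the standard normal curve between -k and +k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    inside = 100 * area_within(k)
    # Print the area inside and outside the interval, in percent.
    print(f"between -{k} and +{k}: {inside:.4f}%   outside: {100 - inside:.4f}%")
```

The printed percentages are what a normal table would give, up to rounding.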
How to compare?
• The histogram must be drawn to the same scale as the normal curve:
– Make the horizontal scale the same, that is, convert the data units to standard units.
– A value is converted to standard units by seeing how many SDs it is above or below the average.
– Ex. Convert “185 cm” to standard units, if this measurement comes from a sample with mean 170 cm and SD 10 cm.
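The conversion in the example can be written out as a short sketch. The function name `to_standard_units` is a hypothetical helper chosen for illustration.

```python
def to_standard_units(value, average, sd):
    """Return how many SDs `value` lies above (+) or below (-) the average."""
    return (value - average) / sd

# 185 cm in a sample with mean 170 cm and SD 10 cm
z = to_standard_units(185, average=170, sd=10)
print(z)  # 1.5, i.e. 1.5 SDs above the average
```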
Histogram and the Normal Curve
Converting back
• Find the height (in inches) which is equal to -1.2 in standard units.
Solution: The histogram corresponds to a data set with average 63.5 inches and standard deviation 2.5 inches. So, the height is 63.5 - 1.2 × 2.5 = 60.5 inches, because -1.2 in standard units means 1.2 standard deviations below the average (the average itself is 0 in standard units).
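Going back from standard units to data units reverses the earlier conversion. A minimal sketch, with `from_standard_units` as a hypothetical helper name:

```python
def from_standard_units(z, average, sd):
    """Convert a value in standard units back to the original data units."""
    return average + z * sd

# -1.2 standard units, for data with average 63.5 inches and SD 2.5 inches
height = from_standard_units(-1.2, average=63.5, sd=2.5)
print(round(height, 1))  # 60.5 (inches)
```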
Finding areas under the curve
• Example 1: Find the area under the normal curve, when x is greater than 0
Answer: 50%!
Finding areas under the curve
• Example 2: Find the area between 0
and 1 under the normal curve
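Examples 1 and 2 can be checked with Python's standard library instead of a table. This is a sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper `area_between` is a name invented here.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve: mean 0, SD 1

def area_between(a, b):
    """Fraction of the area under the standard normal curve between a and b."""
    return Z.cdf(b) - Z.cdf(a)

print(round(100 * (1 - Z.cdf(0)), 1))      # Example 1: 50.0
print(round(100 * area_between(0, 1), 1))  # Example 2: 34.1
```

Example 2 agrees with the table: the area between -1 and +1 is about 68%, and by symmetry half of it, about 34%, lies between 0 and 1.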
Finding areas under the curve
• Example 3:
Finding areas under the curve
• Example 4:
More Examples
More Examples
The Normal Approximation for Data
• The heights of the men age 18-74 in HANES averaged 69 inches; SD was 3 inches. Use the normal curve to estimate the percentage of these men with heights between 63 inches and 72 inches.
Convert to standard units and find the area
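The two steps named above can be sketched as follows. This is an illustration under the slide's assumption that the normal curve fits the height data; it uses `statistics.NormalDist` (Python 3.8+) in place of a printed table.

```python
from statistics import NormalDist

average, sd = 69, 3   # inches, from the HANES example
low, high = 63, 72    # interval of interest, in inches

# Step 1: convert the endpoints to standard units.
z_low = (low - average) / sd    # -2.0
z_high = (high - average) / sd  #  1.0

# Step 2: find the area under the standard normal curve between them.
area = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)
print(f"{100 * area:.0f}%")  # about 82%
```

With the rounded table values the same estimate is 95%/2 + 68%/2 = 47.5% + 34% = 81.5%, or roughly 82%.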
Graphically …
Remark!
• If the normal curve is a good fit for the data, then we need only
– the mean, and
– the standard deviation
to find (approximately) the percentage of data that falls in any given interval!
• If the histogram deviates significantly from a normal curve, such approximations are in general far from reality.
Percentiles
• The mean and standard deviation are less satisfactory if the normal curve is NOT a good approximation for the data!
• Ex. Income survey in the U.S., 1992. The average was $44,500. However, this histogram does not look like a normal curve. It has a long right-hand tail. Is the proportion of the data below the average a) less than, b) equal to, or c) more than 50%?
Histogram of the example
• Answer? More, because the median is less than the mean (average)
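What percent has negative (!) income?
• If the normal curve were a good approximation, then about 8% of the incomes would be negative (0 is about 1.4 SDs below the average of $44,500, taking the SD of $32,000).
• However, NO income is negative!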
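The normal-curve estimate for the share of negative incomes can be sketched numerically. It takes the average of $44,500 from the example and the SD of $32,000 quoted on the income percentile slide; the variable names are illustrative only.

```python
from statistics import NormalDist

average, sd = 44_500, 32_000  # dollars, from the income example

# Convert an income of $0 to standard units.
z = (0 - average) / sd

# Area under the standard normal curve to the left of z.
share_negative = NormalDist().cdf(z)

print(round(z, 2))                    # -1.39
print(f"{100 * share_negative:.0f}%") # about 8%
```

The estimate is clearly wrong for this data, which is the point of the slide: the normal approximation breaks down for a histogram with a long right-hand tail.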


Use of percentiles
• A percentile is a VALUE IN THE DATA SET (it has the same data units as the data set!)
• The rth percentile is the value such that r% of the data is below that value, and (100-r)% is above that value.
Example of Percentile
• 143.7, 146.6, 148.0, 150.2, 151.0, 157.6, 164.0, 168.0, 171.4, 173.1, 174.0, 174.7, 175.0, 176.0, 179.0, 183.7, 185.0, 192.0, 192.0, 195.0
• Find the 15th, 25th, 50th, 75th and 90th percentiles.
• 15th percentile = 148.0 cm
• 25th percentile = 151.0 cm
• 50th percentile (median) = 173.1 cm
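The three answers given are consistent with one simple convention: sort the n values and take the k-th smallest, where k is r% of n rounded up. That convention is an assumption inferred from the answers, not something the slides state, and other conventions exist (many textbooks would average the 10th and 11th values for the median here). A sketch under that assumption, with `slide_percentile` as a hypothetical helper name:

```python
heights = [143.7, 146.6, 148.0, 150.2, 151.0, 157.6, 164.0, 168.0, 171.4, 173.1,
           174.0, 174.7, 175.0, 176.0, 179.0, 183.7, 185.0, 192.0, 192.0, 195.0]

def slide_percentile(values, r):
    """r-th percentile as the k-th smallest value, k = ceil(r% of n).

    Assumed convention, chosen because it reproduces the slide's answers.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = max(1, -(-r * n // 100))  # integer ceiling of r*n/100, at least 1
    return ordered[k - 1]

for r in (15, 25, 50, 75, 90):
    print(f"{r}th percentile: {slide_percentile(heights, r)} cm")
```

Under this convention the 75th percentile comes out as 179.0 cm and the 90th as 192.0 cm.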
Interquartile Range
• Another measure of spread: 75th percentile – 25th percentile
• Useful when the distribution is skewed.
• In contrast, the SD would be influenced by a small percentage of cases in the tail.
• Again, be careful with the cases when the normal approximation is not good!
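The difference in sensitivity can be illustrated with the height list from the percentile example. In this sketch the largest height is replaced by a made-up extreme value (295.0, purely hypothetical) to create a long right tail. The percentile rule (k-th smallest value, k = r% of n rounded up) is an assumed convention, and `slide_percentile` and `iqr` are illustrative names.

```python
from statistics import pstdev

heights = [143.7, 146.6, 148.0, 150.2, 151.0, 157.6, 164.0, 168.0, 171.4, 173.1,
           174.0, 174.7, 175.0, 176.0, 179.0, 183.7, 185.0, 192.0, 192.0, 195.0]

def slide_percentile(values, r):
    """k-th smallest value with k = ceil(r% of n) (assumed convention)."""
    ordered = sorted(values)
    k = max(1, -(-r * len(ordered) // 100))
    return ordered[k - 1]

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return slide_percentile(values, 75) - slide_percentile(values, 25)

# Hypothetical variant: replace the largest height with an extreme value.
stretched = sorted(heights)[:-1] + [295.0]

print("IQR:", iqr(heights), "->", iqr(stretched))
print("SD: ", round(pstdev(heights), 1), "->", round(pstdev(stretched), 1))
```

The interquartile range is the same for both lists because the 25th and 75th percentiles do not move, while the SD grows noticeably from a single case in the tail.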
Income Distribution: Percentiles and Interquartile Range
r     rth percentile
1     $1,300
10    $10,200
25    $20,100
50    $36,800
75    $58,100
90    $85,000
99    $151,800
Interquartile Range: $58,100 - $20,100 = $38,000
SD was $32,000.
Percentiles and the Normal Curve
• SAT scores: Average 535, SD 100. The scores follow a normal curve. Estimate the 95th percentile of the SAT score distribution.
Continued ..
Continued..
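One way to work the SAT example is to find the standard-units value that has 95% of the area to its left, then convert it back to score units. The sketch below uses `statistics.NormalDist.inv_cdf` (Python 3.8+) instead of reading a table backwards; it is an illustration, not necessarily the method shown on the original slides.

```python
from statistics import NormalDist

average, sd = 535, 100  # SAT scores, from the example

# Standard-units value with 95% of the area to its left (about 1.65).
z95 = NormalDist().inv_cdf(0.95)

# Convert back from standard units to SAT points.
score = average + z95 * sd
print(round(score))  # roughly 700
```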
Change of Scale
• Find the average and SD of the list 1, 3, 4, 5, 7
Average = 4, SD = 2
• Take the above list, multiply by 3 and then add 7. What are the new avg. and SD?
Average = 19 (= 3 × 4 + 7), SD = 6 (= 3 × 2)
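The change-of-scale rule can be verified directly. A minimal sketch, assuming the SD on the slides divides by n (the population SD), which is what `statistics.pstdev` computes and what reproduces SD = 2 for this list:

```python
from statistics import mean, pstdev

data = [1, 3, 4, 5, 7]
scaled = [3 * x + 7 for x in data]  # multiply by 3, then add 7

# Original list: average 4, SD 2
print(mean(data), pstdev(data))

# Transformed list: average 3*4 + 7 = 19, SD 3*2 = 6
print(mean(scaled), pstdev(scaled))
```

Multiplying scales both the average and the SD; adding a constant shifts only the average.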
