Statistics

MODULE 16: Areas Under a Normal Distribution

LEARNING OUTCOMES

At the end of the module, you are expected to exhibit the following competencies:
1. Recognise the key concepts of a normal curve.
2. Compute probabilities using a table of cumulative areas under a standard normal curve.
3. Compute percentiles of a normal curve.

IMPORTANT CONCEPTS

Introduction: Review of Normal Distribution

• A normal distribution has a symmetric, bell-shaped curve (for its probability density function) with one peak.
This curve is characterized by its mean μ (the center of symmetry, and also the location of the peak) and its
standard deviation σ (the distance from the center to the change-of-curvature points on either side). If a random
variable X has a normal distribution with mean μ and variance σ², we denote this as X ~ N(μ, σ²).
• A normal curve is symmetric about its mean (thus the mean is the median). It is more concentrated in the
middle and its peak is at the mean (so that the mean is also the mode).
• Like any continuous distribution, the total area under the normal curve is equal to 1, and the probability that a
normal random variable X equals any particular value a, P(X=a) is zero (0).
• The normal curve follows the empirical rule (also called the 68-95-99.7 rule):
o About 68% of the area under the curve falls within 1 standard deviation of the mean.
o About 95% of the area under the curve falls within 2 standard deviations of the mean.
o Nearly the entire distribution, about 99.7% of the area under the curve, falls within 3 standard deviations
of the mean.
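
As a quick numerical check of the empirical rule, the following minimal Python sketch computes the three areas. It assumes Python 3.8 or later, whose standard library provides statistics.NormalDist. The helper name area_within is made up for this illustration and is not part of the module.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma^2) within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    # P(mu - k*sigma <= X <= mu + k*sigma) as a difference of cumulative areas
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

Because these areas depend only on the number of standard deviations, the same values should result for any mean and standard deviation, for example area_within(2, 1620, 50).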

Probabilities/Areas Under a Normal Curve

Given a normally distributed random variable X ~ N(μ, σ²), we often wish to find various probabilities pertaining
to where an arbitrary measurement may lie. For instance, we may want to find P(a ≤ X ≤ b), which is the
probability that a random measurement X lies between a and b.

We may also wish to find the proportion of measurements less than a value k (or at most k), denoted by P(X < k)
(or P(X ≤ k)). Note that it does not matter whether we consider P(X < k) or P(X ≤ k), since
P(X = k) = 0 for a continuous random variable.

Finally, we may want the proportion greater than k (or at least k), denoted by P(X > k) (or P(X ≥ k)).

In the last module, you were given a lesson on the standard normal distribution. Here we again use areas under a
standard normal curve, but we first need to convert a general normal distribution into standardized form.

Standard Scores (or Z-scores)

Whatever the value of the mean and standard deviation of a normal curve, we can transform the whole normal
curve into a standard normal curve (as illustrated in the following figure).

This entails transforming all the data in a normal curve into standard units.

An observation is in standard units (that is, expressed as a z-score) when we state how many standard deviations it
is above or below the average. That is, if x, μ, and σ respectively represent the observation, its mean, and its
standard deviation, then the standardized form (or z-score) of x is

z = (x − μ) / σ.

A z-score indicates how many standard deviations a certain data element is from the mean. For instance, if
examination scores in Statistics and Probability have an average of 75 and a standard deviation of 5, then an
exam score of 90 has a z-score of (90-75)/5 = 3, while a score of 70 has a z-score of (70-75)/5 = -1. To interpret
these z-scores, we note that 90 is 3 standard deviations above the mean (75), while 70 is one standard deviation
“below” the mean.

Z-scores provide a convenient way of making variables comparable. Suppose a student got an examination score of 90
in Statistics and Probability (where the mean was 75 and the standard deviation was 5) and a 92 in an English
examination (where the mean was 95 and the standard deviation was 3). Although the Statistics and Probability score
is “lower” (in absolute numbers) than the English score, its z-score is (90-75)/5 = 3, while the z-score in English
is (92-95)/3 = -1. The “relative” performance in English (in relation to the average) is therefore actually lower
than the relative performance in Statistics and Probability.
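
The z-score arithmetic in the two examination examples can be reproduced with a few lines of Python. This is only a sketch: the function name z_score and its argument names are chosen for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd


# Statistics and Probability: mean 75, standard deviation 5
print(z_score(90, 75, 5))  # 3.0
print(z_score(70, 75, 5))  # -1.0

# English: mean 95, standard deviation 3
print(z_score(92, 95, 3))  # -1.0
```

Comparing z_score(90, 75, 5) with z_score(92, 95, 3) shows the higher relative standing in Statistics and Probability even though the raw English score is larger.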

Z-scores may also be used to transform normal random variables into standard normal random
variables, and this, in turn, can help us relate probabilities for any normal distribution to areas under a standard
normal curve, as the following example on the heights of students illustrates.

Illustration for Finding Areas Under a Normal Curve

Assume that the distribution of heights of all female Grade 11 students can be modeled well by a normal curve
with a mean of 1620 mm and a standard deviation of 50 mm. Further, we wish to determine (a) the proportion of
female Grade 11 students shorter than 1550 mm; (b) the proportion of female Grade 11 students taller than 1650
mm; (c) the proportion of female Grade 11 students between 1600 and 1675 mm; (d) the height of a female
Grade 11 student for which 10 percent of female Grade 11 students are shorter than it; (e) the height of a female
Grade 11 student for which 75% of female Grade 11 students are taller than it.

For (a), first transform 1550 to its z-score, yielding (1550-1620)/50 = -1.4, so that we can associate the area
to the left of 1550 (under a normal curve with mean 1620 and standard deviation 50) with the area to the left of
z = -1.4 under a standard normal curve. Reading from the table of the Cumulative Distribution Function of a
Standard Normal Curve, we find Φ(-1.4) = 0.0808, so about 8.08% of female Grade 11 students are shorter than 1550 mm.

For (b), first transform the height value 1650 to standard units, (1650-1620)/50 = 0.6, and then note that the
area to the right of z = 0.6 under the standard normal curve is the difference between the total area under a
standard normal curve (100%, or 1) and the area to the left of z = 0.6, which is Φ(0.6) = 0.7257. Consequently,
the desired probability (and area) is 1 - 0.7257 = 0.2743.

Likewise, for (c), first transform 1600 and 1675 into their respective standardized forms, namely
(1600-1620)/50 = -0.4 and (1675-1620)/50 = 1.1, and then obtain the area between these two z-scores as the
difference between Φ(1.1) and Φ(-0.4), i.e., 0.8643 - 0.3446 = 0.5197.
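
Parts (a) to (c) can be cross-checked without a printed table. The sketch below assumes Python 3.8 or later and uses statistics.NormalDist for the standard normal cumulative distribution function Φ. The names MEAN_MM, SD_MM, phi, and to_z are illustrative choices, not notation from the module.

```python
from statistics import NormalDist

MEAN_MM, SD_MM = 1620, 50   # height distribution from the illustration
phi = NormalDist().cdf      # standard normal CDF, written as Φ in the text


def to_z(height_mm):
    """Convert a height in mm to standard units."""
    return (height_mm - MEAN_MM) / SD_MM


p_a = phi(to_z(1550))                     # (a) shorter than 1550 mm
p_b = 1 - phi(to_z(1650))                 # (b) taller than 1650 mm
p_c = phi(to_z(1675)) - phi(to_z(1600))   # (c) between 1600 and 1675 mm

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

The computed values may differ from the table-based answers in the last decimal place, because the table entries are themselves rounded to four places before being subtracted.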

For (d), a sketch of the normal curve with the lowest 10% of the area shaded helps illustrate what needs to be done.

The 10th percentile of the height distribution may be obtained by first getting the 10th percentile of the
standard normal curve, which can be read off from the table as -1.282. This means that the 10th percentile of the
height distribution is 1.282 standard deviations below the mean. The required value for the height is
-1.282(50) + 1620 = 1555.9 mm.

Finally, for (e), we want the 25th percentile, as this is the value for which 75 percent of the height distribution
lies above it. As in (d), first find the 25th percentile of a standard normal curve (-0.675), then
obtain the required height as:

-0.675(50) + 1620 = 1586.25 mm.
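
The two percentile calculations follow the same pattern: find the standard normal percentile, then convert it back to the height scale with x = μ + zσ. A minimal Python sketch of that pattern is given below, assuming Python 3.8 or later for statistics.NormalDist.inv_cdf. The function name height_percentile is hypothetical.

```python
from statistics import NormalDist


def height_percentile(p, mean_mm=1620, sd_mm=50):
    """Height (mm) below which a proportion p of the distribution lies."""
    z_p = NormalDist().inv_cdf(p)   # percentile of the standard normal curve
    return mean_mm + z_p * sd_mm    # convert back from standard units


print(round(height_percentile(0.10), 1))  # part (d): 10th percentile
print(round(height_percentile(0.25), 1))  # part (e): 25th percentile
```

Small differences from the hand-computed 1555.9 and 1586.25 are expected, since the table values -1.282 and -0.675 are rounded to three decimal places.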

Computing with Excel

In the last module, the NORMSDIST function was illustrated in the Enrichment part of the lesson. There are other
important Excel functions for the normal distribution, especially the NORMDIST and NORMINV functions.

The NORMDIST(x, mu, sigma, cumulative) function helps obtain cumulative probabilities for general normal curves.
The parameters x, mu, and sigma are numeric values, while the parameter cumulative is a logical TRUE or FALSE
value. Note that sigma must be greater than 0 (as it is a non-trivial standard deviation), but there are no similar
requirements for x or mu.

To illustrate, recall the female students' height example, where we were first interested in obtaining P(X ≤ 1550),
given μ = 1620 and σ = 50, where X is the height of a randomly selected female Grade 11 student. You can
simply use the NORMDIST function, supplying the score (1550), the mean (μ = 1620), and the standard deviation
(σ = 50) of the normal distribution:

= NORMDIST(1550,1620,50,TRUE)

Note that the final argument TRUE tells Excel that we wish to obtain the area to the left (rather than the height of
the normal curve).

Also, to compute P(X ≥ 1650), given μ = 1620 and σ = 50, you can enter in Microsoft Excel the command

= 1-NORMDIST(1650,1620,50,TRUE)

For P(1600 ≤ X ≤ 1675), given μ = 1620 and σ = 50, you can enter

= NORMDIST(1675,1620,50,1) - NORMDIST(1600,1620,50,1)
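
To see what the final argument of NORMDIST changes, the sketch below mimics the two modes in Python. The function normdist is a hypothetical stand-in written for this illustration (it is not an Excel or Python library function); it relies on the cdf and pdf methods of statistics.NormalDist, available in Python 3.8 or later.

```python
from statistics import NormalDist


def normdist(x, mu, sigma, cumulative):
    """Hypothetical Python analogue of Excel's NORMDIST(x, mu, sigma, cumulative)."""
    if sigma <= 0:
        raise ValueError("sigma must be greater than 0")
    dist = NormalDist(mu, sigma)
    # cumulative=True  -> area to the left of x
    # cumulative=False -> height of the normal curve at x
    return dist.cdf(x) if cumulative else dist.pdf(x)


area_left = normdist(1550, 1620, 50, True)
curve_height = normdist(1550, 1620, 50, False)
print(round(area_left, 4), curve_height)
```

Only the cumulative form gives a probability. The curve height is a density value and should not be read as the chance of observing exactly 1550 mm, which is zero for a continuous variable.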

The NORMINV(p, mu, sigma) function of Excel returns the value x such that, with probability p, a normal random
variable with mean mu and standard deviation sigma takes on a value less than or equal to x. That is, the value
returned is the (100 times p)th percentile of the normal curve with mean mu and standard deviation sigma.

For instance, to obtain the 10th percentile of the distribution for the heights of female Grade 11 students, merely
enter in Excel

=NORMINV(0.1,1620,50)

The 25th percentile (the value for which 75 percent are above it) can be obtained with:

=NORMINV(0.25,1620,50)
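
NORMINV undoes NORMDIST in its cumulative form: feeding a cumulative probability back in recovers the original value. The round trip can be sketched in Python as follows, assuming Python 3.8 or later, with statistics.NormalDist standing in for the two Excel functions. The variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=1620, sigma=50)

p = heights.cdf(1550)        # plays the role of NORMDIST(1550, 1620, 50, TRUE)
x_back = heights.inv_cdf(p)  # plays the role of NORMINV(p, 1620, 50)

# x_back should agree with 1550 up to floating-point error
print(abs(x_back - 1550) < 1e-6)
```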

PRACTICE SKILLS

1. Rodrigo earned a score of 940 on a national achievement test. The mean test score was 850 with a standard
deviation of 100. What proportion of students had a higher score than Rodrigo? (Assume that test scores are
normally distributed.) If there were 100,000 students who took the test, how many would be expected to
have a higher score than Rodrigo?

2. Every night when you get home from school, you take your dog Bantay for a walk. The length of the walk is
normally distributed with a mean of μ = 15 minutes and a standard deviation of σ = 3 minutes.

(a) What proportion of walks last less than 15 minutes?
(b) What proportion of walks last longer than 20 minutes?
(c) What proportion of walks last between 10 and 16 minutes?

3. Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard deviation
of 10, what is the probability that a person who takes the test will score between 90 and 110?

4. The following letter appeared in the popular “Dear Abby” newspaper advice column in the 1970s:

Dear Abby: You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my
baby for ten months and five days, and there is no doubt about it because I know the exact date my baby
was conceived. My husband is in the Navy and it couldn’t have possibly been conceived any other time
because I saw him only once for an hour, and I didn’t see him again until the day before the baby was
born.

I don’t drink or run around, and there is no way this baby isn’t his, so please print a retraction about the
266-day carrying time because otherwise I am in a lot of trouble. - San Diego Reader

The advice column was founded in 1956 by Pauline Phillips under the pen name "Abigail Van Buren" and has been
carried on to the present day by her daughter, Jeanne Phillips, who now owns the legal rights to the pseudonym.

Suppose that, according to pediatricians, pregnancy durations (let us call them X) tend to be normally
distributed with μ = 266 days and σ = 16 days. Perform a probability calculation that addresses San Diego
Reader’s credibility, presuming she was pregnant for 308 days. What would you conclude and why?

REFERENCES

Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia
Marquez). Philippines: Rex Bookstore.

De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Education, Inc.

Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Baños, College Laguna 4031

Probability and statistics: Module 22. (2013). Australian Mathematical Sciences Institute and Education Services
Australia. Retrieved from http://www.amsi.org.au/ESA_Senior_Years/PDF/ExpoNormDist4f.pdf
