Normal Distribution: Density Curve


Chapter 5

Density Curve
Normal Distribution
For continuous data we have seen the need to use grouped
data histograms. Another tool we can use is the construction of
a density curve, a theoretical curve that best matches the areas
of the histogram.
The curve super-imposed over the histogram shape is a
possible best fit representation.
Rather than reading off the density at an individual score, we
consider the area below the curve for a score interval.
Example: Suppose we wish to know the proportion of data
points that lie between x = -2 and x = -0.5. This interval
contains three bars. The sum of their areas (width × height)
yields the proportion of data in the interval.

Bar                    Width   Height   Area
$-2 \le X < -1.5$      0.5     0.08     0.04
$-1.5 \le X < -1$      0.5     0.14     0.07
$-1 \le X < -0.5$      0.5     0.24     0.13


The total proportion of data points X, such that -2<x<-0.5, is the
sum of these areas: 0.24 (or 24%)




Density Curves
Recall that a random variable is continuous if it can take on any
value in a given (possibly infinite) interval.

A probability density function f is a real-valued function that
describes the distribution of probability for a continuous
random variable and must satisfy:
(i) $f(x) \ge 0$ for all $x$.
(ii) The area under the curve $y = f(x)$ and above the x-axis is 1.

Point (ii) is conveniently written using notation from integral
calculus:

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
By further convention, we no longer say "the area bounded
above by f(x) and below by the x-axis." Instead we will
simply say "the area under the curve f(x)."

If f is the probability density function for a continuous random
variable X, then the area under the probability density curve
$y = f(x)$ that lies above $y = 0$ and between $x = a$ and
$x = b$ represents the probability that a random value of X lies
between a and b:

$$\int_a^b f(x)\,dx = P(a \le X \le b)$$

Note: Since the area enclosed by a vertical line segment is 0,
we have

$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$
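As a rough numerical illustration of probability as area, the sketch below uses a hypothetical density f(x) = 2x on [0, 1] (chosen only for illustration; it does not come from these notes) and approximates areas with a midpoint Riemann sum. The helper name `area_under` is likewise an assumption.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_under(func, a, b, n=100_000):
    """Approximate the area under func between a and b (midpoint Riemann sum)."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width


total_area = area_under(f, 0.0, 1.0)   # condition (ii): should be close to 1
prob = area_under(f, 0.25, 0.75)       # P(0.25 <= X <= 0.75) for this density
print(round(total_area, 6), round(prob, 6))
```

For this particular density the interval probability can be checked by hand: the area between 0.25 and 0.75 is 0.75^2 - 0.25^2 = 0.5.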


Measures for Probability Density Functions

The mean, variance, standard deviation, median, and
percentiles can also be defined for continuous random
variables.

The mean is the balance point of the density curve. So if the
curve were a solid object, the mean would be the point at which
you could balance the curve on your finger. The formula for the
mean of a density curve $y = f(x)$ is

$$E(X) = \mu_X = \int_{-\infty}^{\infty} x\,f(x)\,dx.$$


The standard deviation of a continuous random variable X
is the square root of the average squared distance of the values
of X from the mean, taking into account the distribution of
these values. The formulas for the variance and standard
deviation of a density curve are

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx$$

and

$$\sigma = \sqrt{\sigma^2}.$$
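The mean, variance, and standard deviation integrals can be approximated numerically. The sketch below again assumes the hypothetical density f(x) = 2x on [0, 1] (an illustration only, not a density from these notes) and a simple midpoint rule; the helper name `integrate` is an assumption.

```python
import math


def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(func, a, b, n=100_000):
    """Midpoint Riemann sum of func over [a, b]."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width


# f is zero outside [0, 1], so the integrals only need to cover [0, 1].
mean = integrate(lambda x: x * f(x), 0.0, 1.0)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), 0.0, 1.0)
std_dev = math.sqrt(variance)
print(round(mean, 4), round(variance, 4), round(std_dev, 4))
```

For this density the exact values are a mean of 2/3 and a variance of 1/18, which gives a quick check on the approximation.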


The median of a density curve is the real number M for which
50% of the area under $y = f(x)$ lies to the left of the line
$x = M$. Thus, the median M satisfies the equation

$$\int_{-\infty}^{M} f(x)\,dx = 0.5.$$




Similarly, the pth percentile of a density curve is the real
number c for which p% of the area under $y = f(x)$ lies to the
left of the line $x = c$; that is,

$$\int_{-\infty}^{c} f(x)\,dx = \frac{p}{100}.$$
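One way to locate a median or percentile numerically is to search for the point that has the required area to its left. The sketch below assumes the hypothetical density f(x) = 2x on [0, 1] (not from these notes) and uses bisection; the function names are illustrative.

```python
def f(x):
    """Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_left_of(c, n=2_000):
    """Approximate the area under f to the left of c (f is 0 below x = 0)."""
    width = c / n
    return sum(f((i + 0.5) * width) for i in range(n)) * width


def percentile(p, lo=0.0, hi=1.0, tol=1e-9):
    """Find c with p% of the area to its left, by bisection on [lo, hi]."""
    target = p / 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if area_left_of(mid) < target:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2.0


median = percentile(50)
first_quartile = percentile(25)
print(round(median, 4), round(first_quartile, 4))
```

For this density the area to the left of c is c^2, so the median should be close to the square root of 0.5 and the 25th percentile close to 0.5.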




The Normal Distribution

The Normal Distribution, used for inferential statistics and
advanced probability, is the most important probability
distribution in statistics. The density function depends on the
mean $\mu$ and standard deviation $\sigma$ and is given as

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\pi$ and $e$ are the familiar mathematical constants
$\pi = 3.141592\ldots$ and $e = 2.71828\ldots$.
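The density formula translates directly into code. The sketch below is a minimal version, with the function name `normal_pdf` and the sample parameters chosen here for illustration; it compares the result against `statistics.NormalDist.pdf` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 5.0, 1.5          # hypothetical parameters
x = 6.2                       # hypothetical point
by_formula = normal_pdf(x, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(x)
print(by_formula, by_library)
```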

The probabilities $P(a \le X \le b)$ under a normal distribution
are usually estimated numerically using a z-table. We will
always use a z-table to compute these probabilities for normal
distributions.

Notation: The normal distribution with mean $\mu$ and standard
deviation $\sigma$ is denoted by $N(\mu, \sigma)$. If X is a continuous
random variable with the normal distribution $N(\mu, \sigma)$, then
we write $X \sim N(\mu, \sigma)$.








Properties of the Normal Curve $N(\mu, \sigma)$


(i) The normal curve is bell-shaped and symmetric about the
line $x = \mu$.
(ii) The area under the normal curve is one.
(iii) The maximum value of the normal curve occurs at the point
$\left(\mu, \frac{1}{\sigma\sqrt{2\pi}}\right)$.
(iv) The normal curve has two inflection points at
$\left(\mu \pm \sigma, \frac{1}{\sigma\sqrt{2\pi e}}\right)$.








(v) Fixing $\sigma$ and changing $\mu$ results in a horizontal shift.

(vi) The normal curve approaches the x-axis on either side of $\mu$;
however, the curve never reaches the x-axis. (So we say that
y = 0 is a horizontal asymptote of the normal curve.)

(vii) Fixing $\mu$ and changing $\sigma$ results in different max heights
and longer/shorter tails.
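Several of these properties can be spot-checked numerically. The sketch below uses `statistics.NormalDist` with hypothetical parameters (mean 10, standard deviation 2, chosen only for illustration) to check symmetry, the peak height, the height at the inflection points, and the effect of changing each parameter.

```python
import math
from statistics import NormalDist

mu, sigma = 10.0, 2.0                      # hypothetical parameters
curve = NormalDist(mu, sigma)

# Symmetry about x = mu: equal heights at equal distances from the mean.
is_symmetric = math.isclose(curve.pdf(mu - 1.5), curve.pdf(mu + 1.5))

# Peak height at x = mu and height at the inflection points x = mu +/- sigma.
peak = curve.pdf(mu)
expected_peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
inflection_height = curve.pdf(mu + sigma)
expected_inflection = 1.0 / (sigma * math.sqrt(2.0 * math.pi * math.e))

# Same sigma, different mean: the curve shifts but keeps its peak height.
shifted_peak = NormalDist(mu + 5.0, sigma).pdf(mu + 5.0)

# Same mean, larger sigma: the peak is lower (and the tails are longer).
wider_peak = NormalDist(mu, 2.0 * sigma).pdf(mu)

print(is_symmetric, peak, inflection_height, shifted_peak, wider_peak)
```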




Calculating Probabilities for Standard
Normal Distributions

The standard normal distribution is the normal curve with
parameters $\mu = 0$ and $\sigma = 1$.
A continuous random variable with the standard normal
distribution is usually denoted by Z.

Example 5.1: The fact that the standard normal is symmetric
about zero implies that 50% of the area falls on either side of
zero, or $P(Z \le 0) = 0.5$.
Thus, zero is the median of the standard normal distribution.





Note: The z-table used here gives the area under the curve
between 0 and each positive z, as seen in the following graph.











Example 5.2: $P(0 \le Z \le 0.55)$ is equal to the area under the
curve between zero (0) and 0.55, and that area is provided for
us on the table. That is, $P(0 \le Z \le 0.55) = 0.2088$.




Example 5.3: $P(-1.65 \le Z \le 0)$. This probability is equal to
the area under the graph between -1.65 and zero (0):





This area is obtained from the table: Area = 0.4505. Therefore,
$P(-1.65 \le Z \le 0) = 0.5 - P(Z \le -1.65) = 0.4505$.
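The table lookups in Examples 5.2 and 5.3 can be imitated in code. The sketch below assumes a table that lists the area between 0 and a positive z, and reproduces it from the cumulative distribution function in `statistics.NormalDist`; the helper name `table_area` is an assumption.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # standard normal


def table_area(z):
    """Area under the standard normal curve between 0 and a positive z."""
    return Z.cdf(z) - 0.5


area_0_to_055 = table_area(0.55)        # P(0 <= Z <= 0.55)
area_neg165_to_0 = table_area(1.65)     # P(-1.65 <= Z <= 0), by symmetry
print(round(area_0_to_055, 4), round(area_neg165_to_0, 4))
```

The second lookup uses the symmetry of the curve about zero: the area between -1.65 and 0 equals the area between 0 and 1.65.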

Example 5.4: $P(Z \le b)$, where $b > 0$. For example, consider
$P(Z \le 0.95)$. The area that is equal to this probability is the
shaded area of the following graph:







$P(Z \le 0.95) = 0.8289$.

Example 5.5: $P(Z \ge a)$, where $a < 0$. For example, to
calculate (-.8 Z ). The area that is equal to this
probability is the shaded area of the following graph:




(Z -.8) = (Z .8) = .98.

Example 5.6: $P(a < Z < b)$, where $a < 0$ and $b > 0$. For
example, (-.9 Z .). The area that is equal to
this probability is the shaded area of the following graph:





We follow the same procedure as in the previous examples:
find the area of the shaded region to the left of the y-axis
and the area of the shaded region to the right of the y-axis,
then add the two areas to get the total area, which gives us the
desired probability.



(-.9 Z .)
= (Z . ) - ( Z -.9)
= . - .8 = ..


Example 5.7: $P(a < Z < b)$, where $a > 0$ and $b > 0$. For
example, (. Z .). The area that is equal to this
probability is the shaded area of the following graph:








(. Z .) = ( Z .) - ( Z . )
= .99 - .89 = .8.
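The two interval cases in Examples 5.6 and 5.7 can be sketched numerically. The endpoints below are hypothetical values chosen for illustration, not the ones used in the examples, and `table_area` is an assumed helper that mimics a table of areas between 0 and z.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # standard normal


def table_area(z):
    """Area under the standard normal curve between 0 and a positive z."""
    return Z.cdf(z) - 0.5


def prob_between(a, b):
    """P(a < Z < b) as a difference of cumulative areas."""
    return Z.cdf(b) - Z.cdf(a)


# Case a < 0 < b (hypothetical endpoints): add the two table areas.
a1, b1 = -1.0, 2.0
by_adding = table_area(abs(a1)) + table_area(b1)

# Case 0 < a < b (hypothetical endpoints): subtract the smaller table area.
a2, b2 = 0.5, 1.5
by_subtracting = table_area(b2) - table_area(a2)

print(round(by_adding, 4), round(prob_between(a1, b1), 4))
print(round(by_subtracting, 4), round(prob_between(a2, b2), 4))
```

Both routes should agree: adding or subtracting table areas is the same calculation as taking the difference of the two cumulative probabilities.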


Probability Calculations with the Normal Distribution

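If $X \sim N(\mu, \sigma)$, then the standardized variable
$Z = \frac{X - \mu}{\sigma}$ has the standard normal distribution. Thus,

$$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right)$$

where $Z \sim N(0, 1)$.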
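Standardizing can be written as a small helper. The sketch below is one possible version, with hypothetical parameters (mean 100, standard deviation 15) and function names chosen here for illustration; it checks that working through Z gives the same probability as working with X directly.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(0.0, 1.0)


def standardize(x, mu, sigma):
    """Convert an x-value to its z-score."""
    return (x - mu) / sigma


def prob_between(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma), computed through Z."""
    z_a = standardize(a, mu, sigma)
    z_b = standardize(b, mu, sigma)
    return STANDARD_NORMAL.cdf(z_b) - STANDARD_NORMAL.cdf(z_a)


mu, sigma = 100.0, 15.0                      # hypothetical parameters
via_z = prob_between(85.0, 115.0, mu, sigma)

# The same probability computed on the original scale, for comparison.
X = NormalDist(mu, sigma)
direct = X.cdf(115.0) - X.cdf(85.0)
print(round(via_z, 4), round(direct, 4))
```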

Example 5.8: Suppose $X \sim N(30, 2)$. Find (8 ).

Start by standardizing X. So let $Z = \frac{X - 30}{2}$. Then



(8 ) = _
8 -

]
= (- .) = .8.

Example 5.9: The air pressure in a randomly selected tire is
normally distributed with mean 31 psi and standard deviation
0.2 psi. What is the probability that the pressure of a randomly
selected tire is

(a) more than 30.5 psi?
(b) between 30.5 and 31.5?


Let X = the air pressure of a given tire. Then $X \sim N(31, 0.2)$, and
hence

$$Z = \frac{X - 31}{0.2} \sim N(0, 1).$$

(a) $P(X > 30.5) = P\left(Z > \frac{30.5 - 31}{0.2}\right) = P(Z > -2.5)$
$= P(Z < 2.5) = 0.9938$

So the probability that the pressure of a randomly selected tire
exceeds 30.5 psi is 0.9938.

(b) $P(30.5 < X < 31.5) = P\left(\frac{30.5 - 31}{0.2} < Z < \frac{31.5 - 31}{0.2}\right)$
$= P(-2.5 < Z < 2.5) = 2P(0 < Z < 2.5)$
$= 2(0.4938) = 0.9876$
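The tire-pressure calculation can be reproduced in a few lines. This is a minimal sketch that uses the cumulative distribution function from `statistics.NormalDist` in place of a z-table; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

mu, sigma = 31.0, 0.2               # mean and standard deviation in psi
Z = NormalDist(0.0, 1.0)            # standard normal

z_low = (30.5 - mu) / sigma         # z-score of 30.5 psi
z_high = (31.5 - mu) / sigma        # z-score of 31.5 psi

p_more_than = 1.0 - Z.cdf(z_low)              # (a) P(X > 30.5)
p_between = Z.cdf(z_high) - Z.cdf(z_low)      # (b) P(30.5 < X < 31.5)

print(round(z_low, 2), round(z_high, 2))
print(round(p_more_than, 4), round(p_between, 4))
```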





Percentiles of the normal distribution

Example 5.10: Suppose certain test scores are normally
distributed with mean 300 and standard deviation 45.

(a) What percentage of people scored below 350?

(b) Find the 25th percentile for the scores.

Let X = the score on a given test. Then
$Z = \frac{X - 300}{45} \sim N(0, 1)$.

(a) $P(X < 350) = P\left(Z < \frac{350 - 300}{45}\right) = P(Z < 1.11) = 0.8665$.

That is to say that 86.65% of the people scored below 350.
Hence 350 is approximately the 86th percentile.

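(b) We need to find c such that

$$0.25 = P(X \le c) = P\left(Z \le \frac{c - 300}{45}\right).$$

From the table we see that the area between 0 and 0.67 is
approximately 0.25, which by symmetry implies that
$\frac{c - 300}{45} = -0.67$. Thus, $c = 300 - 45(0.67) = 269.85$.
Hence the 25th percentile is approximately 269.85.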
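Both parts of the test-score example can be checked without a table. The sketch below assumes `statistics.NormalDist` and its `cdf` and `inv_cdf` methods; because a z-table rounds z to two decimals, table-based answers may differ slightly from the unrounded values computed here.

```python
from statistics import NormalDist

scores = NormalDist(mu=300.0, sigma=45.0)
Z = NormalDist(0.0, 1.0)

# (a) Proportion of scores below 350, without rounding z.
p_below_350 = scores.cdf(350.0)

# The same proportion with z rounded to two decimals, as a z-table would use.
z_rounded = round((350.0 - 300.0) / 45.0, 2)
p_table_style = Z.cdf(z_rounded)

# (b) 25th percentile: the score c with 25% of the area to its left.
p25 = scores.inv_cdf(0.25)

print(round(p_below_350, 4), round(p_table_style, 4), round(p25, 2))
```

The table-style value matches the 86.65% quoted in part (a), while the unrounded calculation gives a slightly larger proportion.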
