
Normal distribution
Distribution shape: Frequency distributions can assume many shapes.
The three most important shapes are

1. positively skewed,
2. symmetric, and
3. negatively skewed.

In a positively skewed or right-skewed distribution, the majority of the data values fall to the
left of the mean and cluster at the lower end of the distribution; the “tail” is to the right. Also, the
mean is to the right of the median, and the mode is to the left of the median. For example, if an
instructor gave an examination and most of the students did poorly, their scores would tend to
cluster on the left side of the distribution. A few high scores would constitute the tail of the
distribution, which would be on the right side. Another example of a positively skewed distribution
is the incomes of the population of the United States. Most of the incomes cluster about the low
end of the distribution; those with high incomes are in the minority and are in the tail at the right
of the distribution.

In a symmetric distribution, the data values are evenly distributed on both sides of the mean. In
addition, when the distribution is unimodal, the mean, median, and mode are the same and are at
the center of the distribution. Examples of symmetric distributions are IQ scores and heights of
adult males.

Qaisar Sohail
M PHIL statistics
Email: qaisr.gcuf@gmail.com

When the majority of the data values fall to the right of the mean and cluster at the upper end of
the distribution, with the tail to the left, the distribution is said to be negatively skewed or left-
skewed. Also, the mean is to the left of the median, and the mode is to the right of the median. As
an example, a negatively skewed distribution results if the majority of students score very high on
an instructor’s examination. These scores will tend to cluster to the right of the distribution.
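
The ordering of the mean, median, and mode in the two skewed shapes can be checked on a small data set. The sketch below uses hypothetical exam scores invented only for illustration; they are not data from these notes.

```python
from statistics import mean, median, mode

# Hypothetical exam scores, invented only to illustrate the two skewed shapes.
right_skewed = [40, 45, 45, 50, 55, 60, 95]  # a few high scores form the right tail
left_skewed = [5, 40, 45, 50, 55, 55, 60]    # a few low scores form the left tail


def summarize(scores):
    """Return (mode, median, mean) for a list of scores."""
    return mode(scores), median(scores), mean(scores)


for label, scores in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mo, md, mn = summarize(scores)
    print(f"{label}: mode={mo}, median={md}, mean={mn:.2f}")
```

For the right-skewed scores the order is mode < median < mean; for the left-skewed scores it is mean < median < mode, matching the descriptions above.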

What is normal?
Medical researchers have determined so-called normal intervals for a person’s blood pressure,
cholesterol, triglycerides, and the like. For example, the normal range of systolic blood pressure is
110 to 140. The normal interval for a person’s triglycerides is from 30 to 200 milligrams per
deciliter (mg/dl). By measuring these variables, a physician can determine whether a person’s
values fall within the normal interval or whether some type of treatment is needed to correct a
condition and avoid future illnesses. If a physician studies the blood pressure, cholesterol, or
triglycerides of many people, most of them will fall within the normal interval. For example, if we
study the age at menarche in Indian women, the age at menarche of most women falls between 13
and 15 years; very few women fall between 9 and 11 years (the lower end of the scale) or between
16 and 18 years (the higher end of the scale). When we draw a histogram of such data, the
frequency distribution is symmetric and follows the normal curve; it is known as a normal
distribution.

Normal distribution: The normal distribution was first discovered by Abraham de Moivre in 1733.
Carl Friedrich Gauss and the French mathematician Pierre-Simon Laplace derived the normal
distribution in 1809. The normal distribution, also known as the Gaussian distribution, is a
probability distribution that is symmetric about the mean, showing that data near the mean occur
more frequently than data far from the mean. In graph form, the normal distribution appears as a
bell curve.

A normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.

A random variable X is normally distributed with mean µ and variance σ² if its probability
density function is

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$

where π ≈ 3.14 and e ≈ 2.718.
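
As a rough check of the density formula, the sketch below writes it out term by term and compares it with `statistics.NormalDist.pdf` from the Python standard library. The parameters µ = 0 and σ = 1 and the function name `normal_pdf` are assumptions chosen only for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, written out term by term."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# Illustrative parameters (assumed, not from the notes): mu = 0, sigma = 1.
mu, sigma = 0.0, 1.0
for x in (-1.0, 0.0, 1.0):
    by_hand = normal_pdf(x, mu, sigma)
    library = NormalDist(mu, sigma).pdf(x)
    print(f"x={x:+.1f}  formula={by_hand:.4f}  NormalDist={library:.4f}")
```

The two columns agree, and the heights at x = -1 and x = +1 are equal, which reflects the symmetry of the curve about the mean.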

Why do we use the normal distribution?

• It is used to illustrate the shape and variability of the data.
• It is used to estimate future process performance.
• Normality is an important assumption when conducting statistical analysis.


Properties of normal distribution: (most important question for exam)


1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the center of the distribution.
3. A normal distribution curve is unimodal (i.e., it has only one mode).
4. The curve is symmetric about the mean, which is equivalent to saying that its shape is the
same on both sides of a vertical line passing through the center.
5. The curve is continuous; that is, there are no gaps or holes. For each value of X, there is a
corresponding value of Y.
6. The curve never touches the x axis. Theoretically, no matter how far in either direction the
curve extends, it never meets the x axis—but it gets increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or 100%. This fact may
seem unusual, since the curve never touches the x axis, but one can prove it mathematically
by using calculus.
8. The area under the part of a normal curve that lies within 1 standard deviation of the mean
is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or 95%; and within
3 standard deviations, about 0.997, or 99.7%. See the figure, which also shows the area in
each region.

9. The mean of a normal distribution is µ and its variance is σ².
10. The parameter space of the normal distribution is -∞ < µ < ∞ and 0 < σ < ∞.
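
The areas quoted in property 8 can be reproduced numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_within` is hypothetical. To four decimals the areas are about 0.6827, 0.9545, and 0.9973.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation: `area_within(2, mu=165, sigma=5)` gives the same area as `area_within(2)`.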

The Standard Normal Distribution: Since each normally distributed variable has its own
mean and standard deviation, as stated earlier, the shape and location of these curves will vary.
In practical applications, then, you would have to have a table of areas under the curve for each
variable. To simplify this situation, statisticians use what is called the standard normal
distribution.


The standard normal distribution is a special case of the normal distribution. It is the
distribution that occurs when a normal random variable has a mean of zero and a
standard deviation of one.
The formula for the standard normal distribution is

$$P(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}$$

All normally distributed variables can be transformed into the standard normally distributed
variable by using the formula for the standard score.

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{x-\mu}{\sigma}$$

Z is also called z value or z score.

Z score: The normal random variable of a standard normal distribution is called a standard
score or a z score. The location of any element in a normal distribution can be expressed in
terms of how many standard deviations it lies above or below the mean of the distribution. This
is the z score of the element. If the element lies above the mean, it has a positive z score; if
the element lies below the mean, it has a negative z score.

For example, a heart rate of 85 beats/min in the distribution shown in the figure below lies 1.5
standard deviations above the mean, so it has a z score of +1.5. A heart rate of 65 beats/min lies
0.5 standard deviation below the mean, so its z score is -0.5.
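
The heart-rate example can be sketched in a few lines. The figure is not reproduced here, so the mean of 70 beats/min and standard deviation of 10 beats/min used below are assumptions inferred from the two quoted z scores (85 = mean + 1.5 SD and 65 = mean - 0.5 SD), and the function name `z_score` is hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Assumed parameters, inferred from the two z scores quoted in the text:
# 85 = mean + 1.5*sd and 65 = mean - 0.5*sd give mean = 70 and sd = 10.
MEAN_HR, SD_HR = 70.0, 10.0

for rate in (85, 65):
    print(f"{rate} beats/min -> z = {z_score(rate, MEAN_HR, SD_HR):+.1f}")
```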

The z score is calculated by the formula

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

When x = μ, then z = 0
When x = μ + 1σ, then z = 1
When x = μ + 2σ, then z = 2
When x = μ + 3σ, then z = 3

Similarly,

When x = μ − 1σ, then z = −1
When x = μ − 2σ, then z = −2
When x = μ − 3σ, then z = −3

Example: The National Center for Health Statistics at the CDC gives the following estimate
of the mean body mass index (BMI = weight / height²) for 15-year-old boys:

x̄ = 19.83

Suppose that a particular 15-year-old boy, Fred, has a BMI equal to 25.

How overweight is Fred?
We know he is heavier than average for his age/gender group, but how much heavier?

Relative to the variability in BMI for 15-year-old boys in general, Fred’s BMI may be close to
the mean or far away.

Case 1: Suppose SD = 10.

This implies that the typical deviation from the mean is about 10. Fred’s deviation from the
mean is 5.17, so Fred doesn’t seem to be unusually heavy.

Case 2: Suppose SD = 2.

This implies that the typical deviation from the mean is about 2. Fred’s deviation from the mean
is 5.17, more than twice the typical deviation, so Fred does seem to be unusually heavy.


Thus, the extremeness of Fred’s BMI is quantified by its distance from the mean BMI relative
to the SD of BMI.
The z score gives us this kind of information.

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{x-\mu}{\sigma}$$

Case 1: $z = \frac{25 - 19.83}{10} = 0.517$, so Fred’s BMI is 0.517 SD above the mean.
Case 2: $z = \frac{25 - 19.83}{2} = 2.585$, so Fred’s BMI is 2.585 SD above the mean.
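
The two cases for Fred’s BMI can be reproduced with a short sketch. The mean of 19.83 and the BMI of 25 come from the example; the function and constant names are hypothetical.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd


MEAN_BMI = 19.83  # estimated mean BMI for 15-year-old boys, from the example
FRED_BMI = 25.0   # Fred's BMI, from the example

# The two standard deviations considered in the example.
for sd in (10.0, 2.0):
    z = z_score(FRED_BMI, MEAN_BMI, sd)
    print(f"SD = {sd:4.1f}: z = {z:.3f}")
```

The same deviation of 5.17 BMI units gives a small z score when the spread is large (SD = 10) and a large z score when the spread is small (SD = 2).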

How extreme is a Z score of 2? 3? -1.5?

An exact answer to this question depends upon the distribution of the variable you are
interested in.

Finding Areas Under the Standard Normal Distribution Curve


For the solution of problems using the standard normal distribution, a two-step process is
recommended with the use of the Procedure Table shown. The two steps are
Step 1 Draw the normal distribution curve and shade the area.
Step 2 Find the appropriate figure in the Procedure Table and follow the directions given.

There are three basic types of problems, and all three are summarized in the Procedure Table.
Note that this table is presented as an aid in understanding how to use the standard normal
distribution table and in visualizing the problems. After learning the procedures, you should
not find it necessary to refer to the Procedure Table for every problem.


Example: Draw the shapes showing how to find probabilities for z scores.


Solution:

For example, z value = 1.39


The table gives the area under the normal distribution curve to the left of any z value given to
two decimal places. For example, the area to the left of a z value of 1.39 is found by looking
up 1.3 in the left column and 0.09 in the top row. Where the two lines meet gives an area of
0.9177.
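
The table lookup can be cross-checked in software. This is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed table; the helper name `area_left_of` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_left_of(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


print(round(area_left_of(1.39), 4))  # 0.9177, matching the table entry
```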

Example 1:


Example 2:

Example 3:


A Normal Distribution Curve as a Probability Distribution Curve:


A normal distribution curve can be used as a probability distribution curve for normally
distributed variables. Recall that a normal distribution is a continuous distribution, as opposed
to a discrete probability distribution. The fact that it is continuous means that there are no gaps
in the curve. In other words, for every z value on the x axis, there is a corresponding height, or
frequency, value. The area under the standard normal distribution curve can also be thought of
as a probability. That is, if it were possible to select any z value at random, the probability of
choosing one, say, between 0 and 2.00 would be the same as the area under the curve between
0 and 2.00. In this case, the area is 0.4772. Therefore, the probability of randomly selecting
any z value between 0 and 2.00 is 0.4772. The problems involving probability are solved in
the same manner as the previous examples involving areas in this section. For example, if the
problem is to find the probability of selecting a z value between 2.25 and 2.94, solve it by using
the method shown in case 3 of the Procedure Table. For probabilities, a special notation is
used. For example, if the problem is to find the probability of any z value between 0 and 2.32,
this probability is written as P(0 < z < 2.32).

Note: In a continuous distribution, the probability of any exact z value is 0 since the area would
be represented by a vertical line above the value. But vertical lines in theory have no area. So
P (a≤ z ≤b) = P (a< z< b)
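
The link between area and probability can be sketched as a difference of two cumulative areas. The code below assumes Python's `statistics.NormalDist`; the helper name `prob_between` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_between(a, b):
    """P(a < z < b): the area under the standard normal curve between a and b."""
    return Z.cdf(b) - Z.cdf(a)


print(round(prob_between(0.0, 2.00), 4))   # 0.4772, as stated in the text
print(round(prob_between(2.25, 2.94), 4))  # the "between two z values" case
print(prob_between(1.5, 1.5))              # a single exact z value has area 0.0
```

Because a single point has zero area, including or excluding the endpoints does not change the result.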


Example:


Exercise:
Find the probability for each, using the standard normal distribution. The answer follows each
problem.

1. P(0 < z < 1.96) = 0.4750
2. P(0 < z < 0.67) = 0.2486
3. P(-1.23 < z < 0) = 0.3907
4. P(-1.43 < z < 0) = 0.4236
5. P(z > 0.82) = 0.2061
6. P(z > 2.83) = 0.0023
7. P(z < -1.77) = 0.0384
8. P(z < -1.32) = 0.0934
9. P(-0.20 < z < 1.56) = 0.5199
10. P(-2.46 < z < 1.74) = 0.9522
11. P(1.12 < z < 1.43) = 0.0550
12. P(1.46 < z < 2.97) = 0.0706
13. P(z > -1.43) = 0.9236
14. P(z < 1.42) = 0.9222
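
The listed answers can be checked against computed areas. The sketch below assumes `statistics.NormalDist` and treats item 6 as P(z > 2.83), the reading that is consistent with the listed answer 0.0023. A tolerance of 0.0002 is an assumption that allows for the rounding of four-decimal table entries.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# (problem, computed probability, answer listed in the exercise)
cases = [
    ("P(0 < z < 1.96)", Z.cdf(1.96) - Z.cdf(0), 0.4750),
    ("P(0 < z < 0.67)", Z.cdf(0.67) - Z.cdf(0), 0.2486),
    ("P(-1.23 < z < 0)", Z.cdf(0) - Z.cdf(-1.23), 0.3907),
    ("P(-1.43 < z < 0)", Z.cdf(0) - Z.cdf(-1.43), 0.4236),
    ("P(z > 0.82)", 1 - Z.cdf(0.82), 0.2061),
    ("P(z > 2.83)", 1 - Z.cdf(2.83), 0.0023),
    ("P(z < -1.77)", Z.cdf(-1.77), 0.0384),
    ("P(z < -1.32)", Z.cdf(-1.32), 0.0934),
    ("P(-0.20 < z < 1.56)", Z.cdf(1.56) - Z.cdf(-0.20), 0.5199),
    ("P(-2.46 < z < 1.74)", Z.cdf(1.74) - Z.cdf(-2.46), 0.9522),
    ("P(1.12 < z < 1.43)", Z.cdf(1.43) - Z.cdf(1.12), 0.0550),
    ("P(1.46 < z < 2.97)", Z.cdf(2.97) - Z.cdf(1.46), 0.0706),
    ("P(z > -1.43)", 1 - Z.cdf(-1.43), 0.9236),
    ("P(z < 1.42)", Z.cdf(1.42), 0.9222),
]

TOLERANCE = 0.0002  # assumed allowance for four-decimal table rounding

for problem, computed, listed in cases:
    status = "ok" if abs(computed - listed) <= TOLERANCE else "CHECK"
    print(f"{problem:22s} computed={computed:.4f} listed={listed:.4f} {status}")
```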


Example:


Example:

Example:
The mean height of 500 medical students is 165 cm and the standard deviation is 5 cm. Assuming
that height is normally distributed, find how many students have a height between 153 cm
and 180 cm.

Solution:
Given that

$$\mu = \text{mean} = 165 \text{ cm}, \qquad \sigma = \text{standard deviation} = 5 \text{ cm}$$

Find the z values for 153 and 180 with the formula $z = \frac{x-\mu}{\sigma}$:

$$z = \frac{153 - 165}{5} = -2.4$$

$$z = \frac{180 - 165}{5} = 3$$

So
P(153 < X < 180) = P(-2.4 < Z < 3) = P(-2.4 < Z < 0) + P(0 < Z < 3)
By using Table D
= 0.4918 + 0.4987
= 0.9905
Hence the required number of students whose heights are between 153 cm and 180 cm is
0.9905 × 500 = 495 (approximately).
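
The height calculation can be verified directly, without converting to z scores by hand. This is a sketch assuming `statistics.NormalDist`; the variable names are hypothetical. Note that 0.9905 × 500 is about 495 students.

```python
from statistics import NormalDist

heights = NormalDist(mu=165.0, sigma=5.0)  # cm, from the problem statement
n_students = 500

# P(153 < X < 180) as a difference of two cumulative areas.
p = heights.cdf(180.0) - heights.cdf(153.0)

print(f"P(153 < X < 180) = {p:.4f}")
print(f"expected number of students = {p * n_students:.0f}")
```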

Example:
A hospital records the weight of every newborn child at the hospital. The distribution of
weight is normally shaped, with mean µ = 2.9 kg and standard deviation σ = 0.45 kg.
Find

1. The percentage of newborns who weighed under 2.1 kg.
2. The percentage of newborns who weighed between 1.8 kg and 4 kg.
3. If 1500 babies have been born at the hospital, how many weighed less than 2.5 kg.

Solution:
Given that
µ = 2.9 kg

σ = 0.45 kg

1. To find the percentage of newborns who weighed under 2.1 kg, first we convert 2.1 kg to a
z score:

$$z = \frac{x-\mu}{\sigma} = \frac{2.1-2.9}{0.45} = -1.78$$

From Table F
P(z < -1.78) = 0.0375

Hence, the percentage of newborns who weighed under 2.1 kg = 0.0375 × 100 = 3.75%.

2. To find the percentage of newborns who weighed between 1.8 kg and 4 kg, first we convert
the values into z scores:

$$z = \frac{x-\mu}{\sigma} = \frac{1.8-2.9}{0.45} = -2.44$$

$$z = \frac{x-\mu}{\sigma} = \frac{4-2.9}{0.45} = 2.44$$

so
P(-2.44 < z < 2.44) = P(-2.44 < z < 0) + P(0 < z < 2.44)
From area Table E
= 0.4927 + 0.4927 = 0.9854
Hence the required percentage of newborns weighing between 1.8 kg and 4 kg is
0.9854 × 100 = 98.5%.
3. To find how many weighed less than 2.5 kg:

$$z = \frac{x-\mu}{\sigma} = \frac{2.5-2.9}{0.45} = -0.89$$

From area Table F

P(z < -0.89) = 0.1867

Thus the required number of babies who weighed less than 2.5 kg = 0.1867 × 1500 = 280.
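
All three parts of the newborn-weight example can be checked in one short sketch. It assumes `statistics.NormalDist` and uses σ = 0.45 kg, the value that appears in the z-score calculations. Because the code does not round z to two decimals before looking up an area, its results may differ slightly from the table-based answers.

```python
from statistics import NormalDist

# sigma = 0.45 kg is the value used in the worked z-score calculations.
weights = NormalDist(mu=2.9, sigma=0.45)
n_babies = 1500

under_2_1 = weights.cdf(2.1)                     # part 1
between = weights.cdf(4.0) - weights.cdf(1.8)    # part 2
under_2_5 = weights.cdf(2.5)                     # part 3

print(f"1. under 2.1 kg:        {under_2_1 * 100:.1f}%")
print(f"2. between 1.8 and 4 kg: {between * 100:.1f}%")
print(f"3. under 2.5 kg:        about {under_2_5 * n_babies:.0f} of {n_babies} babies")
```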

Example: Assume that the age at onset of disease X is normally distributed with a mean of 50
years and a standard deviation of 12 years. What is the probability that an individual afflicted
with X developed it before the age of 35 years?
Solution: Given that
µ = 50 years

σ = 12 years

Now the z value corresponding to X = 35 years is

$$z = \frac{x-\mu}{\sigma} = \frac{35-50}{12} = -1.25$$

By using area Table F

P(X < 35) = P(z < -1.25) = 0.1056
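
The same probability can be computed either from the original scale or from the z score, and the two routes should agree. This is a sketch assuming `statistics.NormalDist`; the variable names are hypothetical.

```python
from statistics import NormalDist

onset_age = NormalDist(mu=50.0, sigma=12.0)  # years, from the example

z = (35.0 - 50.0) / 12.0           # standardize X = 35
p_direct = onset_age.cdf(35.0)     # P(X < 35) on the original scale
p_via_z = NormalDist().cdf(z)      # P(z < -1.25) on the standard normal scale

print(f"z = {z:.2f}")
print(f"P(X < 35) = {p_direct:.4f}")
```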


Exercise:
Q1. The pulse rate of healthy male adults follows a normal distribution with a mean of 75 per
minute and a standard deviation of 4 per minute. Find the percentage of individuals having a pulse
rate beyond 85 per minute.

Q2. The systolic B.P. reading of a large male population is normally distributed with mean 100 and
standard deviation 15. What is the 90th percentile of the systolic B.P. reading?
Q3. The average annual salary for all U.S. teachers is $47,750. Assume that the distribution is
normal and the standard deviation is $5680. Find the probability that a randomly selected teacher
earns
a. Between $35,000 and $45,000 a year (answer: 0.3031)
b. More than $40,000 a year (answer: 0.9131)
c. If you were applying for a teaching position and were offered $31,000 a year, how would you
feel (based on this information)?
Q4. The average daily jail population in the United States is 706,242. If the distribution is
normal and the standard deviation is 52,145, find the probability that on a randomly selected day,
the jail population is
a. Greater than 750,000
b. Between 600,000 and 700,000
Q5. The average number of calories in a 1.5-ounce chocolate bar is 225. Suppose that the
distribution of calories is approximately normal with σ = 10. Find the probability that a randomly
selected chocolate bar will have
a. Between 200 and 220 calories
b. Less than 200 calories
Q6. The average monthly mortgage payment including principal and interest is $982 in the
United States. If the standard deviation is approximately $180 and the mortgage payments are
approximately normally distributed, find the probability that a randomly selected monthly
payment is
a. More than $1000
b. More than $1475
c. Between $800 and $1150
Note: Solve every question with a diagram of the normal distribution.


Table E


Table D


Table F
