
Normal Distribution

NORMAL DISTRIBUTION
The normal distribution was first discovered by the French mathematician Abraham De Moivre.
Abraham De Moivre (1667–1754)
NORMAL DISTRIBUTION (CONTD.)
• In studies of public health, information is frequently collected for variables that are measured
on a continuous scale. Examples of such variables include age, weight, and blood
pressure. The normal distribution is the most widely used distribution to describe continuous
variables. It is also frequently referred to as the Gaussian distribution, after the well-known
German mathematician Carl Friedrich Gauss (1777–1855).
• Normal distributions are a family of distributions characterized by the same general shape. These
distributions are symmetrical, with the measured values of the variable more concentrated in the
middle than in the tails. The total area under the curve of a normal distribution represents the sum of
the probabilities of obtaining every possible value for a variable, and so equals 1. The shape of the normal
distribution is specified mathematically in terms of only two parameters: the mean (µ)
and the standard deviation (σ).
• Several biological variables are normally distributed (e.g., blood pressure, serum cholesterol,
height, and weight). The normal curve can be used to estimate probabilities associated with these
variables.
• Since the normal distribution can have an infinite number of possible values for its mean and
standard deviation, it is impractical to tabulate the areas for each and every curve. Instead,
probabilities are tabulated for a single curve where the mean is zero and the standard deviation is
one. This curve is referred to as the standard normal distribution (Z). A random variable (X) that is
normally distributed with mean (µ) and standard deviation (σ) can be easily transformed to
the standard normal distribution by the formula Z = (X − µ)/σ.
• The normal distribution is important to statistical work because most hypothesis tests that are
used assume that the random variable being considered has an underlying normal distribution.
Fortunately, these tests work very well even if the distribution of the variable is only approximately
normal. Examples of such tests include those based on the t, F, or chi-square statistics. If the
variable is not normal, alternative nonparametric tests should be considered. Alternatively,
mathematical theory (e.g., the central limit theorem) has proven that normal distribution–based
Normal Distribution (Contd.)
• Here is a list of situations where approximate normality is sometimes assumed.
• In counting problems (so the central limit theorem includes a discrete-to-continuum approximation)
where reproductive random variables are involved, such as
– Binomial random variables, associated to yes/no questions;
– Poisson random variables, associated to rare events;
• In physiological measurements of biological specimens:
– The logarithm of measures of size of living tissue (length, height, skin area, weight);
– The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of
growth; presumably the thickness of tree bark also falls under this category;
– Other physiological measures may be normally distributed, but there is no reason to expect that a
priori;
• Financial variables
– Changes in the logarithm of exchange rates, price indices, and stock market indices; these variables
behave like compound interest, not like simple interest, and so are multiplicative;
– Other financial variables may be normally distributed, but there is no reason to expect that a priori;
• Light intensity
– The intensity of laser light is normally distributed;
– Thermal light has a normal distribution on longer timescales due to the central limit theorem.
Normal Distribution
• Mean µ defines the center of the curve
• Standard deviation σ defines the spread
Normal Distribution
Curves of the normal distribution:
• Are bell shaped
• Are not too steep, not too flat
• Have two parameters
1. Mean
2. Standard deviation
NORMAL DISTRIBUTION (CONTD.)
• Normal distributions with different
parameters differ in how spread
out they are. The area under each curve is the
same. The height of a normal distribution can
be specified mathematically in terms of two
parameters: the mean (μ) and the
standard deviation (σ).
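
As a rough illustration of how the height of the curve depends on the two parameters, the sketch below evaluates the standard normal density formula and checks numerically that a narrow and a wide curve enclose the same area. The function names and the integration range (10 standard deviations either side of the mean) are assumptions made for this example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def area_under_curve(mu, sigma, steps=20000):
    """Trapezoid-rule area between mu - 10*sigma and mu + 10*sigma (illustrative range)."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

narrow_peak = normal_pdf(0.0, 0.0, 1.0)  # peak height when sigma = 1
wide_peak = normal_pdf(0.0, 0.0, 2.0)    # peak height when sigma = 2 (lower, more spread out)

print(narrow_peak, wide_peak)
print(area_under_curve(0.0, 1.0), area_under_curve(0.0, 2.0))
```

Doubling the standard deviation halves the peak height, while both areas come out at 1 to within numerical error.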
Rules for any Curve of Normal
Distribution
• 68% of the observations fall within 1 standard
deviation of the mean (µ ± σ)
• 95% of the observations fall within
µ ± 2σ
• 99.7% of the observations fall within
µ ± 3σ
Rule for Normal curve
[Figure: normal curve with 68% of the area between µ − σ and µ + σ, 95% between µ − 2σ and µ + 2σ, and 99.7% between µ − 3σ and µ + 3σ]
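
The 68%, 95% and 99.7% figures can be checked with a few lines of Python. This is a minimal sketch using the error function from the standard library; the helper name `prob_within` is made up for the example.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The result does not depend on µ or σ, which is why the rule holds for any normal curve.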
Normal Distribution (Contd.)

• If a test is normally distributed with a mean of 60 and a
standard deviation of 10, what proportion of the scores is above 85? This
problem is very similar to figuring out the percentile rank of a person
scoring 85. The first step is to figure out the proportion of scores less than
or equal to 85. This is done by figuring out how many standard deviations
above the mean 85 is. Since 85 is 85 − 60 = 25 points above the mean and
since the standard deviation is 10, a score of 85 is 25/10 = 2.5 standard
deviations above the mean. Or, in terms of the formula,
z = (85 − 60)/10 = 2.5
A z table can be used to find that 0.9938 of the scores are less than
or equal to a score 2.5 standard deviations above the mean. It follows that
only 1 − 0.9938 = 0.0062 of the scores are above a score 2.5 standard
deviations above the mean. Therefore, only 0.0062 of the scores are
above 85.
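
The worked example above can be reproduced without a z table. The sketch below is one way to do it, assuming the standard normal cumulative probability is computed from `math.erf`; the variable names are illustrative.

```python
import math

def standard_normal_cdf(z):
    """Proportion of a standard normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd, score = 60, 10, 85

z = (score - mean) / sd          # 2.5 standard deviations above the mean
below = standard_normal_cdf(z)   # proportion of scores <= 85
above = 1.0 - below              # proportion of scores > 85

print(f"z = {z}, below = {below:.4f}, above = {above:.4f}")
# z = 2.5, below = 0.9938, above = 0.0062
```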
STANDARD NORMAL DISTRIBUTION
• The standard normal distribution is a normal distribution with a mean of 0 and a standard
deviation of 1. Normal distributions can be transformed to standard normal distributions
by the formula:

z = (X − µ)/σ

where X is a score from the original normal distribution, µ is the mean of the original
normal distribution, and σ is the standard deviation of the original normal distribution. The
standard normal distribution is sometimes called the z distribution. A z score always
reflects the number of standard deviations above or below the mean a particular score is.
For instance, if a person scored a 70 on a test with a mean of 50 and a standard deviation
of 10, then they scored 2 standard deviations above the mean. Converting the test scores
to z scores, an X of 70 would be:

z = (70 − 50)/10 = 2

So, a z score of 2 means the original score was 2 standard deviations above the mean. Note
that the z distribution will only be a normal distribution if the original distribution (X) is
normal.
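
A short sketch of the conversion, using the test-score example from this slide. It also shows that, when X is normal, the probability of scoring at or below 70 equals the standard normal probability at z = 2. The function names are assumptions for illustration only.

```python
import math

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z):
    """Proportion of a standard normal distribution at or below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = z_score(70, 50, 10)                  # test score of 70, mean 50, SD 10
p_at_or_below = standard_normal_cdf(z)   # valid only if the scores are normally distributed

print(z, round(p_at_or_below, 4))
# 2.0 0.9772
```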

Characteristics of Normal distribution

• Frequency analysis of the data reveals a bell curve
• Most values lie near the middle datum or
average of the sample
• Very few values lie near the upper and lower
extremes
• The data fit the formula of a normal distribution
Contd.
• Deviations from the normal distribution are measured
using the pth moment about the mean
• Symmetry and skewness - deviation from the
normal distribution is measured using the third moment about
the mean
• skewed to the left (negative skew)
• skewed to the right (positive skew)
• measure of skewness
Contd.
• Kurtosis - evaluates how values are distributed near
and away from the mean, using the
fourth moment about the mean
• platykurtic - flatter than the normal distribution, with
lighter tails (fewer extreme values)
• leptokurtic - more sharply peaked than the normal distribution, with
heavier tails (more extreme values)
• measure of kurtosis
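
The slides do not show which formulas were used as the measures of skewness and kurtosis. The sketch below assumes one common convention, the moment coefficients: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3, where m_p is the pth moment about the mean. The data sets are invented for illustration.

```python
def central_moment(data, p):
    """pth moment about the mean (divides by n, not n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** p for x in data) / n

def skewness(data):
    """Third moment about the mean, scaled by the standard deviation cubed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Fourth moment about the mean, scaled; 0 for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]            # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]     # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))      # negative: flatter than normal (platykurtic)
```

A positive skewness indicates a skew to the right, a negative one a skew to the left; positive excess kurtosis indicates a leptokurtic shape and negative a platykurtic one.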
Normal & Asymmetrical Distribution
Contd.
• Using Z scores, probabilities and proportions of the normal distribution can be
found.
• Samples of data can tell us something about the population a sample came from.
• Samples can be used to make inferences about the frequency distribution of
different data in the population (middle values, extreme upper and lower values).
• Samples can be used to make inferences about the measures of central tendency
and variation in the population.
• Sample distributions and measures can be used to make inferences about
probability distributions for a population. These in turn give us an idea about what is
happening in the population and can be used to compare populations in statistical
testing.
• Z scores tell us how many standard deviations a value Xi is
from the population mean, or how far a sample mean is from the population mean.
• There are probabilities, p values, associated with every Z score in a standardized
normal distribution, commonly found in tables in the appendix of statistics texts.
Contd.
• Special properties of a normal curve - link between Z scores and
probability
• Conversion of raw scores into Z scores results in a distribution with a
mean of 0 and a standard deviation of 1 (the frequency distribution of Z
scores converted from raw scores). It is a standard normal distribution
if the raw scores are normally distributed.
• The total area under the normal curve is one, or 100%
• The normal curve is symmetric about its population mean: 50% of
the distribution's area lies above the mean and 50%
lies below it
• 68.27% of all values lie within 1 SD of the population mean
• 95.45% of all values lie within 2 SD of the population mean
• 99.73% of all values lie within 3 SD of the population mean
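
To illustrate the conversion of raw scores into Z scores, the sketch below standardizes a small set of made-up scores and confirms that the resulting Z scores have a mean of 0 and a standard deviation of 1. The data and variable names are hypothetical.

```python
import statistics

raw_scores = [52, 61, 58, 70, 49, 66, 73, 55]  # hypothetical raw scores

mu = statistics.fmean(raw_scores)
sigma = statistics.pstdev(raw_scores)  # population SD of these scores

z_scores = [(x - mu) / sigma for x in raw_scores]

z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)

print(round(z_mean, 10), round(z_sd, 10))
```

Standardizing changes the location and scale only; it does not change the shape of the distribution.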
Contd.
• Statistical Errors - Type I and Type II
• There are two types of statistical errors that can be made.
• Type I - rejecting a null hypothesis when the null hypothesis is true. The
significance level (alpha) is the probability of making this mistake; a result is
commonly declared significant when p < 0.05.
• Type II - not rejecting a null hypothesis when it is false. Beta is the probability of this
error, and the power of the test is 1 - beta.
• Reporting variability around the mean - several methods found in
the literature
• Mean ± SD
• Mean ± SE
• Mean ± 95% confidence intervals
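
The three ways of reporting variability can be computed as follows. This is a sketch on invented data; it assumes the normal critical value 1.96 for the 95% interval, whereas for a small sample like this one a t critical value would normally be used instead.

```python
import math
import statistics

sample = [118, 125, 130, 121, 127, 133, 119, 124, 129, 122]  # hypothetical measurements

n = len(sample)
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)   # sample standard deviation (n - 1 in the denominator)
se = sd / math.sqrt(n)          # standard error of the mean

Z_95 = 1.96                     # assumed normal critical value for 95% confidence
ci_low = mean - Z_95 * se
ci_high = mean + Z_95 * se

print(f"mean ± SD: {mean:.1f} ± {sd:.1f}")
print(f"mean ± SE: {mean:.1f} ± {se:.1f}")
print(f"95% CI: {ci_low:.1f} to {ci_high:.1f}")
```

The SD describes the spread of individual values, while the SE and the confidence interval describe the precision of the mean, so the SE is always the smaller of the two.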
Normal Distribution (Contd.)
• Central Limit Theorem
• The distribution of means of random samples
drawn from a non-normal population will tend to
be normally distributed, increasingly so with
increasing sample size
• Primary use of the normal distribution in
statistics
• Used to check whether sample and population data are
normally distributed; this is a requirement or
assumption for most statistical tests involving
means
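
A small simulation can make the central limit theorem concrete. The sketch below draws repeated samples from an exponential population (strongly skewed, with mean 1 and SD 1) and looks at the distribution of the sample means. The choice of population, the sample size of 50, the 5000 replicates and the seed are all arbitrary assumptions for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
REPLICATES = 5000

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return statistics.fmean(draws)

means = [sample_mean(SAMPLE_SIZE) for _ in range(REPLICATES)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)

# Theory: the means centre on 1 with SD 1 / sqrt(n), and roughly 68%
# of them should fall within one such SD of 1 if they are near normal.
expected_sd = 1.0 / SAMPLE_SIZE ** 0.5
within_one_sd = sum(abs(m - 1.0) <= expected_sd for m in means) / REPLICATES

print(mean_of_means, sd_of_means, expected_sd, within_one_sd)
```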
Normal Distribution
• Determining Sample Size in Research, Using Z scores and the
Normal Distribution
• Affected by three factors
• 1) Confidence level and p value you choose - the standard is a 95%
confidence level and a p value of 0.05. If you insist on higher
confidence levels (smaller p values), then you must have larger
samples.
• 2) Difference between means you can live with - how different
the means must be to be considered different (related to
the measuring device and its accuracy). If you insist on detecting small
differences between means, then you must have larger samples.
• 3) Population standard deviation - if the population standard
deviation is large, then the sample sizes must be larger; if the
population standard deviation is small, then it will be easier to
detect minor differences with smaller samples.
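
The slides do not give a sample size formula. As an illustration of how the three factors interact, the sketch below assumes the widely used formula for estimating a single mean to within a chosen margin, n = (z × σ / d)², rounded up. Here z is the critical value for the confidence level, σ the population standard deviation and d the acceptable difference; the function name and the numbers are hypothetical.

```python
import math

def sample_size(sigma, margin, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= margin (assumed formula, see lead-in)."""
    return math.ceil((z * sigma / margin) ** 2)

baseline = sample_size(sigma=10, margin=2)                  # 95% confidence
higher_confidence = sample_size(sigma=10, margin=2, z=2.576)  # about 99% confidence
smaller_margin = sample_size(sigma=10, margin=1)            # tighter difference
larger_sd = sample_size(sigma=20, margin=2)                 # more variable population

print(baseline, higher_confidence, smaller_margin, larger_sd)
# 97 166 385 385
```

Each change (higher confidence, a smaller acceptable difference, or a larger standard deviation) pushes the required sample size up, matching the three factors listed above.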
Normal Distribution
• The location and scale parameters of the normal
distribution can be estimated with the sample mean and
sample standard deviation, respectively. For both
theoretical and practical reasons, the normal distribution is
probably the most important distribution in statistics. For
example, many classical statistical tests are based on the
assumption that the data follow a normal distribution. This
assumption should be checked before applying these tests.
• In modeling applications, such as linear and non-linear
regression, the error term is often assumed to follow a
normal distribution with fixed location and scale.
• The normal distribution is used to find significance levels in
many hypothesis tests and confidence intervals.
Normal Distribution
• Importance of the normal distribution
• The normal distribution appears in a
variety of statistical applications. One reason for this is
the central limit theorem. This theorem tells us that
sums of independent random variables are approximately normally
distributed if the number of observations is large. For
example, if we toss a coin, the distribution of the total number of heads
approaches normality as the number of tosses grows.
Even when a distribution is not exactly normal, it
may still be convenient to assume that a normal
distribution is a good approximation. In this case, many
statistical procedures, such as the t-test, can still be
used.
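
The coin-tossing example can be checked directly. The sketch below compares the exact binomial probability of at most 55 heads in 100 fair tosses with the normal approximation. The continuity correction (adding 0.5) is a standard refinement that is assumed here rather than taken from the slides, and the function names are illustrative.

```python
import math

def binomial_cdf(k, n, p):
    """Exact probability of at most k successes in n trials."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation with mean n*p, SD sqrt(n*p*(1-p)) and continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n_tosses, p_heads, k_heads = 100, 0.5, 55

exact = binomial_cdf(k_heads, n_tosses, p_heads)
approx = normal_approx_cdf(k_heads, n_tosses, p_heads)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With 100 tosses the two values agree to about three decimal places; with only a handful of tosses the agreement is noticeably worse.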
