
Introduction to Statistics
Semester-V Fall 2021

Ms. Fatima Salman


Psychology Department
Lahore Garrison University
Variability
The Range
Standard Deviation and Variance for a Population
Standard Deviation and Variance for Samples
• Variability provides a quantitative measure of the differences between
scores in a distribution and describes the degree to which the scores
are spread out or clustered together.

• Notice that the distributions differ in terms of variability.


• For example, most heights are clustered close together, within 5 or 6 inches of the mean. On
the other hand, weights are spread over a much wider range.
• The purpose for measuring variability is to obtain an objective measure of how the scores are
spread out in a distribution.
1. Range
• The range is the distance covered by the scores in a distribution, from the
smallest score to the largest score. When the scores are measurements of a
continuous variable, the range can be defined as the difference between the
upper real limit (URL) for the largest score (Xmax) and the lower real limit (LRL)
for the smallest score (Xmin).
Range = URL for Xmax – LRL for Xmin
• If the scores have values from 1 to 5, for example, the range is 5.5 – 0.5 = 5
points.
• Because the range does not consider all of the scores in the distribution, it
often does not give an accurate description of the variability for the entire
distribution.
• For this reason, the range is considered to be a crude and unreliable measure
of variability.
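A minimal sketch of the range calculation above, in Python. It assumes whole-number scores, so each real limit lies half a unit beyond the score; the function name and the `precision` parameter are illustrative and not part of the slides.

```python
def real_limit_range(scores, precision=1):
    """Range = URL for Xmax - LRL for Xmin.

    Assumes scores are measured to the nearest `precision` unit,
    so the real limits lie half a unit beyond each extreme score.
    """
    half_unit = precision / 2
    upper_real_limit = max(scores) + half_unit  # URL for Xmax
    lower_real_limit = min(scores) - half_unit  # LRL for Xmin
    return upper_real_limit - lower_real_limit


# Scores with values from 1 to 5, as in the example above
print(real_limit_range([1, 2, 3, 4, 5]))  # 5.5 - 0.5 = 5.0

# Only the two extreme scores matter: a very different set of
# scores with the same extremes gives the same range.
print(real_limit_range([1, 1, 1, 1, 5]))  # 5.0
```

The second call illustrates why the range is a crude measure: it ignores every score between the smallest and the largest.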
Standard Deviation and Variance for a Population
• The standard deviation is the most commonly used and the
most important measure of variability.
• In simple terms, the standard deviation provides a measure of the standard distance from the
mean and describes whether the scores are clustered closely around the mean or are widely
scattered.
• Deviation is distance from the mean: deviation score = X – µ
• Note that the deviation scores add up to zero. This should not be surprising if you remember
that the mean serves as a balance point for the distribution.
• The total of the distances above the mean is exactly equal to the total of the distances below
the mean. Thus, the total for the positive deviations is exactly equal to the total for the
negative deviations, and the complete set of deviations always adds up to zero.
• Because the sum of the deviations is always zero, the mean of the deviations is also zero and
is of no value as a measure of variability.
• The mean of the deviations is zero if the scores are closely clustered and it is zero if the scores
are widely scattered.
• The average of the deviation scores does not work as a measure of variability because it is
always zero. Clearly, this problem results from the positive and negative values canceling each
other out.
• The solution is to get rid of the signs (+ and –). The standard procedure for accomplishing this
is to square each deviation score.
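The steps above can be sketched in Python. The five scores are a hypothetical population chosen only for illustration; they do not come from the slides.

```python
# Hypothetical population of N = 5 scores (illustrative only)
scores = [1, 9, 5, 8, 7]
mu = sum(scores) / len(scores)  # population mean

# Deviation score = X - mu
deviations = [x - mu for x in scores]

# Squaring removes the signs so positive and negative values no longer cancel
squared_deviations = [d ** 2 for d in deviations]

print(mu)                  # 6.0
print(deviations)          # [-5.0, 3.0, -1.0, 2.0, 1.0]
print(sum(deviations))     # 0.0
print(squared_deviations)  # [25.0, 9.0, 1.0, 4.0, 1.0]
```

The deviations sum to zero however spread out the scores are, which is why the squared values are used instead.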

• Using the squared values, you then compute the mean squared deviation, which is called
variance.
• Population variance equals the mean squared deviation. Variance is
the average squared distance from the mean.
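As a rough illustration, population variance can be computed directly as the mean of the squared deviations. The scores are hypothetical, and `population_variance` is a name introduced here for the sketch; the standard library's `statistics.pvariance` is used only as a cross-check.

```python
import statistics


def population_variance(scores):
    """Mean squared deviation from the population mean."""
    mu = sum(scores) / len(scores)
    sum_of_squares = sum((x - mu) ** 2 for x in scores)
    return sum_of_squares / len(scores)


scores = [1, 9, 5, 8, 7]  # hypothetical population, mean = 6
variance = population_variance(scores)
print(variance)  # 8.0

# Cross-check against the standard library
assert variance == statistics.pvariance(scores)
```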
LEARNING CHECK
ANSWERS:
Formulas for Population Variance and Standard Deviation
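In standard notation, the definitions above can be written as follows. The symbols SS (sum of squared deviations) and N (number of scores in the population) are assumed here, following common textbook usage.

```latex
\[
SS = \sum (X - \mu)^2, \qquad
\sigma^2 = \frac{SS}{N}, \qquad
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{N}}
\]
```

Taking the square root returns the measure to the original units of the scores, which is why the standard deviation, rather than the variance, is read as a distance from the mean.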
LEARNING CHECK
STANDARD DEVIATION AND VARIANCE FOR SAMPLES
• Sample variability tends to underestimate population variability
unless some correction or adjustment is made.
• The effect of the adjustment is to increase the value that you obtain.
Dividing by a smaller number (n – 1 instead of n) produces a larger
result and makes sample variance an accurate and unbiased
estimator of population variance.
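One way to see the effect of the correction is to list every possible sample from a tiny population and average the sample variances. The sketch below assumes a hypothetical population of three scores and samples of n = 2 drawn with replacement; none of these values come from the slides.

```python
from itertools import product


def variance(scores, divisor):
    """Sum of squared deviations from the scores' own mean, over `divisor`."""
    mean = sum(scores) / len(scores)
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / divisor


population = [0, 3, 9]  # hypothetical population (illustrative only)
n = 2                   # sample size
population_variance = variance(population, len(population))

# All 9 possible samples of size n, drawn with replacement
samples = list(product(population, repeat=n))

# Average sample variance when dividing by n versus n - 1
mean_dividing_by_n = sum(variance(s, n) for s in samples) / len(samples)
mean_dividing_by_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(population_variance)         # 14.0
print(mean_dividing_by_n)          # 7.0  (underestimates)
print(mean_dividing_by_n_minus_1)  # 14.0 (matches the population value)
```

Any single sample can still miss the population value; the claim is only that dividing by n – 1 is correct on average across all possible samples.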
• Degrees of freedom: the divisor n – 1 is called the degrees of freedom (df) for the sample
variance; for a sample of n scores, df = n – 1.
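Written out in standard notation, the sample formulas with the n – 1 adjustment are as follows. The symbols SS (sum of squared deviations), M (sample mean), and df (degrees of freedom) are assumed here, following common textbook usage.

```latex
\[
SS = \sum (X - M)^2, \qquad
df = n - 1, \qquad
s^2 = \frac{SS}{n - 1} = \frac{SS}{df}, \qquad
s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}
\]
```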
LEARNING CHECK
ANSWERS
• Remember that the formulas for sample variance and standard deviation were
constructed so that the sample variability would provide a good estimate of
population variability. For this reason, the sample variance is often called estimated
population variance, and the sample standard deviation is called estimated population
standard deviation.
• Because standard deviation requires extensive calculations, there is a tendency to get
lost in the arithmetic and forget what standard deviation is and why it is important.
• Standard deviation is primarily a descriptive measure; it describes how variable, or
how spread out, the scores are in a distribution. Behavioral scientists must deal with
the variability that comes from studying people and animals. People are not all the
same; they have different attitudes, opinions, talents, IQs, and personalities. Although
we can calculate the average value for any of these variables, it is equally important
to describe the variability. Standard deviation describes variability by measuring
distance from the mean.
Features of Standard Deviation

• 1. In frequency distribution graphs, we identify the position of the mean by drawing a
vertical line and labeling it with µ or M. Because the standard deviation measures distance
from the mean, it is represented by a line or an arrow drawn from the mean outward for a
distance equal to the standard deviation and labeled with a σ or an s.
• We show a population distribution with a mean of µ = 80 and a standard deviation of σ = 8,
and a sample distribution with a mean of M = 16 and a standard deviation of s = 2.
• In the population distribution, a score that is 4 points above the mean is slightly above
average but is certainly not an extreme value because it is only half of the standard deviation.
• In the sample distribution, however, a score that is 4 points above the mean is an extremely
high score. In each case, the relative position of the score depends on the size of the
standard deviation.
• For the sample, on the other hand, a 4-point deviation is very large, twice the size of the
standard deviation.
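The comparison above amounts to dividing the deviation by the standard deviation. A minimal sketch, using the two distributions described on this slide; the function name is illustrative.

```python
def deviation_in_sd_units(deviation, sd):
    """Express a distance from the mean as a multiple of the standard deviation."""
    return deviation / sd


# Population: sigma = 8, score 4 points above the mean
print(deviation_in_sd_units(4, 8))  # 0.5 -> half a standard deviation

# Sample: s = 2, score 4 points above the mean
print(deviation_in_sd_units(4, 2))  # 2.0 -> twice the standard deviation
```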
• 2. Describing an entire distribution
• Rather than listing all of the individual scores in a distribution, research reports typically
summarize the data by reporting only the mean and the standard deviation. When you
are given these two descriptive statistics, however, you should be able to visualize the
entire set of data.
• For example, consider a sample with a mean of M = 36 and a standard deviation of s = 4.
• For this sample, the data can be pictured as a pile of boxes (scores) with the center of the
pile located at a value of M = 36. The individual scores, or boxes, are scattered on both
sides of the mean with some of the boxes relatively close to the mean and some farther
away.
• As a rule of thumb, roughly 70% of the scores in a distribution are located within a
distance of one standard deviation from the mean, and almost all of the scores (roughly
95%) are within two standard deviations of the mean.
• In this example, the standard distance from the mean is s = 4 points, so your image
should have most of the boxes within 4 points of the mean, and nearly all of the boxes
within 8 points.
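The rule of thumb above can be turned into score intervals for the example with M = 36 and s = 4. The helper name `sd_band` is hypothetical, and the percentages remain rough guidelines rather than exact results.

```python
def sd_band(mean, sd, k):
    """Return the interval lying within k standard deviations of the mean."""
    return (mean - k * sd, mean + k * sd)


M, s = 36, 4

# Roughly 70% of the scores are expected in this band
print(sd_band(M, s, 1))  # (32, 40)

# Roughly 95% of the scores are expected in this band
print(sd_band(M, s, 2))  # (28, 44)
```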
3. Describing the location of individual scores
• Together, the mean and the standard deviation help us complete the
picture of the entire distribution, relate each individual score to
the rest of the group, and identify the relative location of each
individual score.
• For example, from the previous image, we can judge the location of
X = 34: it is located near the center of the distribution, only slightly
below the mean.
• On the other hand, a score of X = 45 is an extremely high score,
located far out in the right-hand tail of the distribution.
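A rough sketch of how the two scores above can be located, using the sample with M = 36 and s = 4. The cut-offs of one and two standard deviations are an assumption borrowed from the rule of thumb, and the function name and labels are illustrative.

```python
def describe_location(x, mean, sd):
    """Locate a score by its distance from the mean in standard deviation units."""
    distance = (x - mean) / sd
    if distance > 0:
        side = "above"
    elif distance < 0:
        side = "below"
    else:
        side = "at"
    # Assumed cut-offs: within 1 SD is central, beyond 2 SDs is in the tail
    if abs(distance) <= 1:
        zone = "near the center"
    elif abs(distance) <= 2:
        zone = "away from the center"
    else:
        zone = "far out in the tail"
    return distance, side, zone


print(describe_location(34, 36, 4))  # (-0.5, 'below', 'near the center')
print(describe_location(45, 36, 4))  # (2.25, 'above', 'far out in the tail')
```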
Variability
Variability plays an important role in inferential statistics because
the variability in the data influences how easy it is to see patterns.
In general, low variability means that existing patterns can be seen
clearly, whereas high variability tends to obscure any patterns that
might exist.
EXAMPLE
• In most research studies the goal is to compare means for two (or more) sets of
data.
For example:
• Is the mean level of depression lower after therapy than it was before therapy?
• In situations like this one, the goal is to find a clear difference between two
means that would demonstrate a significant, meaningful pattern in the results.
Variability plays an important role in determining whether a clear pattern exists.
• Consider the following data representing hypothetical results from two
experiments, each comparing two treatment conditions. For both experiments,
your task is to determine whether there appears to be any consistent difference
between the scores in treatment 1 and the scores in treatment 2.
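To make the comparison concrete, the sketch below uses invented scores for two such experiments; they are hypothetical stand-ins, not the data from the slide. Both experiments have the same 5-point difference between treatment means, but the variability differs sharply. The overlap check is a simple illustrative heuristic, not a significance test.

```python
import statistics


def ranges_overlap(a, b):
    """True if the score ranges of the two groups overlap."""
    return min(a) <= max(b) and min(b) <= max(a)


# Hypothetical data (illustrative only)
experiment_a = {"treatment 1": [35, 34, 36, 35], "treatment 2": [39, 40, 41, 40]}
experiment_b = {"treatment 1": [31, 15, 57, 37], "treatment 2": [46, 21, 61, 32]}

for name, data in (("A", experiment_a), ("B", experiment_b)):
    t1, t2 = data["treatment 1"], data["treatment 2"]
    print(
        f"Experiment {name}: "
        f"M1 = {statistics.mean(t1)}, s1 = {statistics.stdev(t1):.2f}; "
        f"M2 = {statistics.mean(t2)}, s2 = {statistics.stdev(t2):.2f}; "
        f"ranges overlap: {ranges_overlap(t1, t2)}"
    )
```

In the low-variability experiment every score in treatment 2 is higher than every score in treatment 1, so the difference is easy to see. In the high-variability experiment the same mean difference is buried in overlapping scores.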
Learning Check
