The data sets may have the same mean, but when we look at their graphs, the data sets look
different from each other because of their variability.
1. Measures of Dispersion
Measures of central tendency give us good information about the scores in our distribution.
However, distributions can have very different shapes yet still have the same central
tendency.
The measures of dispersion, or measures of variation, show how the observations in a data set
vary from the mean. They tell us about the spread of the scores in our distribution: are the
scores clustered close together over a small portion of the scale, or are they spread out over
a large segment of the scale?
A. Range. The range is the difference between the highest and lowest scores in a distribution:
simply subtract the lowest score from the highest. So, in the distribution 1, 3, 5, 9, 11, the
range is 11 − 1 = 10. Remember that the final answer is a single number, not the two endpoints.
However, the range does not use the concept of deviation from the mean. It is strongly affected
by outliers, and it does not consider all the values in the data set, only the two extremes.
Example: Find the range of the numbers of ounces (oz) dispensed by Machine 1 and Machine 2.
Machine 1 (oz)   Machine 2 (oz)
9.52             8.01
6.41             7.99
10.07            7.95
5.85             8.03
8.15             8.02
Machine 1: R = 10.07 − 5.85 = 4.22 oz
Machine 2: R = 8.03 − 7.95 = 0.08 oz
Range of a set of negative numbers
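If your set includes negative numbers, the range is still positive (or zero), because you
subtract the smallest value from the largest, and subtracting a negative number is the same as
adding its absolute value. For example, the range of −10, −4, −2 is −2 − (−10) = 8.
When dealing with range, imagine the numbers on a number line: the range is simply the
distance between the two extreme values.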
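The range calculation above can be sketched in Python as follows. This is a minimal illustration, not part of the original handout; `data_range` is a hypothetical helper name. It also prints each machine's mean, which illustrates the point made at the start of this section: both machines average 8.00 oz, yet their ranges differ sharply.

```python
def data_range(values):
    """Return the range (largest value minus smallest value) of a non-empty list."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


machine_1 = [9.52, 6.41, 10.07, 5.85, 8.15]  # ounces dispensed by Machine 1
machine_2 = [8.01, 7.99, 7.95, 8.03, 8.02]   # ounces dispensed by Machine 2

for name, data in (("Machine 1", machine_1), ("Machine 2", machine_2)):
    mean = sum(data) / len(data)
    # Same mean, very different spread.
    print(f"{name}: mean = {mean:.2f} oz, range = {data_range(data):.2f} oz")

print(data_range([1, 3, 5, 9, 11]))  # 11 - 1 = 10
print(data_range([-10, -4, -2]))     # -2 - (-10) = 8, still positive
```

Only the two extreme values affect the result, which is why a single outlier can change the range dramatically.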
B. Variance
Population variance
When you have collected data from every member of the population that you’re interested in,
you can get an exact value for population variance.
Sample variance
When you collect data from a sample, the sample variance is used to make estimates or
inferences about the population variance.
NOTE: Be careful in identifying what kind of data you're dealing with in the given problem: is it
population or sample data?
Population variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
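Example: Find the variance of the following sample data: 46, 69, 32, 60, 52, 41
Step 1: Find the mean
To find the mean, add up all the scores, then divide the sum by the number of scores (observations).
$$\bar{x} = \frac{\sum x}{n} = \frac{46 + 69 + 32 + 60 + 52 + 41}{6} = \frac{300}{6} = 50$$
Here n = 6 because there are six scores.
Step 2: Find the sum of squares
Subtract the mean from each score, square each deviation, and add the squares:
$$\sum (x - \bar{x})^2 = (46 - 50)^2 + (69 - 50)^2 + (32 - 50)^2 + (60 - 50)^2 + (52 - 50)^2 + (41 - 50)^2 = 16 + 361 + 324 + 100 + 4 + 81 = 886$$
Step 3: Divide the sum of squares by n − 1
$$s^2 = \frac{886}{6 - 1} = \frac{886}{5} = 177.2$$
FINAL ANSWER: s² = 177.2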
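The variance steps above can be checked with a short Python sketch. It implements the sample formula (divide by n − 1) and, for contrast, the population formula (divide by N); `sample_variance` and `population_variance` are hypothetical helper names, and the standard library's `statistics.variance` and `statistics.pvariance` serve as a cross-check.

```python
import statistics


def sample_variance(values):
    """s^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1)


def population_variance(values):
    """sigma^2 = sum of squared deviations from the population mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population variance needs at least one value")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


data = [46, 69, 32, 60, 52, 41]
print(sample_variance(data))      # 886 / 5 = 177.2
print(population_variance(data))  # 886 / 6, if these six scores were the whole population
# Cross-check with the standard library's sample and population versions.
print(statistics.variance(data), statistics.pvariance(data))
```

The two results differ only in the denominator, so choosing the wrong one is exactly the population-versus-sample mistake the NOTE above warns about.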
Statistical tests such as variance tests or the analysis of variance (ANOVA) use sample variance
to assess group differences of populations. They use the variances of the samples to assess
whether the populations they come from significantly differ from each other.
C. Standard Deviation
The standard deviation is the average amount of variability in your dataset. It tells you, on
average, how far each value lies from the mean.
A high standard deviation means that values are generally far from the mean, while a low
standard deviation indicates that values are clustered close to the mean.
Standard deviation is a useful measure of spread for normal distributions.
In normal distributions, data is symmetrically distributed with no skew. Most values cluster
around a central region, with values tapering off as they go further away from the center. The
standard deviation tells you how spread out from the center of the distribution your data is on
average.
Many scientific variables follow normal distributions, including height, standardized test scores,
or job satisfaction ratings. When you have the standard deviations of different samples, you can
compare their distributions using statistical tests to make inferences about the larger
populations they came from.
Example: Comparing different standard deviations. You collect data on job satisfaction ratings
from three groups of employees using simple random sampling.
The mean (M) ratings are the same for each group – it’s the value on the x-axis when the curve
is at its peak. However, their standard deviations (SD) differ from each other. The standard
deviation reflects the dispersion of the distribution. The curve with the lowest standard deviation
has a high peak and a small spread, while the curve with the highest standard deviation is more
flat and widespread.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
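Using the same sample data (46, 69, 32, 60, 52, 41), first find the mean:
$$\bar{x} = \frac{\sum x}{n} = \frac{46 + 69 + 32 + 60 + 52 + 41}{6} = \frac{300}{6} = 50$$
where n = 6 is the number of scores. Next, find each deviation from the mean and square it:
x     x − x̄            (x − x̄)²
46    46 − 50 = −4      16
69    69 − 50 = 19      361
32    32 − 50 = −18     324
60    60 − 50 = 10      100
52    52 − 50 = 2       4
41    41 − 50 = −9      81
Sum of squares: 16 + 361 + 324 + 100 + 4 + 81 = 886
$$s = \sqrt{\frac{886}{6 - 1}} = \sqrt{177.2} \approx 13.31$$
FINAL ANSWER: s ≈ 13.31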
From the result SD = 13.31, we can say that the scores typically deviate from the mean by about
13.31 points.
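The standard deviation is simply the square root of the sample variance, which a few lines of Python make concrete. This is a sketch, not the handout's method; `sample_standard_deviation` is a hypothetical helper, cross-checked against `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics


def sample_standard_deviation(values):
    """s = square root of (sum of squared deviations / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two values")
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (n - 1))


data = [46, 69, 32, 60, 52, 41]
s = sample_standard_deviation(data)
print(round(s, 2))                              # 13.31
print(math.isclose(s, statistics.stdev(data)))  # matches the standard library
```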
Although there are simpler ways to measure variability, the standard deviation formula squares
each deviation, so values far from the mean carry much more weight than values close to it. A
higher standard deviation therefore tells you that the distribution is not only more spread out,
but may also be more unevenly spread out, with some values lying far from the mean.
D. Coefficient of Variation
The coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard
deviation to the mean (average). For example, the expression “The standard deviation is 15% of
the mean” is a CV.
The CV is particularly useful when you want to compare results from two different surveys or
tests that use different measures or scales, for example, two tests with different scoring
mechanisms. If sample A has a CV of 12% and sample B has a CV of 25%, you would say that
sample B has more variation relative to its mean.
Step 1: Calculate the mean of the data set.
$$\bar{x} = \frac{\sum x}{n} = \frac{60.25 + 62.38 + 65.32 + 61.41 + 63.23}{5} = \frac{312.59}{5} \approx 62.52$$
FINAL ANSWER: x̄ ≈ 62.52
Step 2: Calculate the standard deviation of the same values using the sample standard deviation formula.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{5.1529 + 0.0196 + 7.8400 + 1.2321 + 0.5041}{5 - 1}} = \sqrt{\frac{14.7487}{4}} \approx \sqrt{3.687} \approx 1.92$$
FINAL ANSWER: s ≈ 1.92
Step 3: Calculate the coefficient of variation from the mean and SD.
$$CV = \frac{s}{\bar{x}} \times 100\% = \frac{1.92}{62.52} \times 100\% \approx 0.0307 \times 100\% \approx 3.07\%$$
FINAL ANSWER: CV ≈ 3.07%
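The three steps can be carried out at full precision with Python's standard `statistics` module. This is a sketch, not the handout's own method; `coefficient_of_variation` is a hypothetical helper that assumes the data are a sample (so it uses `statistics.stdev`, which divides by n − 1) and that the mean is not zero.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is 0")
    return statistics.stdev(values) / mean * 100


data = [60.25, 62.38, 65.32, 61.41, 63.23]
print(f"mean = {statistics.mean(data):.2f}")            # about 62.52
print(f"SD   = {statistics.stdev(data):.2f}")           # about 1.92
print(f"CV   = {coefficient_of_variation(data):.2f}%")  # about 3.07%
```

Rounding the mean or the squared deviations too early is the usual source of small discrepancies in hand calculations; keeping full precision until the last step avoids them.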
NOTE: As a common rule of thumb, a coefficient of variation below 10% is considered very good,
and a CV between 10% and 20% is also considered good, but a CV greater than 30% is considered
unacceptable.
The coefficient of variation is important for describing the variation in a data set because a
standard deviation must be interpreted in light of the mean: an SD of 2 is large for data with a
mean of 5 but small for data with a mean of 500. Because the CV divides the standard deviation by
the mean, which is expressed in the same units, the CV itself has no units and does not depend on
the unit of measurement. The coefficient of variation can therefore be used instead of the SD to
compare data sets measured in different units.
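The unit-independence claim can be demonstrated by rescaling a data set. In this sketch the five values from the CV example are treated, purely as a hypothetical, as kilograms and re-expressed in grams; `coefficient_of_variation` is a hypothetical helper using the sample SD.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent (sample SD / mean * 100)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical: pretend these values were recorded in kilograms,
# then re-express the same measurements in grams.
in_kg = [60.25, 62.38, 65.32, 61.41, 63.23]
in_g = [x * 1000 for x in in_kg]

print(statistics.stdev(in_kg), statistics.stdev(in_g))                  # SD scales with the unit
print(coefficient_of_variation(in_kg), coefficient_of_variation(in_g))  # CV is unchanged
```

The SD grows by a factor of 1000 along with the mean, so their ratio stays the same.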
2. Measures of Relative Standing
Measures of relative standing can be used to compare values from different data sets, or to
compare values within the same data set.
Empirical Rule
The Empirical Rule is a statement about normal distributions. Your textbook uses an abbreviated
form of this, known as the 95% Rule, because 95% is the most commonly used interval. The
95% Rule states that approximately 95% of observations fall within two standard deviations of
the mean on a normal distribution.
On a normal distribution about 68% of data will be within one standard deviation of the
mean, about 95% will be within two standard deviations of the mean, and about 99.7% will
be within three standard deviations of the mean.
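The 68%, 95% and 99.7% figures can be checked against the exact normal curve using `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This sketch uses the standard normal (mean 0, SD 1); because every normal distribution is a shifted and rescaled copy of it, the same shares apply to any normal distribution. `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```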
Population z-score:
$$z = \frac{x - \mu}{\sigma}$$
where x = original data value, μ = mean of the original distribution, and σ = standard deviation
of the original distribution.
Sample z-score:
$$z = \frac{x - \bar{x}}{s}$$
where x = original data value, x̄ = mean of the original distribution, and s = standard deviation
of the original distribution.
z-distribution
A bell-shaped distribution with a mean of 0 and standard deviation of 1, also known as the
standard normal distribution.
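To see why standardized scores have mean 0 and standard deviation 1, the sketch below converts the six sample scores from the variance example into z-scores with z = (x − x̄)/s and summarizes the result. `z_scores` is a hypothetical helper that uses the sample mean and sample SD, matching the sample form of the formula.

```python
import statistics


def z_scores(values):
    """Standardize each value with z = (x - x_bar) / s (sample mean and sample SD)."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - x_bar) / s for x in values]


data = [46, 69, 32, 60, 52, 41]
z = z_scores(data)
print([round(v, 3) for v in z])
# The standardized values have mean 0 and standard deviation 1 (up to rounding error).
print(round(statistics.mean(z), 9), round(statistics.stdev(z), 9))
```

Standardizing only shifts and rescales the data; it does not make a skewed data set bell-shaped. The result follows the standard normal distribution only when the original data are normal.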
Example: Milk
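A study of 66,831 dairy cows found that the mean milk yield was 12.5 kg per milking with a
standard deviation of 4.3 kg per milking (data from Berry, et al., 2013).
A cow that produced 18.1 kg per milking:
$$z = \frac{x - \bar{x}}{s} = \frac{18.1 - 12.5}{4.3} = \frac{5.6}{4.3} \approx 1.302$$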
Interpretation: This cow’s z-score is 1.302; her milk production was 1.302 standard deviations
above the mean.
A cow that produced 12.5 kg per milking:
$$z = \frac{x - \bar{x}}{s} = \frac{12.5 - 12.5}{4.3} = \frac{0}{4.3} = 0$$
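Interpretation: This cow’s z-score is 0; her milk production was the same as the mean.
A cow that produced 8.0 kg per milking:
$$z = \frac{x - \bar{x}}{s} = \frac{8.0 - 12.5}{4.3} = \frac{-4.5}{4.3} \approx -1.047$$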
Interpretation: This cow’s z-score is -1.047; her milk production was 1.047 standard
deviations below the mean.
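The milk example can be reproduced with a one-line z-score function. The mean (12.5 kg) and SD (4.3 kg) come from the example; the three yields 18.1, 12.5 and 8.0 kg are back-calculated from the reported z-scores (x = x̄ + z·s) rather than taken from the study, and `z_score` is a hypothetical helper name.

```python
MEAN_YIELD_KG = 12.5  # mean milk yield per milking (Berry et al., 2013)
SD_YIELD_KG = 4.3     # standard deviation per milking


def z_score(x, mean=MEAN_YIELD_KG, sd=SD_YIELD_KG):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Yields back-calculated from the z-scores in the example (x = mean + z * sd).
for yield_kg in (18.1, 12.5, 8.0):
    z = z_score(yield_kg)
    if z > 0:
        where = f"{z:.3f} SD above the mean"
    elif z < 0:
        where = f"{-z:.3f} SD below the mean"
    else:
        where = "exactly at the mean"
    print(f"{yield_kg} kg: z = {z:.3f} ({where})")
```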
We say that a distribution is symmetric if it can be folded along a vertical axis so that the two
sides of the graph coincide. Below is an example of a histogram that shows a symmetric
distribution.
If a distribution lacks symmetry with respect to a vertical axis, the distribution is said to be
asymmetric or skewed.
[Figure: Left-skewed and right-skewed histograms. Image from: https://blog.minitab.com/hubfs/Imported_Blog_Media/skewedhistograms.jpg]
A distribution that is skewed to the right, or positively skewed, has a right tail longer than its
left tail. In a positively skewed distribution, the mean is usually greater than the median of the
data set. On the other hand, a distribution with the left tail longer than the right tail is called
negatively skewed, or skewed to the left. In a negatively skewed distribution, the mean is
usually less than the median of the data set.
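The mean-versus-median pattern can be seen on two small data sets. Both lists below are hypothetical, invented only for illustration: each has one extreme value that creates a long tail, which pulls the mean toward that tail while the median barely moves.

```python
import statistics

# Hypothetical data sets invented for illustration; not from any study.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]       # one large value -> long right tail
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]  # one small value -> long left tail

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        relation = "greater than"
    elif mean < median:
        relation = "less than"
    else:
        relation = "equal to"
    print(f"{name}: mean = {mean}, median = {median}, so the mean is {relation} the median")
```

This is the typical behaviour, not a guarantee: unusual distributions can break the rule, so a histogram is still the best check for skew.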