Measures of Variation
An important characteristic of any set of data
is the variation in the data. In some data sets,
the data values are concentrated closely near
the mean; in other data sets, the data values
are more widely spread out from the mean.
How Can We Measure Variability?
Range
Variance
Standard Deviation
Measures of Variation: Range
The range is the difference between the
highest and lowest values in a data set.
Example 1: Outdoor Paint
Two experimental brands of outdoor paint are
tested to see how long each will last before
fading. Six cans of each brand constitute a
small population. The results (in months) are
shown. Find the mean and range of each group.
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25
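Example 3-18/19: Outdoor Paint
Solution: For each brand, $\sum X = 210$ and $N = 6$.
Brand A: $\mu = \dfrac{\sum X}{N} = \dfrac{210}{6} = 35$ months, $R = 60 - 10 = 50$ months
Brand B: $\mu = \dfrac{\sum X}{N} = \dfrac{210}{6} = 35$ months, $R = 45 - 25 = 20$ months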
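The mean and range calculation above can be checked with a short Python sketch. The function names `mean` and `value_range` are illustrative choices, not part of the slides; the data are the paint lifetimes from the table in Example 1.

```python
# Paint lifetimes in months, copied from the Brand A / Brand B table.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name, "mean =", mean(data), "range =", value_range(data))
```

Both brands have the same mean, so only the range separates them here.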
Measures of Variation:
Variance & Standard Deviation
(Population Theoretical Model)
The population variance is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The population standard deviation is
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Example 2: Outdoor Paint
Find the variance and standard deviation for the data set
for Brand A and B in the previous example.
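Solution: For brand A, $\mu = 35$.
Months, X    X - µ    (X - µ)²
10           -25      625
60            25      625
50            15      225
30            -5       25
40             5       25
20           -15      225
             Σ(X - µ)² = 1750
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7 \text{ months}^2$$
$$\sigma = \sqrt{\frac{1750}{6}} \approx 17.1 \text{ months}$$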
Example 3-21: Outdoor Paint
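Solution: For brand B, $\mu = 35$.
Months, X    X - µ    (X - µ)²
35             0        0
45            10      100
30            -5       25
35             0        0
40             5       25
25           -10      100
             Σ(X - µ)² = 250
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7 \text{ months}^2$$
$$\sigma = \sqrt{41.7} \approx 6.5 \text{ months}$$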
Since the standard deviation of brand A is 17.1 and the
standard deviation of brand B is 6.5, the data are
more variable for brand A.
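As a check on the hand calculation, here is a minimal Python sketch of the population formulas applied to both brands. The helper names `population_variance` and `population_std` are assumptions made for this illustration.

```python
import math

# Paint lifetimes in months, treated as a small population (N = 6 each).
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    print(name,
          "variance =", round(population_variance(data), 1),
          "std dev =", round(population_std(data), 1))
```

Python's standard `statistics` module provides `pvariance` and `pstdev`, which compute the same population quantities.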
Measures of Variation:
Variance & Standard Deviation
(Sample Theoretical Model)
The sample variance is
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The sample standard deviation is
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Measures of Variation:
Variance & Standard Deviation
(Sample Computational Model)
The sample variance is
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$$
Measures of Variation:
Variance & Standard Deviation
(Sample Computational Model)
The computational formula:
Is mathematically equivalent to the theoretical formula.
Saves time when calculating by hand.
Does not require computing the mean first.
Is more accurate than the theoretical formula when the mean has
been rounded.
Example 3: European Auto Sales
Find the variance and standard deviation for the
amount of European auto sales for a sample of 6
years. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
X       X²
11.2    125.44
11.9    141.61
12.0    144.00
12.8    163.84
13.4    179.56
14.3    204.49
ΣX = 75.6    ΣX² = 958.94
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - (75.6)^2}{6(5)} = \frac{38.28}{30} \approx 1.28$$
$$s = \sqrt{1.28} \approx 1.13$$
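The claim that the computational formula matches the theoretical one can be illustrated on the auto sales data. This is a sketch only; the function names are hypothetical and chosen for this example.

```python
import math

# European auto sales for a sample of 6 years, in millions of dollars.
sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]


def sample_variance_theoretical(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_computational(values):
    """[n * sum(X^2) - (sum X)^2] / [n * (n - 1)]; the mean is never formed."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


s2 = sample_variance_computational(sales)
s = math.sqrt(s2)
print("theoretical  :", round(sample_variance_theoretical(sales), 2))
print("computational:", round(s2, 2))
print("std deviation:", round(s, 2))
```

The two functions agree up to floating-point rounding, which is the software analogue of the rounding issue the slide mentions for hand calculation.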
Measures of Variation for Grouped
Data
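The sample variance and standard deviation for grouped data
are given by the formulas
$$s^2 = \frac{n \sum f \cdot X_m^2 - \left(\sum f \cdot X_m\right)^2}{n(n - 1)}, \qquad n = \sum f$$
$$s = \sqrt{s^2}$$
where f is the class frequency and X_m is the class midpoint.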
Measures of Variation for Grouped
Data
Example: Find the variance and the standard deviation for
the following frequency distribution.
Class          f    X_m    f·X_m    f·X_m²
5.5 - 10.5     1     8       8        64
10.5 - 15.5    2    13      26       338
15.5 - 20.5    3    18      54       972
20.5 - 25.5    5    23     115      2645
25.5 - 30.5    4    28     112      3136
30.5 - 35.5    3    33      99      3267
35.5 - 40.5    2    38      76      2888
Σf = 20    Σf·X_m = 490    Σf·X_m² = 13310
Measures of Variation for Grouped
Data
Solution: $n = 20$, $\sum f \cdot X_m = 490$, $\sum f \cdot X_m^2 = 13310$
$$s^2 = \frac{20(13310) - (490)^2}{20(19)} = \frac{266200 - 240100}{380} = \frac{26100}{380} \approx 68.68$$
$$s = \sqrt{68.68} \approx 8.29$$
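The grouped-data calculation can be sketched in Python as follows. The tuple layout `(lower boundary, upper boundary, frequency)` and the function name `grouped_sample_variance` are assumptions made for this illustration; each midpoint is taken as the average of the two class boundaries.

```python
import math

# (lower boundary, upper boundary, frequency) for each class in the example.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_sample_variance(class_rows):
    """Computational formula for grouped data using class midpoints."""
    n = sum(f for _, _, f in class_rows)  # n is the sum of the frequencies
    sum_fx = 0.0
    sum_fx2 = 0.0
    for lower, upper, f in class_rows:
        midpoint = (lower + upper) / 2
        sum_fx += f * midpoint
        sum_fx2 += f * midpoint ** 2
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


s2_grouped = grouped_sample_variance(classes)
s_grouped = math.sqrt(s2_grouped)
print("variance =", round(s2_grouped, 2), "std dev =", round(s_grouped, 2))
```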
Applications:
1-Coefficient of Variation
The coefficient of variation is the
standard deviation divided by the
mean, expressed as a percentage.
$$\text{CVar} = \frac{s}{\bar{X}} \cdot 100\%$$
Use the coefficient of variation to compare standard
deviations when the units are different.
Example : Sales of Automobiles
The mean of the number of sales of cars over a
3-month period is 87, and the standard
deviation is 5. The mean of the commissions is
$5225, and the standard deviation is $773.
Compare the variations of the two.
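For sales: $\text{CVar} = \dfrac{s}{\bar{X}} \cdot 100\% = \dfrac{5}{87} \cdot 100\% \approx 5.7\%$
For commissions: $\text{CVar} = \dfrac{s}{\bar{X}} \cdot 100\% = \dfrac{773}{5225} \cdot 100\% \approx 14.8\%$
Since the coefficient of variation is larger for commissions, the
commissions are more variable than the sales.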
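A small Python sketch of the same comparison is given below. The function name `coefficient_of_variation` is an illustrative choice; the inputs are the means and standard deviations stated in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print("sales      :", round(cv_sales, 1), "%")
print("commissions:", round(cv_commissions, 1), "%")
```

Because the result is a unitless percentage, a count of cars and an amount in dollars can be compared directly.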
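Group    Mean    Variance
A        132     23
B        182     62
For group A: $s = \sqrt{23} \approx 4.8$, $\text{CVar} = \dfrac{s}{\bar{X}} \cdot 100\% = \dfrac{4.8}{132} \cdot 100\% \approx 3.6\%$
For group B: $s = \sqrt{62} \approx 7.9$, $\text{CVar} = \dfrac{s}{\bar{X}} \cdot 100\% = \dfrac{7.9}{182} \cdot 100\% \approx 4.3\%$
Group A is less variable than group B.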
Measures of Variation:
2-Range Rule of Thumb
The Range Rule of Thumb
approximates the standard deviation
as
$$s \approx \frac{\text{Range}}{4}$$
when the distribution is unimodal and
approximately symmetric.
Measures of Variation:
Range Rule of Thumb
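Use $\bar{X} - 2s$ to approximate the lowest
value and $\bar{X} + 2s$ to approximate the
highest value in a data set.
Example: $\bar{X} = 10$, Range $= 12$
$$s \approx \frac{12}{4} = 3$$
$$\text{LOW} \approx 10 - 2(3) = 4, \qquad \text{HIGH} \approx 10 + 2(3) = 16$$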
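The rule of thumb is simple enough to express in a few lines of Python. The function name `range_rule_of_thumb` and its return layout are assumptions for this sketch; the approximation itself is only reasonable for unimodal, roughly symmetric data.

```python
def range_rule_of_thumb(mean, data_range):
    """Return (approximate s, approximate low value, approximate high value)."""
    s = data_range / 4      # s is roughly Range / 4
    low = mean - 2 * s      # lowest value is roughly mean - 2s
    high = mean + 2 * s     # highest value is roughly mean + 2s
    return s, low, high


s_approx, low, high = range_rule_of_thumb(mean=10, data_range=12)
print(s_approx, low, high)  # 3.0 4.0 16.0
```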
3-3 Measures of Position
Z-score
Percentile
Quartile
Outlier
Measures of Position: Z-score
A z-score or standard score for a value
is obtained by subtracting the mean from
the value and dividing the result by the
standard deviation.
$$z = \frac{X - \bar{X}}{s} \quad \text{(sample)}, \qquad z = \frac{X - \mu}{\sigma} \quad \text{(population)}$$
A z-score represents the number of
standard deviations a value is above or
below the mean.
Example : Test Scores
A student scored 65 on a calculus test that had a
mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and
a standard deviation of 5. Compare her relative
positions on the two tests.
Calculus: $z = \dfrac{X - \bar{X}}{s} = \dfrac{65 - 50}{10} = 1.5$
History: $z = \dfrac{X - \bar{X}}{s} = \dfrac{30 - 25}{5} = 1.0$
She has a higher relative position in the calculus class.
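The comparison of the two test scores can be reproduced with a short Python sketch. The function name `z_score` is an illustrative choice; the numbers come from the example.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_calculus = z_score(65, mean=50, std_dev=10)
z_history = z_score(30, mean=25, std_dev=5)

print("calculus z =", z_calculus)  # 1.5
print("history  z =", z_history)   # 1.0
```

The larger z-score marks the better relative position, even though the raw scores are on different scales.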
Example : Students GPA
Two students, Ahmed and Ali, from different high
schools wanted to find out who had the higher GPA
relative to his own school. Which student had the
higher GPA when compared with his school?
Student    GPA     School Mean GPA    School Standard Deviation
Ahmed      2.85    3                  0.7
Ali        77      80                 10
Example : Students GPA
Solution:
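For Ahmed: $z = \dfrac{X - \bar{X}}{s} = \dfrac{2.85 - 3}{0.7} \approx -0.21$
For Ali: $z = \dfrac{X - \bar{X}}{s} = \dfrac{77 - 80}{10} = -0.3$
Both z-scores are negative, so both GPAs are below their school means.
Since $-0.21 > -0.3$, Ahmed has the better GPA relative to his school.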
Measures of Position: Percentiles
Percentiles divide a set of data into 100
equal groups with about 1% of the values
in each group. There are 99 percentiles,
denoted P1, P2, . . . , P99.
Example : Test Scores
A teacher gives a 20-point test to 10 students.
Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Sort in ascending order.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
There are 6 values below 12.
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100 = \frac{6 + 0.5}{10} \cdot 100 = 65$$
A student whose score was 12 did better than
65% of the class.
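The percentile rank formula used above can be sketched in Python. The function name `percentile_rank` is hypothetical; the formula is the one shown in the example, with 0.5 added to the count of values below the score.

```python
def percentile_rank(scores, x):
    """(number of values below x + 0.5) / (total number of values) * 100."""
    below = sum(1 for s in scores if s < x)
    return (below + 0.5) * 100 / len(scores)


test_scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
rank_of_12 = percentile_rank(test_scores, 12)
print(rank_of_12)  # 65.0
```

Sorting is not needed for the rank itself, since only the count of smaller values matters.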
Converting from the kth
Percentile to the Corresponding
Data Value
Example : Test Scores
A teacher gives a 20-point test to 10 students. Find
the value corresponding to the 25th percentile and
60th percentile
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Solution: Sort in ascending order.
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
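For the 25th percentile:
$$L = \frac{K}{100} \cdot n = \frac{25}{100} \cdot 10 = 2.5 \rightarrow 3$$
Since L is not a whole number, round it up to 3. P25 is the 3rd value
in the sorted list:
$$P_{25} = 5$$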
Example 3-34: Test Scores
2, 3, 5, 6, 8, 10, 12, 15, 18, 20
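For the 60th percentile:
$$L = \frac{K}{100} \cdot n = \frac{60}{100} \cdot 10 = 6$$
Since L is a whole number, average the 6th and 7th values
in the sorted list:
$$P_{60} = \frac{10 + 12}{2} = 11$$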
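The two worked cases (L = 2.5 and L = 6) suggest a general procedure, sketched below in Python. The rule is inferred from those two cases and should be read as an assumption: when L is not whole, round up and take that value; when L is whole, average the L-th and (L+1)-th values. The function name `percentile_value` is hypothetical.

```python
import math


def percentile_value(scores, k):
    """Data value at the k-th percentile, for 1 <= k <= 99 (rule inferred from the examples)."""
    ordered = sorted(scores)
    n = len(ordered)
    position = k * n / 100  # L = (K / 100) * n
    if position != int(position):
        # L is not whole: round up and take the value at that 1-based position.
        return ordered[math.ceil(position) - 1]
    # L is whole: average the L-th and (L + 1)-th values (1-based positions).
    index = int(position)
    return (ordered[index - 1] + ordered[index]) / 2


test_scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_value(test_scores, 25))  # 5
print(percentile_value(test_scores, 60))  # 11.0
```

Statistical software often uses interpolation rules that differ from this one, so library results may not match the hand method exactly.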
What is the difference between a percentile and a
percentage?
A percentile is a relative measure of position; a percentage is
an absolute measure of a part relative to the total.
Deciles:
Deciles divide a set of data into 10
equal groups with about 10% of the values
in each group. There are 9 deciles, denoted
D1, D2, . . . , D9.
Note that:
D1 = P10, D2 = P20, D3 = P30, . . . , D9 = P90.
QUARTILES:
Quartiles separate the data set into 4 equal groups with about
25% of the values in each group. There are 3
quartiles, denoted Q1, Q2, and Q3.
Example:
Find Q1, Q2, and Q3 for the data set.
15, 13, 6, 5, 12, 50, 22, 18
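Sort in ascending order.
5, 6, 12, 13, 15, 18, 22, 50
$$Q_2 = \text{median} = \frac{13 + 15}{2} = 14$$
Q1 is the median of the lower half (5, 6, 12, 13):
$$Q_1 = \frac{6 + 12}{2} = 9$$
Q3 is the median of the upper half (15, 18, 22, 50):
$$Q_3 = \frac{18 + 22}{2} = 20$$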
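The quartile calculation can be sketched in Python using the median-of-halves approach that the example follows. The function names are illustrative. For an odd number of values this sketch leaves the median out of both halves, which is one common convention and an assumption here, since the example only covers an even count.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value is excluded from both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 9.0 14.0 20.0
```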
Measures of Position:
Outliers
An outlier is an extremely high or low
data value when compared with the rest of
the data values.
A data value less than Q1 – 1.5(IQR) or
greater than Q3 + 1.5(IQR) can be
considered an outlier, where IQR = Q3 – Q1 is the
interquartile range. That is, an outlier is a data
value outside the interval
[Q1 – 1.5(IQR), Q3 + 1.5(IQR)]
Exercise: Find the outlier for the previous
example
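One way to check an answer to this exercise is the Python sketch below. The function name `find_outliers` is hypothetical; Q1 = 9 and Q3 = 20 are the quartiles found in the previous example.

```python
def find_outliers(values, q1, q3):
    """Values outside the interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    iqr = q3 - q1                 # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


data = [15, 13, 6, 5, 12, 50, 22, 18]
outliers = find_outliers(data, q1=9, q3=20)
print(outliers)  # [50]
```

With IQR = 11 the interval is [-7.5, 36.5], so only values beyond those fences are flagged.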
The Five-Number Summary
Boxplots
Constructing Boxplots
1. Find the five-number summary.
2. Draw a horizontal axis with a scale that includes
the maximum and minimum data values.
3. Draw a box with vertical sides through Q1 and
Q3, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the
left side of the box and a line from the maximum
data value to the right side of the box.
Example 3-38: Meteorites
The number of meteorites found in 10 U.S. states is
shown. Find the five number summary and construct
a boxplot for the data.
89, 47, 164, 296, 30, 215, 138, 78, 48, 39
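Solution: Sort in ascending order.
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
The five-number summary is
Lowest value = 30, Q1 = 47, MD = 83.5, Q3 = 164,
Highest value = 296
[Boxplot: box from Q1 = 47 to Q3 = 164 with the median line at 83.5; lines extend to the lowest value 30 and the highest value 296]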
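The five-number summary for the meteorite data can be reproduced with a short Python sketch. The function names are illustrative, and the quartiles are taken as medians of the lower and upper halves, which matches the values in the solution; for an odd number of values the middle value is left out of both halves, which is an assumption.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (lowest value, Q1, median, Q3, highest value)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + n % 2:]  # skip the middle value when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```

These five values are exactly what step 1 of the boxplot construction asks for.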
Exercise:
Consider the following boxplot.
1- Identify the five-number summary.
2- What is the outlier?
3- Describe the distribution.