Data Summarization - Measures of Variation

ENDATA130 Handouts 4B
Data Summarization / Description / Analysis

Measures of Variation / Dispersion
Relating Measures of Central Tendency and Measures of Variation:
Case 1: Data on scores of 5 students in two sections for the same set of questions
Section A: 30, 35, 40, 45, 50 (mean X̅A = 40)
Section B: 10, 35, 40, 55, 60 (mean X̅B = 40)
Location of the data points away from the center (mean):
_____________________ _____________________
Case 2: Data on Monthly Sales of Paint in cans for two brands A and B of paint (From Bluman)
Measures of Variation or Measures of Dispersion

- Describe how the data points are scattered, spread, or dispersed away from the center (mean).
- Indicate the degree of scattering of the data points.
- There are several ways to represent the measures of variation or dispersion of the data, namely:
1.) Range; R
2.) Mean Deviation or Mean Absolute Deviation(MAD)
3.) Variance
4.) Standard deviation (the most commonly used)
5.) Coefficient of Variation, Cvar
NOTE: The value of a measure of variation increases with the scattering of the data points away from the center.
That is, the smaller the value of the measure of variation, the less the scattering;
the greater the value of the measure of variation, the wider the scattering away from the center.
Determination of the Range (R):

Range = the difference between the highest and lowest value in the data file.
- the simplest measure of variation/dispersion.
Case 1: For Raw Data

R = H – L, where H = highest value and L = lowest value
Case 2: For Grouped Data (Frequency Distribution)

R = Midpoint of highest class – Midpoint of lowest class
Example: Determination of the Range
Case 1: Range for Raw Data
Data on scores of 5 students in two sections for the same set of questions
Reqd: Range of the scores for the two sections
Section A: 30, 35, 40, 45, 50 (X̅A = 40)
Section B: 10, 35, 40, 55, 60 (X̅B = 40)
RA = H – L = 50 – 30 = 20
RB = H – L = 60 – 10 = 50
Case 2: Range for Grouped Data

Given:
Class      M     f
24 - 34    29    2
35 - 45          5
46 - 56          6
57 - 67          10
68 - 78    73    8
Reqd: Range
Soln:
R = Midpt of highest class – midpoint of lowest class = 73 – 29 = 44
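The two cases above can be checked with a short Python sketch. The function names range_raw and range_grouped are illustrative and not part of the handout; the grouped version simply follows the handout's midpoint-based definition of the range.

```python
def range_raw(values):
    """Range for raw data: highest value minus lowest value (R = H - L)."""
    return max(values) - min(values)


def range_grouped(class_limits):
    """Range for grouped data, per the handout's definition:
    midpoint of the highest class minus midpoint of the lowest class."""
    midpoints = [(low + high) / 2 for low, high in class_limits]
    return max(midpoints) - min(midpoints)


section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]
classes = [(24, 34), (35, 45), (46, 56), (57, 67), (68, 78)]

print(range_raw(section_a))    # 20
print(range_raw(section_b))    # 50
print(range_grouped(classes))  # 44.0
```

Both sections have the same mean of 40, yet the range already shows that Section B is more widely scattered.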
Determination of the Mean Deviation or Mean Absolute Deviation (MAD):

Mean absolute deviation or mean deviation = the sum of the absolute values of the deviations of the
values in the data file from the mean, divided by the number of values in the data file.
- the average of the absolute values of the deviations of the scores around the mean.
Case 1: Using Ungrouped Data (Raw Data)

Steps: a.) Solve first for the mean of the raw data.
b.) Get the deviation (d) of each value from the mean and the corresponding absolute deviation |d|.
Let: d = deviation of a value or data point from the mean
x = specific value or score in the data file
Solving for the deviation, d, using Raw Data:
d = individual score minus mean score
d = x - x̅
and the absolute value of the deviation:
|d| = |x − x̅|
c.) Compute the mean absolute deviation as
MAD = Σ|d| / n = Σ|x − x̅| / n
where: n = Σf (the number of values in the data file)
Example:
Case 1: MAD for Raw Data
Reqd: MAD
Section A: 30, 35, 40, 45, 50 (X̅A = 40)
Section B: 10, 35, 40, 55, 60 (X̅B = 40)
For Section A
x     d = x - x̅    |d| = |x − x̅|
30    -10          10
35    -5           5
40    0            0
45    5            5
50    10           10
      Σd = 0       Σ|d| = 30
Hence: For Section A
MAD = Σ|d| / n = Σ|x − x̅| / n = 30 / 5 = 6
For Section B
x     d = x - x̅    |d| = |x − x̅|
10    -30          30
35    -5           5
40    0            0
55    15           15
60    20           20
      Σd = 0       Σ|d| = 70
Hence: For Section B
MAD = Σ|d| / n = Σ|x − x̅| / n = 70 / 5 = 14
NOTE: The sum of deviations for all the values of x in the raw data is equal to zero. ∑ 𝑑 = 0
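As a quick check of the raw-data steps, the following is a minimal Python sketch. The function name mean_absolute_deviation is an illustrative choice, not something defined in the handout. It also confirms the note that the plain deviations sum to zero, which is why absolute values are needed.

```python
def mean_absolute_deviation(values):
    """MAD for raw data: average of |x - mean| over all values."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]      # d = x - mean
    return sum(abs(d) for d in deviations) / n   # sum|d| / n


section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

# The signed deviations cancel out, so their sum carries no information.
mean_a = sum(section_a) / len(section_a)
print(sum(x - mean_a for x in section_a))    # 0.0

print(mean_absolute_deviation(section_a))    # 6.0
print(mean_absolute_deviation(section_b))    # 14.0
```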
Case 2: Using Grouped Data (Frequency Distribution)

Steps: a.) Solve for the mean (µ or x̅) using the Midpoint Method.
b.) Get the deviation of each midpoint from the mean (d = M - µ = M - x̅).
c.) Multiply each deviation by the corresponding frequency, f(d).
d.) Get the absolute value |f(d)|.
e.) Compute the MAD as follows:
MAD = Σ|f(d)| / Σf = Σ f|d| / Σf
For Tabular Computations:
Classes f M (f)M (use to solve mean) d = M - 𝑋̅ |𝑓(𝑑)|
Example: Use MAD to answer this problem (From Bluman)
Given:
Class (LB - LU)    f     M      fM      d = M - x̅    |f(d)|
62.5 - 73.5        5     68     340     -27.9        139.5
73.5 - 84.5        14    79     1106    -16.9        236.6
84.5 - 95.5        18    90     1620    -5.9         106.2
95.5 - 106.5       25    101    2525    5.1          127.5
106.5 - 117.5      12    112    1344    16.1         193.2
117.5 - 128.5      6     123    738     27.1         162.6
Σf = 80    Σf(M) = 7673    Σ|f(d)| = 965.6
Solving for MAD (with x̅ = Σf(M) / Σf = 7673 / 80 ≈ 95.9):
MAD = Σ|f(d)| / Σf = Σ f|d| / Σf = 965.6 / 80 ≈ 12.1
Hence, the lifetimes of the batteries are not consistent.
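The grouped-data steps can be sketched in Python as follows. The function name grouped_mad is illustrative. Note one assumption: the sketch keeps the mean unrounded (95.9125) instead of the 95.9 used in the table, so the intermediate sum differs slightly from 965.6 while the final MAD still rounds to 12.1.

```python
def grouped_mad(boundaries, freqs):
    """Return (mean, MAD) for a frequency distribution using class midpoints."""
    midpoints = [(low + high) / 2 for low, high in boundaries]
    total_f = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / total_f
    # |f(d)| for each class, where d = M - mean
    abs_fd = [f * abs(m - mean) for f, m in zip(freqs, midpoints)]
    return mean, sum(abs_fd) / total_f


boundaries = [(62.5, 73.5), (73.5, 84.5), (84.5, 95.5),
              (95.5, 106.5), (106.5, 117.5), (117.5, 128.5)]
freqs = [5, 14, 18, 25, 12, 6]

mean, mad = grouped_mad(boundaries, freqs)
print(round(mean, 1))  # 95.9
print(round(mad, 1))   # 12.1
```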
Determination of the Variance (σ² or s²):
Variability can also be defined in terms of how close the scores in the distribution are to the middle
of the distribution. Using the mean as the measure of the middle or center of the distribution, the variance is
defined as the average squared difference of the scores from the mean (the squared deviation, d²). The variance
is a measure of dispersion that eliminates negative signs by squaring all deviations.
Standard Symbols for the Variance:

σ² = variance for the population
s² = variance for the sample
Case 1: Using Ungrouped Data (Raw Data)

Steps: a.) Find the mean (µ or x̅).
b.) Determine the deviation of each score from the mean: d = X - µ (population) or d = X - x̅ (sample).
c.) Solve for the square of the deviation, d².
d.) Calculate the variance for the population or the sample as follows:
σ² = Σd² / N = Σ(x − µ)² / N = Σd² / Σf
s² = Σd² / (n − 1) = Σ(x − x̅)² / (n − 1) = Σd² / (Σf − 1)
Where:
N or n = Σf
Illustration: Solving for Variance of Raw Data

Case 1: Variance for Raw Data
Reqd: variance = s²
Section A: 30, 35, 40, 45, 50 (X̅A = 40)
Section B: 10, 35, 40, 55, 60 (X̅B = 40)
For Section A
x     d = x - x̅    d²
30    -10          100
35    -5           25
40    0            0
45    5            25
50    10           100
For Section B
x     d = x - x̅    d²
10    -30          900
35    -5           25
40    0            0
55    15           225
60    20           400
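The tables above stop at the squared deviations. One way to carry the computation through to the variance is sketched below in Python; the function name variance and its sample flag are illustrative choices, not part of the handout. Summing the d² columns gives 250 for Section A and 1550 for Section B, which the sketch then divides by n − 1 (sample) or N (population).

```python
def variance(values, sample=True):
    """Variance of raw data: sum of squared deviations over n - 1 (sample)
    or over N (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_d2 = sum((x - mean) ** 2 for x in values)  # sum of d squared
    return sum_d2 / (n - 1 if sample else n)


section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

print(variance(section_a))                # 62.5   (250 / 4)
print(variance(section_b))                # 387.5  (1550 / 4)
print(variance(section_a, sample=False))  # 50.0   (250 / 5)
print(variance(section_b, sample=False))  # 310.0  (1550 / 5)
```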
Case 2: For Grouped Data (Frequency Distribution)

Steps: a.) Solve for the mean of the data file (x̅) by the Midpoint Method.
b.) Get the midpoint of each class, M.
c.) Compute the deviation, d = M - x̅.
d.) Solve for the square of the deviation, d².
e.) Multiply the frequency of each class by the squared deviation, fd².
f.) Solve for the variance as follows:
σ² = Σfd² / N = Σfd² / Σf
s² = Σfd² / (n − 1) = Σfd² / (Σf − 1)
For Tabular Computations:
Classes f M fM (use to solve mean) d = M - x̅ d² fd²
Illustration: Solving for variance using Grouped Data

Example: Use variance to answer this problem (From Bluman)
Given:
Class (LB - LU)    f     M      fM      d = M - x̅    d²        fd²
62.5 - 73.5        5     68     340     -27.9        778.41    3892.05
73.5 - 84.5        14    79     1106    -16.9        285.61    3998.54
84.5 - 95.5        18    90     1620    -5.9         34.81     626.58
95.5 - 106.5       25    101    2525    5.1          26.01     650.25
106.5 - 117.5      12    112    1344    16.1         259.21    3110.52
117.5 - 128.5      6     123    738     27.1         734.41    4406.46
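The table above lists fd² for each class but does not total it. A minimal Python sketch of the remaining steps follows; the function name grouped_variance is illustrative, and the sketch assumes the unrounded mean (95.9125) rather than 95.9, so its sums differ from the tabulated values only in the decimals.

```python
def grouped_variance(boundaries, freqs, sample=True):
    """Variance of a frequency distribution using class midpoints:
    sum(f * d**2) over (sum(f) - 1) for a sample, or over sum(f) for a population."""
    midpoints = [(low + high) / 2 for low, high in boundaries]
    total_f = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / total_f
    sum_fd2 = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
    return sum_fd2 / (total_f - 1 if sample else total_f)


boundaries = [(62.5, 73.5), (73.5, 84.5), (84.5, 95.5),
              (95.5, 106.5), (106.5, 117.5), (117.5, 128.5)]
freqs = [5, 14, 18, 25, 12, 6]

s2 = grouped_variance(boundaries, freqs)
sigma2 = grouped_variance(boundaries, freqs, sample=False)
print(round(s2, 1))      # 211.2
print(round(sigma2, 1))  # 208.6
```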
Determination of the Standard Deviation:
Standard deviation (σ or s) = square root of the variance
σ = population standard deviation = √σ²
s = sample standard deviation = √s²
NOTE: In describing the center of the data distribution and the variation or scattering of the rest of the
data points away from the center, the Mean and Standard Deviation are used.
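As an illustration, the standard deviations of the two sections of scores used earlier in this handout can be obtained by taking the square root of each variance. The sketch below relies on Python's standard statistics module (variance uses the n − 1 divisor, pvariance uses N); the variable names are illustrative.

```python
import math
import statistics

section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

# Sample standard deviation: square root of the sample variance (divisor n - 1)
s_a = math.sqrt(statistics.variance(section_a))
s_b = math.sqrt(statistics.variance(section_b))

# Population standard deviation: square root of the population variance (divisor N)
sigma_a = math.sqrt(statistics.pvariance(section_a))
sigma_b = math.sqrt(statistics.pvariance(section_b))

print(round(s_a, 2), round(s_b, 2))          # 7.91 19.69
print(round(sigma_a, 2), round(sigma_b, 2))  # 7.07 17.61
```

Both sections share a mean of 40, but the standard deviation of Section B is roughly two and a half times that of Section A, which matches the wider scatter seen in the raw scores.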
Determination of the Coefficient of Variation, Cvar:

Whenever two samples have the same units of measure, the variance and standard deviation for each can
be compared directly. A statistic or measure of variation that allows us to compare standard deviations when the
units are different is called the coefficient of variation. Cvar is expressed as a percentage.
Cvar = (standard deviation / mean) × 100 = (σ / µ) × 100 = (s / x̅) × 100
Example: From Bluman
Comparing Variation of Precipitation (inches of rain) and High Temperatures (°F).
The normal daily high temperatures (in degrees Fahrenheit) in January for 10 selected cities are as follows.
High Temp in °F = 50, 37, 29, 54, 30, 61, 47, 38, 34, 61
The normal monthly precipitation (in inches) for these same 10 cities is listed here.
Monthly Rainfall or Precipitation in inches = 4.8, 2.6, 1.5, 1.8, 1.8, 3.3, 5.1, 1.1, 1.8, 2.5
Which set is more variable? (Can be done using Coefficient of Variation)
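One way to answer the question is sketched below in Python, treating both data sets as samples (an assumption, since the handout does not say which formula to use). The function name coefficient_of_variation is illustrative; statistics.mean and statistics.stdev come from the standard library.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: (s / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


high_temp_f = [50, 37, 29, 54, 30, 61, 47, 38, 34, 61]
rainfall_in = [4.8, 2.6, 1.5, 1.8, 1.8, 3.3, 5.1, 1.1, 1.8, 2.5]

cv_temp = coefficient_of_variation(high_temp_f)
cv_rain = coefficient_of_variation(rainfall_in)
print(round(cv_temp, 1))  # 27.6
print(round(cv_rain, 1))  # 52.2
```

Under these assumptions the precipitation data have the larger coefficient of variation, so precipitation is the more variable of the two sets even though its standard deviation is numerically much smaller than that of the temperatures.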
Significance of the use of Standard Deviation with Chebyshev's Theorem and the Empirical Rule:
Chebyshev's theorem applies to any distribution regardless of its shape: at least 1 − 1/k² of the data values
fall within k standard deviations of the mean, for any k > 1.

The Empirical (Normal) Rule: (For Normal or Bell-Shaped Distribution)
When a distribution is bell-shaped (or what is called normal), the following statements, which make up
the empirical rule, are true.
1.) Approximately 68% of the data values will fall within 1 standard deviation of the mean.
2.) Approximately 95% of the data values will fall within 2 standard deviations of the mean.
3.) Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
NOTE: The standard deviation, used as the distance or scattering away from the center (mean), applies to both the left side
and the right side of the mean.
Where: x̅ + s = the point one standard deviation to the right of the mean
And: x̅ − s = the point one standard deviation to the left of the mean
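The sketch below, in Python, puts the two rules side by side. The mean of 50 and standard deviation of 10 are hypothetical values chosen only for illustration, and the names interval, chebyshev_minimum, and EMPIRICAL are not from the handout. Chebyshev's theorem gives a guaranteed minimum of 1 − 1/k² for any distribution (k > 1), while the empirical percentages apply only to bell-shaped data.

```python
def interval(mean, s, k):
    """Values lying within k standard deviations on either side of the mean."""
    return mean - k * s, mean + k * s


def chebyshev_minimum(k):
    """Chebyshev's theorem: at least 1 - 1/k**2 of the data lie within
    k standard deviations of the mean (any distribution, k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


# Approximate shares from the empirical rule (bell-shaped distributions only)
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

mean, s = 50, 10  # hypothetical values, for illustration only
for k in (2, 3):
    low, high = interval(mean, s, k)
    print(k, (low, high), EMPIRICAL[k], round(chebyshev_minimum(k), 3))
# 2 (30, 70) 0.95 0.75
# 3 (20, 80) 0.997 0.889
```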
Illustration: Use of the Empirical Rule and Standard Deviation

Example:
Given: Scores of 100 students in a 50-item test.
For a mean score of 𝑥̅ = 35, and standard deviation s = 3,
determine the following:
a.) scores and number of students within 1 standard deviation from the mean
Soln:
Scores within 1 std. dev = x̅ ± 1s = 35 ± 3 = 32 to 38
Number of students = 68% (total no. of students) = 0.68(100) = 68 students
OR: There will be 68 students with scores from 32 to 38 (no. of students within ± 1s of x̅)
b.) scores and number of students within 2 standard deviations from the mean
Soln:
Scores within 2 std. dev = x̅ ± 2s = 35 ± 6 = 29 to 41
Number of students = 95% (total no. of students) = 0.95(100) = 95 students
c.) scores and number of students within 3 standard deviations from the mean
Soln:
Scores within 3 std. dev = x̅ ± 3s = 35 ± 9 = 26 to 44
Number of students = 99.7% (total no. of students) = 0.997(100) = 99.7 or 100 students
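d.) the number of students with scores from 32 to 41
Soln:
For score of 32 = x̅ − 1s (34% of the data points lie between 32 and the mean)
For score of 41 = x̅ + 2s (47.5% of the data points lie between the mean and 41)
Hence: number of students with scores within 32 to 41 = 34% + 47.5% of the data points
= 81.5% (100 students) = 81.5 or 82 students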
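e.) the number of students with scores from 38 to 41
Soln:
For score of 38 = x̅ + 1s (34% of the data points lie between the mean and 38)
For score of 41 = x̅ + 2s (47.5% of the data points lie between the mean and 41)
Hence: number of students with scores within 38 to 41 = 47.5% − 34% of the data points
= 13.5% (100 students) = 13.5 or 14 students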
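The band arithmetic in this example can be sketched in Python as follows. The names HALF_BAND and share_between are illustrative, and the sketch assumes each score sits exactly 0, 1, 2, or 3 standard deviations from the mean, since those are the only points the empirical rule covers. Each half-band value is half of the corresponding empirical percentage (68%, 95%, 99.7%).

```python
# Approximate share of the data between the mean and k standard deviations
# on ONE side of the mean (half of 68%, 95%, and 99.7%).
HALF_BAND = {0: 0.0, 1: 0.34, 2: 0.475, 3: 0.4985}


def share_between(low, high, mean, s):
    """Approximate share of bell-shaped data between two scores that lie
    a whole number (0 to 3) of standard deviations from the mean."""
    def signed_area(score):
        z = (score - mean) / s
        k = round(z)
        if abs(z - k) > 1e-9 or abs(k) > 3:
            raise ValueError("score must be 0, 1, 2, or 3 std. dev. from the mean")
        # Area measured from the mean; negative when the score is left of the mean
        return HALF_BAND[abs(k)] if k >= 0 else -HALF_BAND[abs(k)]

    return signed_area(high) - signed_area(low)


mean, s, students = 35, 3, 100

print(round(share_between(32, 38, mean, s) * students, 1))  # 68.0
print(round(share_between(32, 41, mean, s) * students, 1))  # 81.5
print(round(share_between(38, 41, mean, s) * students, 1))  # 13.5
```

When the two scores are on opposite sides of the mean the half-bands add (34% + 47.5%); when they are on the same side the smaller one is subtracted (47.5% − 34%).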
