
ENDATA130 Handouts 4B

Data Summarization / Description / Analysis


Measures of Variation / Dispersion

Relating Measures of Central Tendency and Measures of Variation:

Case 1: Data on scores of 5 students in two sections for the same set of questions

Section A: 30, 35, 40, 45, 50    $\bar{X}_A = 40$
Section B: 10, 35, 40, 55, 60    $\bar{X}_B = 40$

Location of the data points away from the center (mean):

_____________________ _____________________

Case 2: Data on Monthly Sales of Paint in cans for two brands A and B of paint (From Bluman)

Measures of Variation or Measures of Dispersion


- Describes how the data points are scattered, spread, or dispersed away from the center (mean).
- Indicates the degree of scattering of the data points.
- There are several measures of variation or dispersion, namely:
1.) Range, R
2.) Mean Deviation or Mean Absolute Deviation (MAD)
3.) Variance
4.) Standard deviation (the most commonly used)
5.) Coefficient of Variation, Cvar

NOTE: The larger the value of the measure of variation, the more the rest of the data points are scattered away from the center. That is, a smaller value means less scattering (the data points lie closer to the center), and a larger value means wider or greater scattering away from the center.

Determination of the Range (R):


Range = the difference between the highest and lowest values in the data file.
- the simplest measure of variation/dispersion.

Case 1: For Raw Data


$R = H - L$, where H = highest value and L = lowest value

Case 2: For Grouped Data (Frequency Distribution)


R = midpoint of the highest class − midpoint of the lowest class
Example: Determination of the Range
Case 1: Range for Raw Data
Data on scores of 5 students in two sections for the same set of questions
Reqd: Range of the scores for the two sections

Section A: 30, 35, 40, 45, 50    $\bar{X}_A = 40$
Section B: 10, 35, 40, 55, 60    $\bar{X}_B = 40$

$R_A = H - L = 50 - 30 = 20$        $R_B = H - L = 60 - 10 = 50$
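As a quick check of R = H − L, here is a minimal Python sketch. The function name `data_range` and the list names are illustrative and are not part of the handout.

```python
def data_range(values):
    """Range of raw data: R = H - L (highest value minus lowest value)."""
    return max(values) - min(values)

section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

print(data_range(section_a))  # 20
print(data_range(section_b))  # 50
```

Both sections share the same mean (40), but Section B's larger range already hints at wider scattering.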

Case 2: Range for Grouped Data


Given:
Class      M     f
24 - 34    29    2
35 - 45    40    5
46 - 56    51    6
57 - 67    62    10
68 - 78    73    8
Reqd: Range
Soln:
R = Midpt of highest class – midpoint of lowest class = 73 – 29 = 44
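The midpoints in the table and the grouped range can be checked with the sketch below. It assumes each class is stored as a hypothetical (lower limit, upper limit) pair, listed from the lowest class to the highest.

```python
# Classes as (lower limit, upper limit), from lowest to highest
classes = [(24, 34), (35, 45), (46, 56), (57, 67), (68, 78)]

def midpoint(lower, upper):
    """Class midpoint M = (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

midpoints = [midpoint(low, high) for low, high in classes]
# Range for grouped data: midpoint of highest class minus midpoint of lowest class
grouped_range = midpoints[-1] - midpoints[0]

print(midpoints)      # [29.0, 40.0, 51.0, 62.0, 73.0]
print(grouped_range)  # 44.0
```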

Determination of the Mean Deviation or Mean Absolute Deviation (MAD):


Mean absolute deviation (mean deviation) = the sum of the absolute deviations of the values in the data file from the mean, divided by the number of values in the data file.
- the average of the absolute values of the deviations of the scores around the mean.

Case 1: Using Ungrouped Data (Raw Data)


Steps: a.) Solve first for the mean of the raw data.
b.) Get the deviation (d) of each value from the mean and the corresponding absolute deviation |d|.

Let: d = deviation of a value or data point from the mean
x = specific value or score in the data file

Solving for deviation, d, using Raw Data


d = individual score minus mean score
d = x - x̅

and absolute value of the deviation:


|𝑑| = |𝑥 − 𝑥̅ |

c.) Compute the mean absolute deviation as

$$MAD = \frac{\sum |d|}{n} = \frac{\sum |x - \bar{x}|}{n}$$

where: n = Σf (the number of values in the data file)

Example:
Case 1: MAD for Raw Data
Data on scores of 5 students in two sections for the same set of questions
Reqd: MAD
Section A: 30, 35, 40, 45, 50    $\bar{X}_A = 40$
Section B: 10, 35, 40, 55, 60    $\bar{X}_B = 40$

For Section A
x      d = x − x̄     |d| = |x − x̄|
30     −10           10
35     −5            5
40     0             0
45     5             5
50     10            10
       Σd = 0        Σ|d| = 30

Hence, for Section A:

$$MAD = \frac{\sum |d|}{n} = \frac{\sum |x - \bar{x}|}{n} = \frac{30}{5} = 6$$

For Section B
x      d = x − x̄     |d| = |x − x̄|
10     −30           30
35     −5            5
40     0             0
55     15            15
60     20            20
       Σd = 0        Σ|d| = 70

Hence, for Section B:

$$MAD = \frac{\sum |d|}{n} = \frac{\sum |x - \bar{x}|}{n} = \frac{70}{5} = 14$$

NOTE: The sum of deviations for all the values of x in the raw data is equal to zero. ∑ 𝑑 = 0
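A minimal Python sketch of the raw-data MAD steps (mean, deviations, absolute deviations, average). It reproduces the Section A and Section B results and the Σd = 0 property. The helper names are illustrative assumptions.

```python
def mean(values):
    """Arithmetic mean of raw data."""
    return sum(values) / len(values)

def mean_absolute_deviation(values):
    """MAD = sum(|x - x_bar|) / n for raw (ungrouped) data."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

# Signed deviations cancel out (sum of d = 0), which is why MAD uses absolute values.
x_bar_a = mean(section_a)
print(sum(x - x_bar_a for x in section_a))  # 0.0
print(mean_absolute_deviation(section_a))   # 6.0
print(mean_absolute_deviation(section_b))   # 14.0
```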

Case 2: Using Grouped Data (Frequency Distribution)


Steps: a.) Solve for the mean (µ or x̄) using the Midpoint Method.
b.) Get the deviation of each midpoint from the mean (d = M − µ = M − x̄).
c.) Multiply each deviation by the corresponding frequency (fd).
d.) Get the absolute value |fd|.
e.) Compute the MAD as follows:

$$MAD = \frac{\sum |f d|}{\sum f} = \frac{\sum f|d|}{\sum f}$$

For Tabular Computations:

Classes    f    M    fM (used to solve for the mean)    d = M − x̄    |fd|

Example: Use MAD to answer this problem (From Bluman)

Given:

Class Boundaries
LB       UB       f     M      fM       d = M − x̄     |fd|
62.5     73.5     5     68     340      −27.9          139.5
73.5     84.5     14    79     1106     −16.9          236.6
84.5     95.5     18    90     1620     −5.9           106.2
95.5     106.5    25    101    2525     5.1            127.5
106.5    117.5    12    112    1344     16.1           193.2
117.5    128.5    6     123    738      27.1           162.6
                  Σf = 80      ΣfM = 7673              Σ|fd| = 965.6

Mean by the Midpoint Method: x̄ = ΣfM / Σf = 7673 / 80 ≈ 95.9

Solving for MAD:

$$MAD = \frac{\sum |f d|}{\sum f} = \frac{\sum f|d|}{\sum f} = \frac{965.6}{80} \approx 12.1$$

Hence, the lifetimes of the batteries are not consistent.
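For the grouped case, the sketch below applies the Midpoint Method and MAD = Σf|d| / Σf to the table above. It assumes classes are stored as (lower boundary, upper boundary, frequency) tuples, and all names are hypothetical. Passing `x_bar=95.9` mirrors the rounded mean used in the table. Omitting it uses the unrounded mean.

```python
# Grouped data from the table: (lower boundary, upper boundary, frequency)
CLASSES = [
    (62.5, 73.5, 5),
    (73.5, 84.5, 14),
    (84.5, 95.5, 18),
    (95.5, 106.5, 25),
    (106.5, 117.5, 12),
    (117.5, 128.5, 6),
]

def grouped_mean(classes):
    """Midpoint Method: x_bar = sum(f * M) / sum(f)."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lb + ub) / 2 for lb, ub, f in classes) / n

def grouped_mad(classes, x_bar=None):
    """MAD = sum(f * |M - x_bar|) / sum(f), where M is the class midpoint."""
    if x_bar is None:
        x_bar = grouped_mean(classes)
    n = sum(f for _, _, f in classes)
    return sum(f * abs((lb + ub) / 2 - x_bar) for lb, ub, f in classes) / n

print(grouped_mean(CLASSES))                       # 95.9125
print(round(grouped_mad(CLASSES, x_bar=95.9), 1))  # 12.1 (mean rounded as in the table)
print(round(grouped_mad(CLASSES), 1))              # 12.1 (unrounded mean)
```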
Determination of the Variance (σ² or s²):

Variability can also be defined in terms of how close the scores in the distribution are to the middle
of the distribution. Using the mean as the measure of the middle or center of the distribution, the variance is
defined as the average squared difference of the scores from the mean (the squared deviation, d²). The variance
is the measure of dispersion that eliminates negative signs by squaring all deviations.

Standard Symbols for the Variance:


σ² = variance for the population
s² = variance for the sample

Case 1: Using Ungrouped Data (Raw Data)


Steps: a.) Find the mean (µ or x̄).
b.) Determine the deviation of each score from the mean: d = X − µ (population) or d = X − x̄ (sample).
c.) Solve for the square of each deviation, d².
d.) Calculate the variance for the population or the sample as follows:

$$\sigma^2 = \frac{\sum d^2}{N} = \frac{\sum (x - \mu)^2}{N} = \frac{\sum d^2}{\sum f}$$

$$s^2 = \frac{\sum d^2}{n - 1} = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum d^2}{\sum f - 1}$$

Where:
N or n = Σf

Illustration: Solving for Variance of Raw Data


Case 1: Variance for Raw Data
Data on scores of 5 students in two sections for the same set of questions
Reqd: variance = s2
Section A: 30, 35, 40, 45, 50    $\bar{X}_A = 40$
Section B: 10, 35, 40, 55, 60    $\bar{X}_B = 40$

For Section A
x      d = x − x̄     d²
30     −10           100
35     −5            25
40     0             0
45     5             25
50     10            100
                     Σd² = 250

s² = Σd² / (n − 1) = 250 / 4 = 62.5

For Section B
x      d = x − x̄     d²
10     −30           900
35     −5            25
40     0             0
55     15            225
60     20            400
                     Σd² = 1550

s² = Σd² / (n − 1) = 1550 / 4 = 387.5
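The sketch below recomputes Σd² and the variance for both sections. A `sample` flag (an illustrative choice, not from the handout) switches between the n − 1 divisor (s²) and the N divisor (σ²).

```python
def variance(values, sample=True):
    """s^2 = sum(d^2) / (n - 1) for a sample; sigma^2 = sum(d^2) / N for a population."""
    n = len(values)
    mean = sum(values) / n
    sum_d2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return sum_d2 / (n - 1) if sample else sum_d2 / n

section_a = [30, 35, 40, 45, 50]
section_b = [10, 35, 40, 55, 60]

print(variance(section_a), variance(section_a, sample=False))  # 62.5 50.0
print(variance(section_b), variance(section_b, sample=False))  # 387.5 310.0
```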

Case 2: For Grouped Data (Frequency Distribution)


Steps: a.) Solve for the mean of the data file (x̄) by the Midpoint Method.
b.) Get the midpoint of each class, M.
c.) Compute the deviation, d = M − x̄.
d.) Solve for the square of the deviation, d².
e.) Multiply the frequency of each class by the squared deviation, fd².
f.) Solve for the variance as follows:

$$\sigma^2 = \frac{\sum f d^2}{N} = \frac{\sum f d^2}{\sum f}$$

$$s^2 = \frac{\sum f d^2}{n - 1} = \frac{\sum f d^2}{\sum f - 1}$$

For Tabular Computations:

Classes    f    M    fM (used to solve for the mean)    d = M − x̄    d²    fd²

Illustration: Solving for variance using Grouped Data


Example: Use variance to answer this problem (From Bluman)

Given:

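Class Boundaries
LB       UB       f     M      fM       d = M − x̄     d²        fd²
62.5     73.5     5     68     340      −27.9          778.41    3892.05
73.5     84.5     14    79     1106     −16.9          285.61    3998.54
84.5     95.5     18    90     1620     −5.9           34.81     626.58
95.5     106.5    25    101    2525     5.1            26.01     650.25
106.5    117.5    12    112    1344     16.1           259.21    3110.52
117.5    128.5    6     123    738      27.1           734.41    4406.46
                  Σf = 80      ΣfM = 7673                                 Σfd² = 16684.40

s² = Σfd² / (Σf − 1) = 16684.40 / 79 ≈ 211.19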
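Here is a minimal sketch of the grouped-data variance for the same table. It uses the unrounded midpoint-method mean (7673 / 80 = 95.9125) rather than 95.9, so Σfd² comes out as 16684.3875 instead of 16684.40. The variance still agrees to two decimal places. The names and the `sample` flag are illustrative.

```python
# Grouped data from the table: (lower boundary, upper boundary, frequency)
CLASSES = [
    (62.5, 73.5, 5),
    (73.5, 84.5, 14),
    (84.5, 95.5, 18),
    (95.5, 106.5, 25),
    (106.5, 117.5, 12),
    (117.5, 128.5, 6),
]

def grouped_variance(classes, sample=True):
    """sum(f * (M - x_bar)^2) divided by (sum(f) - 1) for a sample, or by sum(f) for a population."""
    n = sum(f for _, _, f in classes)
    mids = [((lb + ub) / 2, f) for lb, ub, f in classes]
    x_bar = sum(f * m for m, f in mids) / n  # Midpoint Method mean
    sum_fd2 = sum(f * (m - x_bar) ** 2 for m, f in mids)
    return sum_fd2 / (n - 1) if sample else sum_fd2 / n

s2 = grouped_variance(CLASSES)                    # sample variance s^2
sigma2 = grouped_variance(CLASSES, sample=False)  # population variance sigma^2
print(round(s2, 2), round(sigma2, 2))  # 211.19 208.55
```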

Determination of the Standard Deviation:

Standard deviation (σ or s) = square root of the variance

σ = population standard deviation = $\sqrt{\sigma^2}$

s = sample standard deviation = $\sqrt{s^2}$

NOTE: In describing the center of the data distribution and the variation or scattering of the rest of the
data points away from the center, the Mean and Standard Deviation are used.
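Python's standard-library `statistics` module already provides both versions. `statistics.stdev` is the sample version (divides by n − 1), and `statistics.pstdev` is the population version (divides by N). Here is a short sketch applying them to the two sections; the dictionary layout is only illustrative.

```python
import statistics

sections = {
    "A": [30, 35, 40, 45, 50],
    "B": [10, 35, 40, 55, 60],
}

for name, scores in sections.items():
    s = statistics.stdev(scores)       # sample SD: square root of s^2 (divides by n - 1)
    sigma = statistics.pstdev(scores)  # population SD: square root of sigma^2 (divides by N)
    print(f"Section {name}: s = {s:.2f}, sigma = {sigma:.2f}")
# Section A: s = 7.91, sigma = 7.07
# Section B: s = 19.69, sigma = 17.61
```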

Determination of the Coefficient of Variation, Cvar:


Whenever two samples have the same units of measure, the variance and standard deviation of each can
be compared directly. A statistic that allows us to compare standard deviations when the units are different
is called the coefficient of variation. Cvar is expressed as a percentage.

$$C_{var} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% = \frac{\sigma}{\mu} \times 100\% = \frac{s}{\bar{x}} \times 100\%$$


Example: From Bluman
Comparing Variation of Precipitation (inches of rain) and High Temperatures (°F).
The normal daily high temperatures (in degrees Fahrenheit) in January for 10 selected cities are as follows.
High Temp in °F = 50, 37, 29, 54, 30, 61, 47, 38, 34, 61

The normal monthly precipitation (in inches) for these same 10 cities is listed here.
Monthly Rainfall or Precipitation in inches = 4.8, 2.6, 1.5, 1.8, 1.8, 3.3, 5.1, 1.1, 1.8, 2.5

Which set is more variable? (Can be done using Coefficient of Variation)
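One way to answer the question is sketched below in Python. It assumes both data sets are treated as samples, so it uses `statistics.stdev`, which divides by n − 1. The function name is illustrative.

```python
import statistics

temps_f = [50, 37, 29, 54, 30, 61, 47, 38, 34, 61]
rain_in = [4.8, 2.6, 1.5, 1.8, 1.8, 3.3, 5.1, 1.1, 1.8, 2.5]

def coefficient_of_variation(values):
    """Cvar = (s / x_bar) * 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

cv_temp = coefficient_of_variation(temps_f)
cv_rain = coefficient_of_variation(rain_in)
print(f"Temperature: {cv_temp:.1f}%")    # 27.6%
print(f"Precipitation: {cv_rain:.1f}%")  # 52.2%
```

Under this computation the precipitation data have the larger coefficient of variation, so precipitation is the more variable set. This holds even though its standard deviation (about 1.37 in.) is numerically smaller than that of the temperatures (about 12.2 °F).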

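Significance of the Use of Standard Deviation: Chebyshev's Theorem and the Empirical Rule

Chebyshev's Theorem:
The proportion of data values that fall within k standard deviations of the mean is at least 1 − 1/k², where k > 1. For example, at least 1 − 1/2² = 75% of the data values lie within 2 standard deviations of the mean, and at least 1 − 1/3² ≈ 88.9% lie within 3 standard deviations.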


The Empirical (Normal) Rule
Chebyshev’s theorem applies to any distribution regardless of its shape.
However, when a distribution is bell-shaped (or what is called normal), the following statements, which make up
the empirical rule, are true.
1.) Approximately 68% of the data values will fall within 1 standard deviation of the mean.
2.) Approximately 95% of the data values will fall within 2 standard deviations of the mean.
3.) Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
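To contrast the two rules, the sketch below tabulates Chebyshev's guaranteed minimum 1 − 1/k² (valid for any distribution) next to the empirical-rule approximations (bell-shaped data only). The function and constant names are hypothetical.

```python
# Approximate share of bell-shaped data within k standard deviations (empirical rule)
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

def chebyshev_min_proportion(k):
    """Chebyshev: at least 1 - 1/k^2 of any data set lies within k standard deviations."""
    if k <= 1:
        return 0.0  # for k <= 1 the bound says nothing useful (it is 0 or negative)
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(f"k = {k}: Chebyshev at least {chebyshev_min_proportion(k):.1%}, "
          f"empirical rule about {EMPIRICAL[k]:.1%}")
# k = 1: Chebyshev at least 0.0%, empirical rule about 68.0%
# k = 2: Chebyshev at least 75.0%, empirical rule about 95.0%
# k = 3: Chebyshev at least 88.9%, empirical rule about 99.7%
```

Chebyshev's percentages are lower because they must hold for every shape of distribution. The empirical rule gives sharper figures but relies on the bell-shaped assumption.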

NOTE: The standard deviation, used as the distance or scattering away from the center (mean), applies to both the left side
and the right side of the mean.

Where: x̄ + (std. deviation) = the value one standard deviation to the right of the mean

And: x̄ − (std. deviation) = the value one standard deviation to the left of the mean

Illustration: Use of the Empirical Rule and Standard Deviation


Example:
Given: Scores of 100 students in a 50-item test, assumed to be approximately bell-shaped (normally distributed).
For a mean score of x̄ = 35 and standard deviation s = 3,
determine the following:

a.) scores and number of students within 1 standard deviation of the mean
Soln:
Scores within 1 std. dev = x̄ ± 1s = 35 ± 3 = 32 to 38
Number of students = 68% (total no. of students) = 0.68(100) = 68 students

OR: There will be 68 students with scores from 32 to 38 (No. of students within ± 1s of x̄)
b.) scores and number of students within 2 standard deviations of the mean
Soln:
Scores within 2 std. dev = x̄ ± 2s = 35 ± 2(3) = 29 to 41
Number of students = 95% (total no. of students) = 0.95(100) = 95 students

OR: There will be 95 students with scores from 29 to 41 (No. of students within ± 2s of x̄)

c.) scores and number of students within 3 standard deviations of the mean
Soln:
Scores within 3 std. dev = x̄ ± 3s = 35 ± 3(3) = 26 to 44
Number of students = 99.7% (total no. of students) = 0.997(100) = 99.7 or 100 students

OR: There will be 100 students with scores from 26 to 44 (No. of students within ± 3s of x̄)
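Parts a.) to c.) can be reproduced with the sketch below. It assumes, as the empirical rule requires, that the scores are roughly bell-shaped. The constant names are illustrative, and `round()` is used only to turn 99.7 into a whole number of students.

```python
X_BAR, S, N_STUDENTS = 35, 3, 100
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}  # approximate share within k standard deviations

intervals = {}
for k, share in EMPIRICAL.items():
    low, high = X_BAR - k * S, X_BAR + k * S  # x_bar - k*s to x_bar + k*s
    intervals[k] = (low, high)
    count = share * N_STUDENTS
    print(f"within {k}s: scores {low} to {high}, {count:g} students (about {round(count)})")
# within 1s: scores 32 to 38, 68 students (about 68)
# within 2s: scores 29 to 41, 95 students (about 95)
# within 3s: scores 26 to 44, 99.7 students (about 100)
```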

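d.) the number of students with scores from 32 to 41

Soln:
For score of 32 = 35 − 3 = x̄ − 1s; 34% of the data points lie between 32 and 35 (half of 68%)
For score of 41 = 35 + 2(3) = x̄ + 2s; 47.5% of the data points lie between 35 and 41 (half of 95%)

Hence: number of students with scores within 32 to 41 = 34% + 47.5% of the data points
= 81.5% (100 students) = 81.5 or 82 students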

e.) the number of students with scores from 38 to 41


Soln:
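For score of 38 = 35 + 3 = x̄ + 1s; 34% of the data points lie between 35 and 38 (half of 68%)
For score of 41 = 35 + 2(3) = x̄ + 2s; 47.5% of the data points lie between 35 and 41 (half of 95%)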

Hence: number of students with scores within 38 to 41 = 47.5 % - 34% of the data points
= 13.5% (100 students) = 13.5 or 14 students
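Parts d.) and e.) rely on the symmetry of the bell-shaped curve. The share of data between the mean and x̄ ± ks is half of the corresponding empirical-rule percentage (34%, 47.5%, 49.85%). Below is a sketch under that assumption. It uses `fractions.Fraction` so that values such as 81.5 are exact before rounding half up, and the helper names are illustrative.

```python
from fractions import Fraction
import math

X_BAR, S, N_STUDENTS = 35, 3, 100

# Share of bell-shaped data between the mean and k standard deviations on ONE side
# (half of the empirical-rule percentages 68%, 95%, 99.7%).
HALF_SHARE = {0: Fraction(0), 1: Fraction("0.34"), 2: Fraction("0.475"), 3: Fraction("0.4985")}

def signed_share(score):
    """Share of data between the mean and `score` (negative when score is below the mean)."""
    k = Fraction(score - X_BAR, S)  # number of standard deviations from the mean
    if k.denominator != 1 or abs(k) > 3:
        raise ValueError("score must be 0, 1, 2 or 3 standard deviations from the mean")
    half = HALF_SHARE[abs(k.numerator)]
    return half if k >= 0 else -half

def students_between(low, high):
    """Approximate number of students scoring from `low` to `high` (empirical rule)."""
    return (signed_share(high) - signed_share(low)) * N_STUDENTS

for low, high in [(32, 41), (38, 41)]:
    count = students_between(low, high)
    print(f"{low} to {high}: {float(count)} students, about {math.floor(count + Fraction(1, 2))}")
# 32 to 41: 81.5 students, about 82
# 38 to 41: 13.5 students, about 14
```

Under the same rule, `students_between(38, 44)` returns 15.85 (0.4985 − 0.34 = 0.1585 of 100 students), or about 16 students.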
