
Statistical Analysis for IE: MIDTERM

MEASURES OF VARIATION
The measures of variation or dispersion indicate the degree or extent to which numerical values are dispersed
or spread out about the average value in a distribution. In this chapter, we will discuss the more commonly used
measures of variation. These are the range, the interquartile range, the semi-interquartile range or quartile
deviation, the mean deviation or average deviation, the variance, and the standard deviation.
RANGE
The range, which is the simplest measure to compute, is the difference between the largest and the lowest values
in a set of numerical data. For ungrouped data, the range is obtained by subtracting the lowest value from the
highest value. For grouped data, the range is determined by subtracting the lower boundary of the lowest class
interval from the upper boundary of the highest class interval of a frequency distribution. This is so because
the class boundaries are considered the true limits.
Thus, we have the following formulas:
For ungrouped Data
Range (R) = Highest value (HV) – Lowest Value (LV) or
R = HV – LV
For Grouped Data
Range (R) = Upper Boundary of the highest class interval (UBHCI) – Lower Boundary of lowest class interval
(LBLCI)
R = UBHCI - LBLCI
EXAMPLE: The scores obtained by 12 students in Statistics class are 80,75,63,95,98,78,85,90,73,68,87 and 81.
Find the range.

Solution: The highest score is 98, while the lowest score is 63. Hence,
R = HV – LV
R = 98 – 63
R = 35

EXAMPLE: Find the range of a given frequency distribution whose highest class interval is 91-95 and lowest
class interval is 51-55

Solution: The upper boundary of the highest class interval 91-95 is 95.5 and the lower boundary of the lowest
class interval 51-55 is 50.5. Therefore, the range is obtained, as follows.
R = UBHCI - LBLCI
R = 95.5 – 50.5
R = 45
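
The two range formulas above can be checked with a short Python sketch. The function names are illustrative only, and the grouped version assumes integer class limits, so that each boundary lies 0.5 beyond the stated limit, as in the example above.

```python
def range_ungrouped(values):
    """Range of raw data: highest value minus lowest value."""
    return max(values) - min(values)


def range_grouped(lowest_class, highest_class, half_unit=0.5):
    """Range of a frequency distribution from its class boundaries.

    Each class is a (lower_limit, upper_limit) tuple. half_unit=0.5 is an
    assumption that holds for integer class limits such as 51-55.
    """
    lower_boundary = lowest_class[0] - half_unit
    upper_boundary = highest_class[1] + half_unit
    return upper_boundary - lower_boundary


scores = [80, 75, 63, 95, 98, 78, 85, 90, 73, 68, 87, 81]
print(range_ungrouped(scores))            # 35
print(range_grouped((51, 55), (91, 95)))  # 45.0
```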

THE INTER-QUARTILE RANGE


We learned in the preceding chapter that the quartiles divide the distribution of numerical data into 4
equal parts. The first or lower quartile (Q1) lies at the 25th percentile of the values, while the third or
upper quartile (Q3) lies at the 75th percentile.
The interquartile range (IQR) is the difference between the value of the third or upper quartile (Q3)
and the value of the first or lower quartile (Q1).
In symbols, we have: IQR = Q3 – Q1

THE SEMI-QUARTILE RANGE OR QUARTILE DEVIATION


The semi-interquartile range (SIQR) or quartile deviation (QD) indicates the variation or dispersion of
the values covering the middle 50% of the distribution of the data. It is found by getting half of the
distance between the third or upper quartile and the first or lower quartile. Thus, we have
the equation:

$$\text{SIQR or QD} = \frac{Q_3 - Q_1}{2}$$

Note that the range, the interquartile range, and the semi-interquartile range share the same disadvantage.
None of them reflects the density of the observations, and hence each gives only limited information on the
concentration of the observations about the central value.
The semi-interquartile range or quartile deviation is an appropriate measure of variation when the median is
used as the measure of central tendency, especially if the distribution is skewed.
EXAMPLE: A manufacturing company produced the following number of units per day for a given period:
21,25,20,28,30,23,22,31,32,27,19,33,24,29,26,34
Determine:
a. Range
b. Interquartile range (IQR)
c. Semi-Interquartile range (SIQR) or Quartile Deviation (QD)
Solutions:
a. Range = HV – LV
R = 34 – 19
R = 15
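To solve for the IQR and the SIQR or QD, let us first arrange the production units according to magnitude and
calculate Q1 and Q3, as shown below.
19 20 21 22 | 23 24 25 26 | 27 28 29 30 | 31 32 33 34
            Q1            Q2            Q3

Note that Q1 is the point below which the lowest 25% of the data, arranged from lowest to highest, lie,
while Q3 is the point below which the lowest 75% of the data lie. With 16 values, Q1 falls midway between
the 4th and 5th values (22 and 23), so Q1 = 22.5, and Q3 falls midway between the 12th and 13th values
(30 and 31), so Q3 = 30.5.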
Thus, we apply the formulas as follows:
b. IQR = Q3 – Q1
IQR = 30.5 – 22.5
IQR = 8

c. SIQR or QD

$$\text{SIQR or QD} = \frac{Q_3 - Q_1}{2} = \frac{30.5 - 22.5}{2} = 4$$
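
The ungrouped computation can be sketched in Python as follows. This is only one possible convention, assumed here because it reproduces the values in the example: each quartile is taken as the median of the lower or upper half of the sorted data. Other quartile conventions can give slightly different results, and the function name is illustrative.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: median-of-halves convention. When n is odd, the middle
    value is left out of both halves. Requires at least 2 values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return median(lower), median(upper)


units = [21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, 34]
q1, q3 = quartiles_by_halves(units)
iqr = q3 - q1
qd = iqr / 2
print(q1, q3, iqr, qd)  # 22.5 30.5 8.0 4.0
```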

EXAMPLE: Table 1 below shows the average production of 60 employees of a manufacturing company during a
given week.
Find the:
a. Range
b. Inter-quartile range
c. Semi-interquartile range or Quartile Deviation

Average Production    Number of Employees    <cf
(Classes)             (f)
61-65                 4                      4
66-70                 8                      12
71-75                 10                     22
76-80                 16                     38
81-85                 11                     49
86-90                 7                      56
91-95                 4                      60
Solutions:
a. Range (R)
R = UBHCI - LBLCI
R = 95.5 – 60.5
R = 35

b. Interquartile Range (IQR)


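First, we have to calculate the first and the third quartiles. Q1 is the 60/4 = 15th value, which falls in
the class 71-75 (lower boundary 70.5, frequency 10, cumulative frequency before the class 12). Q3 is the
3(60)/4 = 45th value, which falls in the class 81-85 (lower boundary 80.5, frequency 11, cumulative
frequency before the class 38). Substituting these values into the formulas we studied in the preceding
chapter, with class size 5, we have:

$$Q_1 = 70.5 + 5\left(\frac{\frac{60}{4} - 12}{10}\right) = 72$$

$$Q_3 = 80.5 + 5\left(\frac{\frac{3(60)}{4} - 38}{11}\right) = 83.68$$
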
Therefore, IQR = Q3 – Q1
IQR = 83.68 – 72
IQR = 11.68

c. Semi-interquartile Range (SIQR) or Quartile Deviation (QD)

$$\text{SIQR or QD} = \frac{Q_3 - Q_1}{2} = \frac{83.68 - 72}{2} = 5.84$$
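
The grouped-data interpolation above can be sketched in Python. The function name and argument layout are assumptions made for illustration. The sketch also assumes integer class limits (boundaries 0.5 beyond the limits) and classes listed in ascending order.

```python
def grouped_quartile(k, classes, freqs, half_unit=0.5):
    """k-th quartile (k = 1, 2 or 3) of a frequency distribution.

    classes: list of (lower_limit, upper_limit) in ascending order
    freqs:   matching class frequencies
    """
    n = sum(freqs)
    target = k * n / 4          # position of the quartile, e.g. 60/4 = 15
    cum_before = 0              # cumulative frequency before the current class
    for (low, high), f in zip(classes, freqs):
        if cum_before + f >= target:
            lower_boundary = low - half_unit
            width = (high + half_unit) - lower_boundary
            return lower_boundary + width * (target - cum_before) / f
        cum_before += f
    raise ValueError("k must be 1, 2 or 3")


classes = [(61, 65), (66, 70), (71, 75), (76, 80), (81, 85), (86, 90), (91, 95)]
freqs = [4, 8, 10, 16, 11, 7, 4]
q1 = grouped_quartile(1, classes, freqs)
q3 = grouped_quartile(3, classes, freqs)
print(q1, round(q3, 2))                              # 72.0 83.68
print(round(q3 - q1, 2), round((q3 - q1) / 2, 2))    # 11.68 5.84
```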
The Mean Deviation or Average Deviation
The mean deviation or average deviation is the average of the absolute deviations of the individual values
of a set of numerical data from the mean, the median, or the mode. Among the 3, however, the mean is the
most preferred and commonly used measure of central tendency for computing the mean deviation or average
deviation. To determine the mean or average deviation, we shall use the following formulas.

For Ungrouped Data:

$$\text{MD or AD} = \frac{\sum |x - \bar{x}|}{n}$$

For Grouped Data:

$$\text{MD or AD} = \frac{\sum f\,|x - \bar{x}|}{n}$$

Where:
x – the individual value for ungrouped data, and the midpoint of each class interval for grouped data
x̄ – the mean of the data
n – the total number of observations (the sum of the frequencies)
f – the frequency of each class interval
For Ungrouped data, the mean deviation or average deviation is determined by following the procedures
enumerated below.
1. Arrange the values from the lowest to the highest or vice versa
2. Compute the value of the mean
3. Find the individual absolute value of each deviation from the mean
4. Find the sum of the absolute values in step 3
5. Substitute the values in the formula and solve.
EXAMPLE: The number of TV units sold by an appliance store for a 10-day period are: 8,9,12,6,7,13,4,2,11 and
8. Determine the mean deviation or average deviation.
Solutions:
We shall first arrange the number of TV units sold according to magnitude (lowest to highest) and then find
the individual absolute value of each deviation from the mean, as shown below.

TV Units Sold    Absolute Values of Deviations from the Mean
(x)              |x − x̄|
2                6
4                4
6                2
7                1
8                0
8                0
9                1
11               3
12               4
13               5
n = 10           ∑|x − x̄| = 26

Mean (x̄)

$$\bar{x} = \frac{\sum x}{n} = \frac{80}{10} = 8$$

Mean Deviation (MD) or Average Deviation (AD)

$$\text{MD or AD} = \frac{\sum |x - \bar{x}|}{n} = \frac{26}{10} = 2.6$$

The MD or AD of 2.6 is the average amount by which the individual daily sales deviate from their mean of 8.
The higher the value of MD or AD, the larger the dispersion.
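
As a quick check of the ungrouped procedure, here is a minimal Python sketch that takes deviations from the mean, as in the example. The function name is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


tv_units = [8, 9, 12, 6, 7, 13, 4, 2, 11, 8]
print(mean_deviation(tv_units))  # 2.6
```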
For grouped data, we shall compute the mean deviation (MD) or average deviation (AD) by following the
procedures below.
1. Compute the mean (x̄) of the distribution.
2. Subtract the mean (x̄) from each of the midpoints and write the absolute values of the results under
the column |x − x̄|.
3. Find the products of the items under column f and the items under column |x − x̄|.
4. Add the products in step 3 to obtain the value of ∑ f|x − x̄|.
5. Divide the sum obtained in step 4 by n.



EXAMPLE: Calculate the mean deviation (MD) or average deviation (AD) of the scores obtained by 89
applicants for employment, as shown in table 2 below.

Classes    f     x     fx       |x − x̄|    f|x − x̄|
41-45      4     43    172      15.17      60.68
46-50      10    48    480      10.17      101.70
51-55      18    53    954      5.17       93.06
56-60      25    58    1450     0.17       4.25
61-65      17    63    1071     4.83       82.11
66-70      9     68    612      9.83       88.47
71-75      6     73    438      14.83      88.98
n = 89           ∑fx = 5,177               ∑f|x − x̄| = 519.25

The mean (x̄) of the distribution is calculated by using the formula presented in chapter 3, as follows.

$$\bar{x} = \frac{\sum fx}{n} = \frac{5{,}177}{89} = 58.17$$

Thus, the mean deviation (MD) or average deviation (AD) shall be computed as follows:

$$\text{MD or AD} = \frac{\sum f\,|x - \bar{x}|}{n} = \frac{519.25}{89} = 5.83$$
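
The grouped procedure can be sketched the same way, using class midpoints and frequencies. The function name is illustrative. Note that the sketch keeps the mean unrounded, so its intermediate sum differs slightly from the 519.25 obtained with the mean rounded to 58.17, although the final answer agrees to two decimal places.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, mean deviation) for a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    total = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs))
    return mean, total / n


midpoints = [43, 48, 53, 58, 63, 68, 73]
freqs = [4, 10, 18, 25, 17, 9, 6]
mean, md = grouped_mean_deviation(midpoints, freqs)
print(round(mean, 2), round(md, 2))  # 58.17 5.83
```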
The Variance
The variance is defined as the average of the squared deviations from the mean. The square root of the
variance is known as the standard deviation.
The variance of sample data is denoted by S² (read "S squared" or "the square of S"), while the symbol for
the variance of a population is σ², read "sigma squared".
The Variance for ungrouped data
The formulas for calculating the variance for ungrouped data are:
For the variance of sample data:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where x̄ is the mean of the sample data.
For the variance of a population:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

where μ represents the mean of the population.
We will use n – 1 instead of n as the divisor in computing the variance of sample data, although the variance
is defined as the average of the squared deviations about the mean. The reason is to avoid bias: dividing by n
tends to underestimate the population variance, especially when the sample size is small. Different samples
of size n selected from the same population generally yield different values for the variance. But with n – 1
as the divisor, the average of these values computed from many samples of the population tends to be closer to
the actual variance, that is, the population variance.
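
The effect of the divisor can be illustrated with a small Python sketch. The three-value population below is purely hypothetical, and the sketch enumerates every ordered sample of size 2 drawn with replacement, which is an assumption about the sampling scheme. Averaged over all such samples, the n – 1 version matches the population variance, while the n version falls short of it.

```python
from itertools import product

population = [1, 2, 3]  # hypothetical toy population
N = len(population)
mu = sum(population) / N
pop_var = sum((x - mu) ** 2 for x in population) / N


def sample_var(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor


n = 2
samples = list(product(population, repeat=n))  # all 9 ordered samples
avg_with_n_minus_1 = sum(sample_var(s, n - 1) for s in samples) / len(samples)
avg_with_n = sum(sample_var(s, n) for s in samples) / len(samples)

print(round(pop_var, 4), round(avg_with_n_minus_1, 4), round(avg_with_n, 4))
```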
To determine the variance of ungrouped data, let us follow the steps below.
1. Arrange the values vertically according to magnitude (lowest to highest or vice versa).
2. Calculate the mean.
3. Obtain the individual deviations from the mean.
4. Square each deviation and write the results under the column (x − x̄)².
5. Find the sum of the squared deviations.
6. Divide the sum in step 5 by n – 1 for sample data or by N for population data.
Example: Determine the variance of the following 8 sample production units of a certain company: 10, 11, 9,
17,13,15,13 and 20.
Solutions: The 8 sample values, with their entries under x − x̄ and (x − x̄)², are as follows:

x        x − x̄     (x − x̄)²
9        -4.5      20.25
10       -3.5      12.25
11       -2.5      6.25
13       -0.5      0.25
13       -0.5      0.25
15       1.5       2.25
17       3.5       12.25
20       6.5       42.25
n = 8              ∑(x − x̄)² = 96

Thus, the mean is obtained as follows:

$$\bar{x} = \frac{\sum x}{n} = \frac{9 + 10 + 11 + 13 + 13 + 15 + 17 + 20}{8} = \frac{108}{8} = 13.5$$

And the variance, as shown below.

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{96}{8 - 1}$$

S² = 13.71
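
The steps above can be sketched in Python and cross-checked against the standard library's `statistics.variance`, which also uses the n – 1 divisor. The helper name `sample_variance` is illustrative.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


units = [10, 11, 9, 17, 13, 15, 13, 20]
s2 = sample_variance(units)
print(round(s2, 2))               # 13.71
print(round(variance(units), 2))  # 13.71
```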
As you may have observed, the computation of the variance (S²) for ungrouped data using the above procedure is
laborious and time-consuming. A simpler and easier solution can be done through the raw data formula.
The raw data formulas for computing the variance of ungrouped data are:

$$S^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)} \quad \text{for sample data}$$

and

$$\sigma^2 = \frac{N\sum x^2 - \left(\sum x\right)^2}{N^2} \quad \text{for population data}$$

We will use n for the number of observations for sample data and N for population data.
To solve the variance of ungrouped data by raw data formula, we will follow the procedures enumerated
below.
1. Arrange the values in terms of their magnitude
2. Find the sum of the values
3. Square each value and write the results under the column x2
4. Get the sum of the squared values in step 3.
5. Substitute the results obtained in step 2 and step 4 in the raw data formula.
Example: Using the same 8 sample production units as in the preceding example, with the computed mean (x̄) =
13.5, find the variance by the raw data formula.

x             x²
9             81
10            100
11            121
13            169
13            169
15            225
17            289
20            400
∑x = 108      ∑x² = 1,554
n = 8

Substituting the computed values obtained above in the raw data formula, we shall have:

$$S^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)} = \frac{8(1{,}554) - (108)^2}{8(8 - 1)} = \frac{768}{56}$$

S² = 13.71
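
A minimal Python sketch of the raw data formula follows. It needs only the two sums, ∑x and ∑x², and gives the same result as the deviation-based computation. The function name is illustrative.

```python
def sample_variance_raw(values):
    """Sample variance from the sum of values and the sum of squared values."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


units = [10, 11, 9, 17, 13, 15, 13, 20]
print(sum(units), sum(x * x for x in units))  # 108 1554
print(round(sample_variance_raw(units), 2))   # 13.71
```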
The variance for Grouped Data
The variance for grouped data may be calculated by 2 methods. The first method involves rather longer
procedures, using the deviations of the class midpoints from the mean, as follows.
Method 1 – (Long Method Formula)

$$S^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} \quad \text{for sample data, and}$$

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N} \quad \text{for population data}$$

We shall use x̄ for the mean of sample data and μ for the mean of population data.

We will follow the procedures below to solve the variance of grouped data by using the above formula (long
method).

1. Find the value of the mean.
2. Subtract the mean from each of the midpoints of the class intervals.
3. Square each of the deviations in step 2 and write the results under (x − x̄)².
4. Multiply the squared deviations in step 3 by their corresponding frequencies.
5. Obtain the sum of the results in step 4.
6. Divide the result in step 5 by n – 1 for sample data and by N for population data.

Example: A distribution of the ages of a sample of 87 managerial employees of a manufacturing company is
shown in Table 3 below. Find the variance by the use of the long method formula.

Age classes    Number (f)    x     fx      x − x̄     (x − x̄)²    f(x − x̄)²
31-35          5             33    165     -14.48    209.67      1048.35
36-40          10            38    380     -9.48     89.87       898.70
41-45          18            43    774     -4.48     20.07       361.26
46-50          25            48    1200    0.52      0.27        6.75
51-55          17            53    901     5.52      30.47       517.99
56-60          9             58    522     10.52     110.67      996.03
61-65          3             63    189     15.52     240.87      722.61
n = 87                       ∑fx = 4,131                         ∑f(x − x̄)² = 4,551.69

Solutions:
The mean (x̄)

$$\bar{x} = \frac{\sum fx}{n} = \frac{4{,}131}{87} = 47.48$$

Hence, we will obtain:

$$S^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} = \frac{4{,}551.69}{86}$$

S² = 52.93
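
The long method can be sketched in Python from the class midpoints and frequencies alone. The function name is illustrative. Because the sketch keeps the mean unrounded, its sum of squared deviations is slightly different from 4,551.69, but the variance agrees to two decimal places.

```python
def grouped_sample_variance(midpoints, freqs):
    """Long method: sum of f * (x - mean)^2 divided by n - 1."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    squared_total = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return squared_total / (n - 1)


midpoints = [33, 38, 43, 48, 53, 58, 63]
freqs = [5, 10, 18, 25, 17, 9, 3]
print(round(grouped_sample_variance(midpoints, freqs), 2))  # 52.93
```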
Now, let us simplify the computation of the variance of grouped data by using a short method known as
the coding method formula, as follows.
Method 2: Short Method

$$S^2 = \left[\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n - 1)}\right]c^2 \quad \text{for sample data}$$

$$\sigma^2 = \left[\frac{N\sum fd^2 - \left(\sum fd\right)^2}{N^2}\right]c^2 \quad \text{for population data}$$


Here d represents the coded value of each class interval and c the size of the class interval.
To obtain the variance of grouped data by the short method or coding formula, we shall follow the
procedures listed below.
1. Write the coded values of the class intervals under the d column.
2. Multiply the frequencies by the corresponding coded values to get fd.
3. Multiply the squared coded values by the corresponding frequencies to get fd².
4. Add the results in step 2 to obtain ∑fd, and add the results in step 3 to obtain ∑fd².
5. Substitute the values in the coding formula.
Example: Calculate the variance of the distribution of the ages of a sample of 87 managerial employees
contained in Table 3 in the preceding example by the short method (coded formula).

Taking the class 46-50 as the origin (d = 0), the coded values are as follows:

Age classes    Number (f)    d     fd     d²    fd²
31-35          5             -3    -15    9     45
36-40          10            -2    -20    4     40
41-45          18            -1    -18    1     18
46-50          25            0     0      0     0
51-55          17            1     17     1     17
56-60          9             2     18     4     36
61-65          3             3     9      9     27
n = 87                       ∑fd = -9           ∑fd² = 183

Substituting the obtained values in the formula, with class size c = 5, we shall find:

$$S^2 = \left[\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n - 1)}\right]c^2 = \left[\frac{87(183) - (-9)^2}{87(87 - 1)}\right]5^2$$

S² = 52.93
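
The coding method can be sketched in Python as follows. The function name and the `origin_index` parameter are illustrative, and the sketch assumes equal-width classes listed in ascending order, with the class at `origin_index` coded as d = 0.

```python
def coded_sample_variance(freqs, origin_index, class_width):
    """Short (coding) method for the sample variance of grouped data.

    Returns (sum of fd, sum of fd^2, variance).
    """
    n = sum(freqs)
    codes = [i - origin_index for i in range(len(freqs))]  # ..., -1, 0, 1, ...
    sum_fd = sum(f * d for f, d in zip(freqs, codes))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, codes))
    variance = (n * sum_fd2 - sum_fd ** 2) / (n * (n - 1)) * class_width ** 2
    return sum_fd, sum_fd2, variance


freqs = [5, 10, 18, 25, 17, 9, 3]
sum_fd, sum_fd2, s2 = coded_sample_variance(freqs, origin_index=3, class_width=5)
print(sum_fd, sum_fd2, round(s2, 2))  # -9 183 52.93
```

Choosing a different class as the origin changes ∑fd and ∑fd² but not the resulting variance.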

THE STANDARD DEVIATION


The standard deviation is the most important measure of variation. By knowing the standard deviation, we will
be able to determine the position of the scores in a frequency distribution in relation to the mean. A small
standard deviation means that the values in a distribution are clustered near the mean, while a large
standard deviation means that they are spread farther from it.
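The formulas to compute the standard deviation are:
For ungrouped data

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

For grouped data

$$S = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}} \quad \text{or} \quad S = \sqrt{\left[\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n - 1)}\right]c^2}$$

For population data

$$\sigma = \sqrt{\frac{\sum f(x - \mu)^2}{N}} \quad \text{or} \quad \sigma = \sqrt{\left[\frac{N\sum fd^2 - \left(\sum fd\right)^2}{N^2}\right]c^2}$$

Note: c² is included under the square root (radical) sign.

Or, in terms of the variance, S = √S² for sample data and σ = √σ² for population data.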

Hence, the standard deviation of the 8 ungrouped sample values in the earlier example, for which the variance
(S²) = 13.71, is calculated in the following manner:

$$S = \sqrt{S^2} = \sqrt{13.71} = 3.70$$

Similarly, the standard deviation of the grouped data on the ages of the 87 managerial employees presented in
Table 3, where the variance (S²) = 52.93, is found as follows:

$$S = \sqrt{S^2} = \sqrt{52.93} = 7.28$$
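
Both results can be checked with a few lines of Python. For the ungrouped data, the standard library's `statistics.stdev` computes the sample standard deviation directly. For the grouped data, the sketch simply takes the square root of the variance value reported above.

```python
from math import sqrt
from statistics import stdev

units = [10, 11, 9, 17, 13, 15, 13, 20]
s_ungrouped = stdev(units)   # sample standard deviation (n - 1 divisor)
s_grouped = sqrt(52.93)      # square root of the grouped variance from the text

print(round(s_ungrouped, 2))  # 3.7
print(round(s_grouped, 2))    # 7.28
```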
The variance and the standard deviation are generally accepted measures of dispersion, especially in
discussions and presentations of reports containing basic statistics. The standard deviation is more popularly
used than the variance since its value is expressed in the same unit of measurement as the observations and
the mean. Hence, it presents values that can be used directly for analysis and interpretation. For instance,
when the unit of measurement of the data is kilos, the standard deviation and the mean are in kilos. The
variance, on the other hand, is in kilos squared.
