EDUCATIONAL DATA REASONING

BBD 30402

By:

Faculty of Technical & Vocational Education,


University Tun Hussein Onn Malaysia, Malaysia
Measures of Variation

Measures of Variation (“Spread”)

• Another important characteristic of quantitative data is how
much the data varies, or is spread out.

• The 3 most common methods of measuring spread are:


1. Range
2. Inter-quartile range
3. Standard deviation and Variance

Range

• The difference between the maximum and minimum data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)

Example: Finding the Range

• The wait time to see a bank teller is studied at 2 banks.

Bank A has multiple lines, one for each teller.


Bank B has a single wait line for 1st available teller.

5 wait times (in minutes) are sampled from each bank:


Bank A: 5.2 6.2 7.5 8.4 9.2
Bank B: 6.6 6.8 7.5 7.7 7.9

Find the mean, median, and range for each bank.



Solution: Finding the Range

• Bank A: Range = 9.2 – 5.2 = 4.0 minutes
• Bank B: Range = 7.9 – 6.6 = 1.3 minutes

• Note: The range is easy to compute, but only uses 2 values.


Do the following 2 sets vary the same?

– Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
– Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
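The bank example and the two sets above can be checked with a short Python sketch. The helper name `value_range` is an illustrative choice, not part of the slides; the data are the values listed above.

```python
from statistics import mean, median

def value_range(values):
    """Range = (max. data entry) - (min. data entry)."""
    return max(values) - min(values)

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]   # wait times in minutes
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

for name, waits in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(name,
          "mean:", round(mean(waits), 2),
          "median:", median(waits),
          "range:", round(value_range(waits), 2))

# Two sets with the same range even though Set B is almost constant.
set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print("Set A range:", value_range(set_a))
print("Set B range:", value_range(set_b))
```

Both banks share the same mean and median, so only the range separates them here. Sets A and B share the same range, which is why a measure that uses every value is also needed.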

Inter-Quartile Range: Example

i     x[i]
1     102
2     104
3     105   ---- the first quartile, Q1 = 105
4     106
5     108
6     109   ---- the second quartile, Q2 or median = 109
7     110
8     112
9     115   ---- the third quartile, Q3 = 115
10    115
11    118

From this table, the interquartile range is Q3 – Q1 = 115 – 105 = 10.

Inter-quartile Range
• The inter-quartile range is the range for the middle
50% of observations. That is, it is the distance from the
first quartile (25th percentile) to the third quartile (75th
percentile) of a frequency distribution.
• Because the inter-quartile range is the distance
between the 25th and 75th percentiles it is not
sensitive to changes in the extreme scores at either
end of the distribution.

Example
4 7 6 31 10 29 4 6 9 11 7 23
5 8 10 7 11 6 5 8 10 9 12 9 8

• Find the:
1) Range
2) Inter-quartile range (IQR)

Solution (Option B)
• Find the:
1) Range
2) Inter-quartile range

Range = 31 – 4 = 27

Position:     1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Ordered data: (4 4 5 5 6 6 6 7 7 7 8 8) 8 (9 9 9 10 10 10 11 11 12 23 29 31)

Median position = (n + 1)/2 = (25 + 1)/2 = 13, so Q2 = 8
Q1 = (6 + 6)/2 = 6
Q3 = (10 + 11)/2 = 10.5
IQR = Q3 – Q1 = 10.5 – 6 = 4.5
Interquartile range - steps
Steps:
Step 1: Put the numbers in order.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
Step 2: Find the median.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is the 6th of the 11 values, 9.

Step 3: Place parentheses around the numbers above and below the median.
Not necessary statistically, but it makes Q1 and Q3 easier to spot.
(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).

Step 4: Find Q1 and Q3


Think of Q1 as a median in the lower half of the data and think of Q3 as a median for
the upper half of data.
(1, 2, 5, 6, 7), 9, ( 12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.

Step 5: Subtract Q1 from Q3 to find the interquartile range.


18 – 5 = 13.
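The five steps can be sketched in Python as follows. The function names `quartiles` and `iqr` are illustrative, and the sketch assumes the convention used in the worked examples: when the number of values is odd, the median is left out of both halves. Other quartile conventions exist and can give slightly different results.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps.

    Assumes at least two values. With an odd count, the median itself
    is excluded from the lower and upper halves.
    """
    ordered = sorted(values)           # Step 1: put the numbers in order
    n = len(ordered)
    q2 = median(ordered)               # Step 2: find the median
    lower = ordered[: n // 2]          # Step 3: values below the median
    upper = ordered[(n + 1) // 2:]     #         values above the median
    q1 = median(lower)                 # Step 4: median of each half
    q3 = median(upper)
    return q1, q2, q3

def iqr(values):
    """Step 5: subtract Q1 from Q3."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

steps_data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
print(quartiles(steps_data))   # (5, 9, 18)
print(iqr(steps_data))         # 13
```

The same function can be applied to any of the ordered lists in this section to check a hand calculation.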

The inter-quartile range (IQR) in particular is used to describe the dispersion of the data.

The inter-quartile range (IQR) is defined as the range between the first and the third quartile. Please note that the IQR contains the middle 50% of the data within the distribution.

Median, Quartiles, Deciles & Percentiles

• The Median is a value that subdivides the ordered data into two halves.

• The Quartiles subdivide the data into quarters, the deciles provide a
subdivision into tenths, and the percentiles a subdivision into
hundredths.
• There are three quartiles: the lower quartile, Q1, the median (Q2),
and the upper quartile, Q3.
• The percentiles are simply called the 1st percentile, the 2nd percentile
and so on.
• The median is the 5th decile and the 50th percentile.
• A study of the values of the deciles or quartiles gives us an idea of
the spread of the data, but an ‘idea’ is all we get, and there is no
need for great precision.

IN SUMMARY:

• MEDIAN
– (Data is divided into 2 parts)
• QUARTILES
– (Data is divided into 4 parts)
• DECILES
– (Data is divided into 10 parts)
• PERCENTILES
– (Data is divided into 100 parts)
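As a rough illustration of these subdivisions, Python's standard `statistics.quantiles` function returns the cut points that split ordered data into `n` parts. The sketch below reuses the 25 values from the earlier range and IQR example and relies on the function's default "exclusive" method; other tools may interpolate differently, so their values can differ slightly.

```python
from statistics import median, quantiles

data = [4, 7, 6, 31, 10, 29, 4, 6, 9, 11, 7, 23,
        5, 8, 10, 7, 11, 6, 5, 8, 10, 9, 12, 9, 8]

quartile_cuts = quantiles(data, n=4)       # 3 cut points -> 4 parts
decile_cuts = quantiles(data, n=10)        # 9 cut points -> 10 parts
percentile_cuts = quantiles(data, n=100)   # 99 cut points -> 100 parts

print(len(quartile_cuts), len(decile_cuts), len(percentile_cuts))

# The median, Q2, the 5th decile and the 50th percentile coincide.
print(median(data), quartile_cuts[1], decile_cuts[4], percentile_cuts[49])
```

With only 25 observations most of the 99 percentile cut points are interpolated, which fits the remark that these values give an idea of the spread rather than great precision.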

Mean (or Average) Deviation

If a deviation (x – x̅) is the difference of a score from the mean, and variability is the extent to which the scores differ from their mean, then summing all the deviations and dividing by the number of them should give us a measure of variability. The problem, though, is that the deviations always sum to zero.
Computation of the Deviation Scores and Mean Deviation

Value, x    x – x̅           |x – x̅|
3           3 – 9 = –6       6
6           6 – 9 = –3       3
6           6 – 9 = –3       3
7           7 – 9 = –2       2
8           8 – 9 = –1       1
11          11 – 9 = 2       2
15          15 – 9 = 6       6
16          16 – 9 = 7       7
Σx = 72     Σ(x – x̅) = 0     Σ|x – x̅| = 30

x̅ = 72/8 = 9

Mean deviation = Σ|x – x̅|/n = 30/8 = 3.75

So, the mean = 9 and the mean deviation = 3.75

Mean (or Average) Deviation

• However, computing the absolute value of the deviations before summing them eliminates the problem of the deviations summing to zero. Thus, the formula for the MD is given by:

MD = Σ|x – x̅| / n

• The problem with the MD is that, due to the use of the absolute value, it is a terminal procedure. In other words, it cannot be used in further calculations (which is something that we would like to be able to do).
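A minimal Python sketch of the mean deviation, using the eight values from the x column of the table (3, 6, 6, 7, 8, 11, 15, 16). The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """MD = sum of |x - mean| divided by the number of values."""
    x_bar = sum(values) / len(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

scores = [3, 6, 6, 7, 8, 11, 15, 16]
x_bar = sum(scores) / len(scores)

# Signed deviations cancel out, so their sum carries no information.
signed_total = sum(x - x_bar for x in scores)
print("mean:", x_bar)
print("sum of signed deviations:", signed_total)

# Absolute deviations do not cancel.
print("mean deviation:", mean_deviation(scores))
```

Comparing the two sums shows why the absolute value is taken before averaging.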
Exercise:
Goals scored in the Past 10 Games by the C College and B University Water Polo Teams

Game    C College    B University
1       6            0
2       5            11
3       6            13
4       7            0
5       6            12
6       6            0
7       6            0
8       8            14
9       5            10
10      5            0

Find:
1. Mean
2. Range
3. Inter-Quartile Range (IQR)

Bank A: x̅ = Σx/n = 36.5/5 = 7.3

Wait time, x (min)    Deviation, x – x̅      Square, (x – x̅)²
5.2                   5.2 – 7.3 = –2.1      (–2.1)² = 4.41
6.2                   6.2 – 7.3 = –1.1      (–1.1)² = 1.21
7.5                   7.5 – 7.3 = 0.2       (0.2)² = 0.04
8.4                   8.4 – 7.3 = 1.1       (1.1)² = 1.21
9.2                   9.2 – 7.3 = 1.9       (1.9)² = 3.61
Σx = 36.5             Σ(x – x̅) = 0          Σ(x – x̅)² = 10.48

s = √(10.48/(5 – 1)) = √2.62 ≈ 1.62
Quick Preview of the steps to follow:
Step 1: Find the mean.
Step 2: For each data point, find the square of its distance to the mean.
Step 3: Sum the values from Step 2.
Step 4: Divide by the number of data points (for a sample standard deviation, divide by n – 1 instead).
Step 5: Take the square root.
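The five steps map directly onto a few lines of Python. The function name `standard_deviation` and its `sample` flag are illustrative assumptions: the flag switches Step 4 between dividing by n and dividing by n – 1, the divisor used in the worked wait-time tables.

```python
from math import sqrt

def standard_deviation(values, sample=False):
    """Follow the five steps; sample=True divides by n - 1 in Step 4."""
    n = len(values)
    x_bar = sum(values) / n                            # Step 1: mean
    squared = [(x - x_bar) ** 2 for x in values]       # Step 2: squared distances
    total = sum(squared)                               # Step 3: sum them
    variance = total / (n - 1 if sample else n)        # Step 4: divide
    return sqrt(variance)                              # Step 5: square root

# Dividing by n, as in the step list
print(round(standard_deviation([6, 2, 3, 1]), 2))

# Dividing by n - 1, as in the Bank A wait-time table
print(round(standard_deviation([5.2, 6.2, 7.5, 8.4, 9.2], sample=True), 2))
```

Keeping the divisor as an explicit option makes it easy to see how much the two versions differ for small data sets.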

Bank B: x̅ = Σx/n = 36.5/5 = 7.3

Wait time, x (min)    Deviation, x – x̅      Square, (x – x̅)²
6.6                   6.6 – 7.3 = –0.7      (–0.7)² = 0.49
6.8                   6.8 – 7.3 = –0.5      (–0.5)² = 0.25
7.5                   7.5 – 7.3 = 0.2       (0.2)² = 0.04
7.7                   7.7 – 7.3 = 0.4       (0.4)² = 0.16
7.9                   7.9 – 7.3 = 0.6       (0.6)² = 0.36
Σx = 36.5             Σ(x – x̅) = 0          Σ(x – x̅)² = 1.30

s = √(1.30/(5 – 1)) = √0.325 ≈ 0.57
Exercise:
Calculate the Standard Deviation for this data:

6, 2, 3, 1
Exercise (solution):
Calculate the Standard Deviation for this data:
6, 2, 3, 1
1) Mean: x̅ = (6 + 2 + 3 + 1)/4 = 3
2) Squared distances (x – x̅)²: 6 → 9, 2 → 1, 3 → 0, 1 → 4
3) Σ(x – x̅)² = 14
4) 14/4 = 3.5
5) √3.5 ≈ 1.87
Exercise:
Calculate the Standard Deviation for this data:

1, 4, 7, 2, 6

Answer:
Standard deviation = √(26/5) ≈ 2.28 (dividing by n = 5)
Sample versus Population
Standard Deviation and Variance

                      Sample Statistics    Population Parameters
Mean                  x̅                    µ
Standard Deviation    s                    σ
Variance              s²                   σ²
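Python's standard `statistics` module keeps the same sample/population distinction: `stdev` and `variance` divide by n – 1 (sample statistics s and s²), while `pstdev` and `pvariance` divide by n (population parameters σ and σ²). The sketch below applies them to the Bank A wait times purely as an illustration.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

waits = [5.2, 6.2, 7.5, 8.4, 9.2]   # Bank A wait times (minutes)

print("mean:", round(mean(waits), 2))

# Treating the five values as a sample (divide by n - 1)
print("s:", round(stdev(waits), 2), " s^2:", round(variance(waits), 2))

# Treating the five values as the whole population (divide by n)
print("sigma:", round(pstdev(waits), 2), " sigma^2:", round(pvariance(waits), 3))
```

For the same data the sample versions are always at least as large as the population versions, because the divisor is smaller.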

Standard Deviation: Key Points

• s ≥ 0 (When would s = 0?)

• The standard deviation is a measure of variation of all values from the mean. The larger s is, the more the data varies.

• The units of the standard deviation s are the same as the units of the original data values. (The variance has units².)

• The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).
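Two of these points can be tried out in a small sketch. The constant list and the extra value 30.0 are made-up illustration data; the other five values are the Bank A wait times.

```python
from statistics import stdev

constant = [7, 7, 7, 7, 7]                 # hypothetical data with no variation
waits = [5.2, 6.2, 7.5, 8.4, 9.2]          # Bank A wait times (minutes)
with_outlier = waits + [30.0]              # 30.0 is a made-up outlier

print("s for identical values:", stdev(constant))
print("s without outlier:", round(stdev(waits), 2))
print("s with outlier:", round(stdev(with_outlier), 2))
```

A single extreme value is enough to make s several times larger, which is the sensitivity described in the last key point.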

The Empirical Rule

Empirical (68-95-99.7) Rule
For data sets having a symmetric, bell-shaped (approximately normal) distribution:

• About 68% of all values fall within 1 standard deviation of the mean

• About 95% of all values fall within 2 standard deviations of the mean

• About 99.7% of all values fall within 3 standard deviations of the mean
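The three percentages can be checked against an exact normal distribution with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is illustrative, and the check assumes a perfectly normal distribution; real data sets only follow the rule approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The printed shares are close to 68%, 95% and 99.7%, which is where the rule gets its name.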

Example: Using the Empirical Rule

A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15.

1. Sketch the distribution.


2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
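One way to work through questions 2 to 5 is to turn the rule into intervals around the mean, as in the sketch below. The names `band` and `within` are illustrative, and the halving steps assume the distribution is symmetric about the mean.

```python
mean_iq, sd_iq = 100, 15

def band(k):
    """Interval from k standard deviations below to k above the mean."""
    return mean_iq - k * sd_iq, mean_iq + k * sd_iq

# Percent of values within k standard deviations (Empirical Rule)
within = {1: 68.0, 2: 95.0, 3: 99.7}

print("68% lie between:", band(1))
print("between 70 and 130:", within[2], "% since the interval is", band(2))

# By symmetry, half of the central 68% lies between the mean and +1 SD.
print("between 100 and 115:", within[1] / 2, "%")

# 145 is 3 SDs above the mean; half of what lies outside 3 SDs is above it.
print("above 145:", round((100 - within[3]) / 2, 2), "%")
```

Each answer comes from matching the IQ value in the question to a whole number of standard deviations from the mean.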

When to Use a Particular Statistic


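• EXAMPLE QUESTION: What is your gender?
– MEASUREMENT LEVEL: Nominal scale
– CENTRAL TENDENCY (most typical response): Mode
– VARIABILITY (how similar the responses are): Frequency and/or percentage
• EXAMPLE QUESTION: Rank these 5 brands from your 1st choice to your 5th choice.
– MEASUREMENT LEVEL: Ordinal scale
– CENTRAL TENDENCY (most typical response): Median
– VARIABILITY (how similar the responses are): Cumulative percentage
• EXAMPLE QUESTION: On a scale of 1 to 5, how does “Starbucks” rate on variety of its coffee drinks?
– MEASUREMENT LEVEL: Interval scale
– CENTRAL TENDENCY (most typical response): Mean
– VARIABILITY (how similar the responses are): Standard deviation and/or range
• EXAMPLE QUESTION: About how many times did you charge your cell phone last week?
– MEASUREMENT LEVEL: Ratio scale
– CENTRAL TENDENCY (most typical response): Mean
– VARIABILITY (how similar the responses are): Standard deviation and/or range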


Summary

• The inter-quartile range is used in conjunction with the median to describe skewed distributions. It is calculated as the distance between the scores at the 25th and 75th percentiles.

• The variance is used in conjunction with the mean to describe symmetrical or normal distributions of interval or ratio scores. It is the average of the squared deviations of scores around the mean.

• The standard deviation is also used in conjunction
with the mean to describe symmetrical or normal
distribution of interval/ratio scores. It is the square
root of the variance. It can be thought of as the
“average” amount that scores deviate from the
mean.

• Measures of variability describe how much
the scores differ from each other, or how much
the distribution is spread out.

• The range is a measure of variability based on the difference between the highest score and the lowest score.
