
ECC3014 Engineering statistics

Measure of location & variability


Dr. Alifdalino Sulaiman
Dept. Food and Process Engineering
Measures of Center
• A measure along the horizontal axis of the data distribution
that locates the center of the distribution.
• What do you use as a measure of center?
(a) Mean? (b) Median? (c) Mode?

• Not all three are suitable to describe a distribution for ALL cases
Basic numerical representation
Draw the frequency histogram. Calculate the mean, median
and mode for the number of quarts of milk purchased by the
following 25 households:
0 0 1 1 1
1 1 2 2 2
2 2 2 2 2
2 3 3 3 3
3 4 4 4 5

• Mean? Median? Mode?

$$\bar{x} = \frac{\sum x_i}{n} = \frac{55}{25} = 2.2$$

median m = 2    mode = 2
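The three measures for the milk data can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Quarts of milk purchased by the 25 households (data from the slide).
quarts = [0, 0, 1, 1, 1,
          1, 1, 2, 2, 2,
          2, 2, 2, 2, 2,
          2, 3, 3, 3, 3,
          3, 4, 4, 4, 5]

milk_mean = mean(quarts)      # sum of the values divided by n
milk_median = median(quarts)  # middle value of the sorted data
milk_mode = mode(quarts)      # most frequent value

print(milk_mean, milk_median, milk_mode)
```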
Extreme Value
• The mean is more easily affected by extremely large or small values
than the median.
• The median is often used as a measure of center when the
distribution is skewed.
Extreme Values (cont.)
Mean is often used as a measure of center when the distribution is symmetric.

• Symmetric: Mean = Median
• Skewed right (long tail to the right): Mean > Median
• Skewed left (long tail to the left): Mean < Median

Median is often used as a measure of center when the distribution is skewed.


Skewed Right (Positively Skewed)
• Skewed Right – long tail to the right
• A few high numbers pull the mean above the median
• The set:

Num.   Frequency
1      3
2      5
3      3
4      1

• The graph: [bar chart of Frequency (axis from 0 to 5) against Num.]

Mean = [1(3) + 2(5) + 3(3) + 4(1)] / 12 = 2.17


Median = 2
Mean > Median
Skewed Left (Negatively Skewed)
• Skewed Left – long tail to the left
• A few low numbers pull the mean below the median
• The set:

Num.   Frequency
1      1
2      3
3      5
4      3

• The graph: [bar chart of Frequency (axis from 0 to 5) against Num.]

Mean = [1(1) + 2(3) + 3(5) + 4(3)] / 12 = 2.83


Median = 3
Mean < Median
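As a rough check of the two skewed examples, the frequency tables can be expanded into raw observations and the mean compared with the median. This is only a sketch: the helper `expand` and the dictionary names are hypothetical, not part of the course material.

```python
from statistics import mean, median

def expand(freq_table):
    """Turn a {value: frequency} table into a flat list of observations."""
    return [value for value, count in freq_table.items() for _ in range(count)]

# Frequency tables from the two slides.
skewed_right = {1: 3, 2: 5, 3: 3, 4: 1}
skewed_left = {1: 1, 2: 3, 3: 5, 4: 3}

for name, table in [("skewed right", skewed_right), ("skewed left", skewed_left)]:
    data = expand(table)
    # Mean above the median suggests a right skew; below suggests a left skew.
    print(name, round(mean(data), 2), median(data))
```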
Measures of Variability
• A measure along the horizontal axis of the data
distribution that describes the spread of the distribution
from the center.
The Range

▪ The range, R, is the difference between the largest and smallest measurements.
▪ Example: A botanist records the number of petals on 5 flowers:
▪ 5, 12, 6, 8, 14
▪ The range is

R = 14 – 5 = 9
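The same calculation as a minimal Python sketch (the variable names are illustrative):

```python
# Number of petals on the 5 flowers (data from the slide).
petals = [5, 12, 6, 8, 14]

# Range = largest measurement minus smallest measurement.
petal_range = max(petals) - min(petals)
print(petal_range)  # 9
```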
The Variance
• The variance is a measure of variability that uses all the
measurements (as opposed to the range R, which uses only 2
measurements, the maximum and the minimum).
• It measures the average squared deviation of the measurements from
their mean.
• Flower petals: 5, 12, 6, 8, 14

$$\bar{x} = \frac{45}{5} = 9$$

[Dot plot: the five measurements on an axis from 4 to 14]
The Variance (cont.)
• The variance of a population of N measurements is the average of
the squared deviations of the measurements about their mean,
μ.

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
• The variance of a sample of n measurements is the sum of the
squared deviations of the measurements about their mean,
divided by (n – 1).

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
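The two definitions differ only in the divisor. A minimal sketch, with hypothetical function names, applied to the flower-petal data:

```python
def population_variance(values):
    """Average squared deviation about the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

petals = [5, 12, 6, 8, 14]

# Treating the five flowers as the whole population:
print(population_variance(petals))
# Treating them as a sample from a larger population:
print(sample_variance(petals))
```

Which function applies depends on whether the data are the entire population or a sample drawn from it.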
2 Ways to Calculate the Sample Variance
Use the Definition Formula:

$x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
5        -4                 16
12       3                  9
6        -3                 9
8        -1                 1
14       5                  25
Sum 45   0                  60

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{60}{4} = 15$$

$$s = \sqrt{s^2} = \sqrt{15} = 3.87$$
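The definition-formula table can be reproduced step by step. This sketch builds each column explicitly; the variable names are assumptions for illustration.

```python
petals = [5, 12, 6, 8, 14]
n = len(petals)
x_bar = sum(petals) / n  # sample mean

# One column per step of the table.
deviations = [x - x_bar for x in petals]  # x_i - x_bar
squared = [d ** 2 for d in deviations]    # (x_i - x_bar)^2

for x, d, d2 in zip(petals, deviations, squared):
    print(f"{x:>3} {d:>6.1f} {d2:>6.1f}")

s2 = sum(squared) / (n - 1)  # sample variance
s = s2 ** 0.5                # sample standard deviation
print(sum(petals), sum(deviations), sum(squared), s2, round(s, 2))
```

The deviations always sum to zero, which is a quick way to check the middle column.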
2 Ways to Calculate the Sample Variance (cont.)
Use the Calculation Formula:

$$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$$

$x_i$    $x_i^2$
5        25
12       144
6        36
8        64
14       196
Sum 45   465

$$s^2 = \frac{465 - \frac{45^2}{5}}{4} = 15$$

$$s = \sqrt{s^2} = \sqrt{15} = 3.87$$
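The calculation formula needs only two running totals, the sum of the values and the sum of their squares. A minimal sketch with illustrative variable names:

```python
petals = [5, 12, 6, 8, 14]
n = len(petals)

sum_x = sum(petals)                   # sum of x_i
sum_x2 = sum(x * x for x in petals)   # sum of x_i squared

# Calculation formula: no need to compute the deviations first.
s2_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s_shortcut = s2_shortcut ** 0.5

print(sum_x, sum_x2, s2_shortcut, round(s_shortcut, 2))
```

Both formulas give the same variance; the calculation formula is mainly a convenience for hand computation.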
The Standard Deviation

• In calculating the variance, we squared all of the deviations, and
in doing so changed the scale of the measurements.
• To return this measure of variability to the original units of
measure, we calculate the standard deviation, the positive
square root of the variance.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$
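For comparison, Python's standard `statistics` module provides both versions directly: `pstdev` divides by N and `stdev` divides by n - 1. A short sketch on the flower-petal data:

```python
from statistics import pstdev, stdev

petals = [5, 12, 6, 8, 14]

sigma = pstdev(petals)  # population standard deviation (divisor N)
s = stdev(petals)       # sample standard deviation (divisor n - 1)

# Both are in the original units (number of petals), unlike the variance.
print(round(sigma, 2), round(s, 2))
```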
Some Notes
• The value of s is never negative; it is zero only when all the
measurements are equal.
• The larger the value of s² or s, the larger the variability of
the data set.
• Why divide by n – 1?
• The sample standard deviation s is often used to estimate
the population standard deviation σ. Dividing by n – 1 gives
us a better estimate of σ.
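One way to see the effect of the divisor is a small enumeration rather than a proof. The sketch below makes two assumptions that are not in the slides: the five petal counts are treated as the entire population, and samples of size 2 are drawn with replacement. It checks the variance, since that is where the n – 1 divisor averages out exactly.

```python
from itertools import product

population = [5, 12, 6, 8, 14]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def spread(sample, divisor):
    """Sum of squared deviations about the sample mean, over a chosen divisor."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample) / divisor

n = 2
# Every ordered sample of size n drawn with replacement (25 in total).
samples = list(product(population, repeat=n))

avg_div_n = sum(spread(smp, n) for smp in samples) / len(samples)
avg_div_n_minus_1 = sum(spread(smp, n - 1) for smp in samples) / len(samples)

# Dividing by n underestimates sigma2 on average; dividing by n - 1 matches it.
print(sigma2, avg_div_n, avg_div_n_minus_1)
```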
Exercise 1
1. Question: Find the mean, median and mode of:
5, 7, 3, 5, 6, 8, 5, 6, 4, 6, 25
Solution: First, arrange the data in order:
3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 25
median = 6; mean = 80/11 = 7.27 ; modes = 5 and 6

2. Question: Eliminate the last observation x= 25 and then find the mean,
median and mode. How do these values compare with those found
using the full data set?
Solution: median = 5.5; mean = 55/10 = 5.5; modes = 5 and 6. The mean
drops from 7.27 to 5.5, the median only from 6 to 5.5, and the modes
are unchanged.

3. Question: How do possible outliers (such as 25) affect these values?


Solution: The mean is strongly affected by the outlier, while the
median changes only slightly and the modes do not change at all.
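Exercise 1 can be checked with a short sketch. It assumes Python 3.8 or later, where `statistics.multimode` returns every most-frequent value; the variable names are illustrative.

```python
from statistics import mean, median, multimode

full = [5, 7, 3, 5, 6, 8, 5, 6, 4, 6, 25]
trimmed = [x for x in full if x != 25]  # drop the possible outlier

for label, data in [("with 25", full), ("without 25", trimmed)]:
    # multimode lists all values tied for the highest frequency.
    print(label, round(mean(data), 2), median(data), sorted(multimode(data)))
```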
Exercise 2
• Given the observations
7, 9, 10, 6, 8, 7, 8, 9, 8
calculate:
1. the range
Solution : R = 10 – 6 = 4

2. the mean
Solution : Mean = 72 / 9 = 8

3. the variance
Solution : Variance = [588 – (72²/9)] / 8 = 12 / 8 = 1.5

4. the standard deviation


Solution : Standard Deviation = √1.5 = 1.225
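The four answers to Exercise 2 can be verified with a short sketch using Python's standard `statistics` module, treating the observations as a sample (divisor n – 1), as the solution does.

```python
from statistics import mean, stdev, variance

obs = [7, 9, 10, 6, 8, 7, 8, 9, 8]

obs_range = max(obs) - min(obs)  # largest minus smallest
obs_mean = mean(obs)
obs_var = variance(obs)          # sample variance (divisor n - 1)
obs_sd = stdev(obs)              # square root of the sample variance

print(obs_range, obs_mean, obs_var, round(obs_sd, 3))
```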
