
Numerical Descriptive Measures
Definitions

The central tendency locates the central value in a data set.

The variation measures how close to the centre or how dispersed (scattered) the observations are from the centre.

The shape is the pattern of the distribution of values from the lowest value to the highest value.
Describing Data Numerically

Central Tendency: Arithmetic Mean, Median, Mode

Dispersion: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency

Calculating the Mean, Median and Mode

Purpose: To determine the “centre” of the data values.
The Mean

The mean is also known as the average.


Calculating the Sample Mean from raw data

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where $\bar{x}$ (pronounced x-bar) is the sample mean, $x_i$ is the ith observation (the values taken by x), and n is the sample size (the number of observations).
Example 1
The number of work days lost due to illness in a
business per week is given below
(for a 10 week period)

36, 28, 33, 29, 28, 32, 33, 33, 34, 32

Calculate the mean number of days lost per week during the above period.

Sample mean,

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{36 + 28 + 33 + \cdots + 32}{10} = \frac{318}{10} = 31.8$$
Exercise 1

The following are the ages (in years) of all eight employees of a small company:

53, 32, 61, 27, 39, 44, 49, 57

Find the mean age of these employees.

45.25 years
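
The two mean calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are illustrative choices, not part of the slides.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# Data from Example 1 (work days lost per week) and Exercise 1 (employee ages).
days_lost = [36, 28, 33, 29, 28, 32, 33, 33, 34, 32]
ages = [53, 32, 61, 27, 39, 44, 49, 57]

print(sample_mean(days_lost))  # 31.8
print(sample_mean(ages))       # 45.25
```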
Properties of the Sample Mean
• Uniqueness: for a given set of data there is one and only one mean.

• Affected (distorted) by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis]

Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Properties of the Sample Mean
• When the distribution of the data is skewed, the median may be a better measure of the centre than the mean.

• An important property of the mean is that it includes every value in the data set as part of the calculation.
The Median

The median is the value of the middle observation in a dataset.
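Calculating the Median from raw data

Step 1: First, arrange the observations in ascending order.
Step 2: Then find the middle position using the following formula:

$$\text{Median position} = \frac{n + 1}{2}$$

Step 3: The median value is the value in the median position. If n is odd, this is a single observation. If n is even, the position falls halfway between two ranks, and the median is the average of the two middle observations.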

Example 1
Find the median for the following data set.
27 38 12 34 42 40 24 40 23
• The ordered set becomes

Observation  12  23  24  27  34  38  40  40  42
Rank          1   2   3   4   5   6   7   8   9

• The median position is (9 + 1)/2 = 5, i.e. the 5th ranked observation.
• Therefore the median = 34
Exercise 1
Sambiri Silicon manufactures computer monitors.
The following data are numbers of computer
monitors produced at the company for a sample of
10 days. Find the median.

24 31 27 25 35 33 26 40 25 28
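
The median procedure (sort, find position (n + 1)/2, read off the value) can be sketched in Python as follows. The function name `median_from_raw` is an illustrative choice. For an even number of observations the sketch averages the two middle values, which is the usual convention but is an assumption here, since the slides state the position formula for odd n.

```python
def median_from_raw(values):
    """Median via the (n + 1) / 2 position rule on the sorted data."""
    ordered = sorted(values)           # Step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2             # Step 2: 1-based median position
    lower = ordered[int(position) - 1]
    if position.is_integer():          # odd n: a single middle observation
        return lower
    upper = ordered[int(position)]     # even n: average the two middle values
    return (lower + upper) / 2


example = [27, 38, 12, 34, 42, 40, 24, 40, 23]        # n = 9 (odd)
monitors = [24, 31, 27, 25, 35, 33, 26, 40, 25, 28]   # n = 10 (even)

print(median_from_raw(example))   # 34
print(median_from_raw(monitors))  # 27.5
```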
Properties of the Median
• In an ordered array, the median is the “middle” number (50% above, 50% below)
• Uniqueness: there is only one median for each set of data.
• Not affected by extreme values

[Figure: two dot plots on a 0 to 10 axis, each with Median = 3]
The Mode

The mode is the most frequently occurring value in a dataset.
Calculating the Mode 
from raw data

Step 1: First, arrange the observations in ascending order.
Step 2: The mode is the most frequently occurring value in the dataset.
Example 1

• Find the mode for the data below

7.00    11.00   14.25   15.00   15.00   15.50
19.00   19.00   19.00   19.00   21.00   22.00
23.00   24.00   25.00   27.00   27.00   28.00
34.22   43.25

The mode is 19.00 because it occurs the most times, i.e. four (4) times.
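
Counting frequencies by hand gets tedious for larger data sets, so here is a small Python sketch of the same idea. The function name `modes` is illustrative. It returns a list, because a data set may have several modes, and it treats a data set in which no value repeats as having no mode. That last rule is one common convention, assumed here rather than taken from the slides.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)            # frequency of each distinct value
    highest = max(counts.values())
    if highest == 1:                    # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


data = [7.00, 11.00, 14.25, 15.00, 15.00, 15.50,
        19.00, 19.00, 19.00, 19.00, 21.00, 22.00,
        23.00, 24.00, 25.00, 27.00, 27.00, 28.00,
        34.22, 43.25]

print(modes(data))             # [19.0]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (two modes)
print(modes([1, 2, 3]))        # []      (no mode)
```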
Properties of the Mode
• Normally, the mode is used for categorical data where we wish to know which is the most common category
• Not affected by extreme values
• The mode is not unique

[Figure: two dot plots, one with Mode = 9 and one with no mode]
Properties of the Mode
• There can be one mode
• There can be several modes
  • With several modes, it is unclear which one best describes the central tendency of the data.
  • This is particularly problematic with continuous data, because it is likely that no single value is more frequent than the others.
Properties of the Mode
• For example, consider measuring the weights of 30 people (to the nearest 0.1 kg). How likely is it that two or more people have exactly the same weight (e.g., 67.4 kg)? Probably very unlikely: many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
Question
When re‐ordering, the most common hat or 
jeans size is what you would like to know, not 
the average hat or jeans size.
The Shape: Skewness

The shape is the pattern of the distribution of values from the lowest value to the highest value.
Symmetric Histogram
Skewed Histogram
Skewed Histogram
Measures of skewness
• Pearson’s coefficient

• Bowley’s coefficient (Galton’s coefficient)

Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 3-31
Measures of Central Tendency: Summary

Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$, the rate of change of a variable over time
Measures of Dispersion
Which dataset has the larger variation?

[Figure: Dataset 1 and Dataset 2]
Measures of Dispersion
[Figure: Population 1 and Population 2, same centre, different variation]

Population 1: narrow range, smaller variation, smaller deviation, observations clustered
Population 2: wide range, larger variation, larger deviation, observations spread out
Measures of Dispersion
The measures of central tendency, the mean, median
and mode, do not reveal the whole picture of the
distribution of the dataset.

Two datasets with the same mean may have completely different spreads.

The amount or degree of spread is known as variation.


Measures of Dispersion

Variation: Range, Variance, Standard Deviation, Coefficient of Variation

Measures of variation give information on the spread, variability or dispersion of the data values.
Measures of Dispersion:
The Range

Range = Xlargest – Xsmallest

Example:

[Figure: dot plot on a 0 to 14 axis; smallest value 1, largest value 13]

Range = 13 - 1 = 12
Measures of Dispersion:
Why The Range Can Be Misleading
• Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 axis]
Range = 12 - 7 = 5 in both cases
Measures of Dispersion:
Why The Range Can Be Misleading

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
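
The outlier effect is easy to reproduce. The short Python sketch below (the name `data_range` is illustrative) computes the range for the two data sets above, which differ only in their last value.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


without_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                   2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = without_outlier[:-1] + [120]   # replace the last value by 120

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

A single changed observation moves the range from 4 to 119, although the other 24 values are identical.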


The Sample Variance
Variance is used to measure the dispersion of values relative to the mean.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$

where
$\bar{x}$ = arithmetic mean
n = sample size
$x_i$ = ith observation of the variable X
The Sample Standard Deviation
• Most commonly used measure of variation
• Tells us how much observations in our sample differ from the mean value within our sample.
• Has the same units as the original data, making it easier to interpret.

$$s = \sqrt{s^2}$$
Example
For the sample data $x_i$:

2, 3, 5, 1, 4, 3, 2, 4

find:

1. Sample variance
2. Sample standard deviation
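
One way to work this example is sketched below in Python. The function names are illustrative. The two functions implement the two algebraically equivalent forms of the sample variance formula (the deviation form and the sum-of-squares shortcut), both with the n - 1 divisor.

```python
import math


def sample_variance(values):
    """Deviation form: sum of squared deviations from the mean over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of squares - n * mean^2) over n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


data = [2, 3, 5, 1, 4, 3, 2, 4]   # mean = 24 / 8 = 3

s2 = sample_variance(data)        # 12 / 7
s = math.sqrt(s2)                 # standard deviation, same units as the data

print(round(s2, 3))  # 1.714
print(round(s, 3))   # 1.309
```

Here the squared deviations sum to 12, and the shortcut gives the same numerator: 84 - 8 × 3² = 12.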
The variation or dispersion in a set of values refers to
how spread out the values are from each other.

• The variation is small when the values are close together.

• There is no variation if the values are the same.

Smaller variation

Larger variation
The Coefficient of Variation
The variance and the standard deviation are useful
as measures of variation of the values of a single
variable for a single population (or sample).

If we want to compare the variation of two variables, we cannot use the variance or the standard deviation because:

1. The variables might have different means.
2. The variables might have different units.
The Coefficient of Variation

• Measures variation relative to the mean
• Expressed as a percentage (%)

$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$
The Coefficient of Variation

The coefficient of variation compares the 
variability of two different datasets even if they 
have different units of measurement.
Example 1

Spot, the dog, weighs 65 pounds. Spot’s weight 
fluctuates 5 pounds depending on Spot’s 
exercise level.

Sea Biscuit, the horse, weighs 1200 pounds. 
Sea Biscuit’s weight fluctuates 125 pounds 
depending on the number of rides Sea 
Biscuit goes on. 
Coefficient of Variation

Some financial investors use the 
coefficient of variation as a measure of 
risk.
What does the Coefficient of
Variation tell us about the risk of a
stock that the standard deviation
does not?
Relative to the amount invested in a
stock, the coefficient of variation reveals
the risk of a stock in terms of the size of
the standard deviation relative to the
size of the mean (in percentage).
Example 2

Relative to the amount of money invested in the stock, which stock, A or B, is riskier?

                      Stock A   Stock B
Average price         $50       $100
Standard deviation    $5        $5
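Comparing Coefficients of Variation

$$CV_A = \left(\frac{s}{\bar{x}}\right) \times 100\% = \frac{5}{50} \times 100\% = 10\%$$

$$CV_B = \left(\frac{s}{\bar{x}}\right) \times 100\% = \frac{5}{100} \times 100\% = 5\%$$

Comparing the coefficients of variation, the relative variation is higher in stock A (10%) than in stock B (5%), so stock A is riskier relative to the amount invested.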
Example 3
The yearly salaries of all employees who work 
for a company have a mean of $62,350 and a 
standard deviation of $6820. 

The years of experience for the same 
employees have a mean of 15 years and a 
standard deviation of 2 years. 

Is the relative variation in the salaries larger or 
smaller than that in the years of experience for 
these employees?
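
The comparisons in Examples 2 and 3 can be worked through with a short Python sketch. The function name `coefficient_of_variation` is illustrative. It applies CV = (s / x̄) × 100% to the means and standard deviations stated in the examples.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100 * std_dev / mean


# Example 2: stocks A and B (prices in dollars).
cv_a = coefficient_of_variation(5, 50)
cv_b = coefficient_of_variation(5, 100)

# Example 3: salaries (dollars) versus experience (years).
cv_salary = coefficient_of_variation(6820, 62350)
cv_experience = coefficient_of_variation(2, 15)

print(cv_a, cv_b)                 # 10.0 5.0
print(round(cv_salary, 2))        # 10.94
print(round(cv_experience, 2))    # 13.33
```

On these figures the salaries show slightly less relative variation (about 10.94%) than the years of experience (about 13.33%), even though the two variables are measured in different units.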
Interpretation
• A low (%) value shows low variability, implying tight clustering of observations about the mean.

• A middle to high (%) value shows high variability, implying that observations are widely spread.
Measures of Position for 
ungrouped data
(Quartile Measures)
Quartile Measures

• Quartiles split the ranked data into 4 equal segments.

25%  |  25%  |  25%  |  25%
     Q1      Q2      Q3

• The first quartile (lower quartile), Q1, is the value below which 25% of the observations lie.
• The second quartile (middle quartile), Q2, is the same as the median, so 50% of the observations lie below it.
• The third quartile (upper quartile), Q3, is the value below which 75% of the observations lie.
Quartile Measures

• Q1 = 25th percentile = P25
• Q2 = 50th percentile = P50
• Q3 = 75th percentile = P75


Locating Quartile Positions
Step 1: First, arrange the observations in ascending order.
Step 2: Find the quartile positions using the following formulas.

Q1 position = 0.25(n + 1)
Q2 position = 0.50(n + 1)
Q3 position = 0.75(n + 1)

Step 3: Determine the quartile values.
The Interquartile Range (IQR)

Remember that the range can be distorted by 
outliers.

The IQR excludes these outliers and focuses on the 
spread of the middle 50% of the data values.

The IQR is also called the 50% mid‐spread range.

IQR = Q3 - Q1
The Interquartile Range (IQR)

Weakness

The IQR, like the range, also provides no 
information on the clustering of observations 
within the dataset as it uses only two 
observations in its computation.
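
The quartile positions and the IQR can be sketched in Python as follows. The function names `quartile` and `iqr` are illustrative. When a position is not a whole number, the sketch interpolates linearly between the two neighbouring ranked values. That is an assumption about how "determine the quartile values" is carried out, although it agrees with taking the value halfway between two ranks for a position such as 2.5.

```python
def quartile(values, fraction):
    """Value at position fraction * (n + 1) in the ranked data (1-based)."""
    ordered = sorted(values)                 # Step 1: ascending order
    n = len(ordered)
    position = fraction * (n + 1)            # Step 2: quartile position
    if position < 1 or position > n:
        raise ValueError("too few observations for this quartile")
    lower_rank = int(position)               # rank just below the position
    weight = position - lower_rank           # fractional part of the position
    lower = ordered[lower_rank - 1]
    if weight == 0:
        return lower
    upper = ordered[lower_rank]
    return lower + weight * (upper - lower)  # Step 3: interpolate


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    return quartile(values, 0.75) - quartile(values, 0.25)


sample_a = [11, 12, 13, 16, 16, 17, 18, 21, 22]            # n = 9
sample_b = [7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45]   # n = 12

print(quartile(sample_a, 0.25), quartile(sample_a, 0.75), iqr(sample_a))
# 12.5 19.5 7.0
print(quartile(sample_b, 0.25), quartile(sample_b, 0.75), iqr(sample_b))
# 9.25 16.25 7.0
```

The second sample contains the extreme value 45, which gives a range of 45 - 7 = 38 while the IQR stays at 7.0.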
Example 1

Given Sample Data in Ordered Array:


11 12 13 16 16 17 18 21 22

Find   
1. Q1  and  Q3
2. IQR
Locating First Quartile, Q1

11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the 0.25(9 + 1) = 2.5th position of the ranked data, so use the value halfway between the 2nd and 3rd values:

$$Q_1 = \frac{12 + 13}{2} = 12.5 \quad\text{or}\quad Q_1 = 12 + \left(\frac{13 - 12}{2}\right) = 12.5$$
Locating Third Quartile, Q3

11 12 13 16 16 17 18 21 22

(n = 9)
Q3 is in the 0.75(9 + 1) = 7.5th position of the ranked data, so use the value halfway between the 7th and 8th values:

$$Q_3 = \frac{18 + 21}{2} = 19.5 \quad\text{or}\quad Q_3 = 18 + \left(\frac{21 - 18}{2}\right) = 19.5$$
The Interquartile Range (IQR)

IQR = Q3 - Q1
    = 19.5 - 12.5
    = 7.0
Example 2

Given Sample Data in Ordered Array:


7 8 9 10 11 12 13 13 14 17 17 45

Find   
1. Q1  and  Q3
2. IQR
Locating First Quartile, Q1

7 8 9 10 11 12 13 13 14 17 17 45

(n = 12) Q1 is in the 0.25(12 + 1) = 3.25th position of the ranked data.

So first find the value halfway between the 3rd and 4th values, which is (9 + 10)/2 = 9.5. The 3.25th position lies halfway between the 3rd value and this midpoint:

$$Q_1 = \frac{9 + 9.5}{2} = 9.25 \quad\text{or}\quad Q_1 = 9 + \left(\frac{10 - 9}{4}\right) = 9.25$$
Locating Third Quartile, Q3

7 8 9 10 11 12 13 13 14 17 17 45

(n = 12) Q3 is in the 0.75(12 + 1) = 9.75th position of the ranked data.

So first find the value halfway between the 9th and 10th values, which is (14 + 17)/2 = 15.5. The 9.75th position lies halfway between this midpoint and the 10th value:

$$Q_3 = \frac{15.5 + 17}{2} = 16.25 \quad\text{or}\quad Q_3 = 17 - \left(\frac{17 - 14}{4}\right) = 16.25$$
The Interquartile Range (IQR)

IQR = Q3 - Q1
    = 16.25 - 9.25
    = 7.0
End of Chapter
Grouped data
• Mean
• Variance
• CV
