Professional Documents
Culture Documents
Chapter 3_Part-1
NUMERICAL DESCRIPTIVE MEASURES
MEASURES OF CENTRAL TENDENCY FOR
UNGROUPED DATA
Mean
Median
Mode
Relationships among the Mean, Median, and Mode
Figure 3.1
Mean
The mean for ungrouped data is obtained by dividing the
sum of all values by the number of values in the data set. Thus,
x x 1 x2 x3 x4 x5 x6 x7 x8
319 199 110 63 21 315 26 63 1116
x
x 1116
139.5 $139.5million
n 8
The following are the ages (in years) of all eight employees of a
small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.
Example: Solution
x 362
45.25 years
N 8
Thus, the mean age of all eight employees of this company is
45.25 years, or 45 years and 3 months.
Example
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
There are 12 values in this data set. Because there are an even
number of values in the data set, the median is given by the
average of the two middle values.
Example: Solution
The two middle values are the sixth and seventh in the arranged
data, and these two values are 28.0 and 28.2.
The median gives the center of a histogram, with half the data
values to the left of the median and half to the right of the
median.
The following data give the speeds (in miles per hour) of eight
cars that were stopped for speeding violations.
77 82 74 81 79 84 74 78
A major shortcoming of the mode is that a data set may have none or
may have more than one mode, whereas it will have only one mean
and only one median.
1. Unimodal: A data set with only one mode.
2. Bimodal: A data set with two modes.
3. Multimodal: A data set with more than two modes.
Example:
Because each value in this data set occurs only once, this data set
contains no mode.
Example
A small company has 12 employees. Their commuting times
from home to work are
23, 36, 12, 23, 47, 32, 8, 12, 26, 31, 18, and 28, respectively.
This data set has three modes: 19, 21 and 22. Each of these
three values occurs with a (highest) frequency of 2.
Mode
In one survey, it was asked to 10 respondents “from which city are they
from?”
The responses were,
Dhaka, Dhaka, Sylhet, Chittagong, Chittagong, Chittagong, Khulna,
Barisal, Rajshahi, Rajshahi
Find the mode?
Mode: Chittagong
We cannot calculate the mean and median for this data set.
Relationships Among the Mean, Median, and
Mode (For grouped data divided in classes)
Example:
1, f=5
2, f=5
3, f=5
4, f=5
5, f=5
Figure: Mean, median, and mode for a histogram and
frequency distribution curve skewed to the right.
Example:
1, f=10
2, f=7
3, f=5
4, f=3
5, f=1
Figure: Mean, median, and mode for a histogram
and frequency distribution curve skewed to the left.
Example:
1, f=1
2, f=3
3, f=5
4, f=7
5, f=10
MEASURES OF DISPERSION
FOR UNGROUPED DATA
Range
Variance and Standard Deviation
Population Parameters and Sample Statistics
Range
Finding the Range for Ungrouped Data
Following table gives the total areas in square miles of the four
western South-Central states of the United States.
Thus, the total areas of these four states are spread over a range of
217,626 square miles.
Range - Disadvantages
2. Its calculation is based on two values only: the largest and the
smallest. All other values in a data set are ignored when
calculating the range.
Thus, the range is not a very satisfactory measure of dispersion.
Variance and Standard Deviation
The variance calculated for population data is denoted by σ²
(read as sigma squared), and the variance calculated for
sample data is denoted by s².
The standard deviation calculated for population data is
denoted by σ, and the standard deviation calculated for
sample data is denoted by s.
Variance and Standard Deviation
2
x 2
and s 2
x x
2
N n 1
x x x
2 2
and s
N n 1
x 2
N
x 2
n
2 and s 2
N n 1
x
2
x
2
x 2
N
x 2
n
and s
N n 1
Where σ² is the population variance, s² is the sample variance, σ is the population standard
deviation, and s is the sample standard deviation.
Example
Until about 2009, airline passengers were not charged for checked
baggage. Around 2009, however, many U.S. airlines started charging a
fee for bags. According to the Bureau of Transportation Statistics, U.S.
airlines collected more than $3 billion in baggage fee revenue in 2010.
The following table lists the baggage fee revenues of six U.S. airlines
for the year 2010.
Find the variance and standard deviation for these data.
Example: Solution
x
2
2,854
2
x 2
n
1,746,098
6
s2
n 1 6 1
1,746,098 1,357,552.667
5
77,709.06666
Example 3-12: Solution
x
2
x 2
n
s 77,709.06666
n 1
278.7634601 $278.76million
Thus, the standard deviation of the 2010 baggage fee revenues of
these six airlines is $278.76 million.
One Observation!
The values of the variance and the standard deviation are
never negative.
Example
x
2
2
(449.30)
x2
N
35,978.51
6
2 388.90
N 6
388.90 $19.721 thousand $19,721
mf
535
21.40 minutes
N 25
Thus, the employees of this company spend an average of
21.40 minutes a day commuting from home to work.
Example
The table below gives the frequency distribution of the number of
orders received each day during the past 50 days at the office
of a mail-order company.
x
mf
832
16.64 orders
n 50
Thus, this mail-order company received an average of
16.64 orders per day during these 50 days.
Variance and Standard Deviation for
Grouped Data
Basic Formulas for the Variance and Standard Deviation for
Grouped Data
f m f m x
2 2
2
and s 2
N n 1
( mf ) 2
mf
2
m f 2
N
m 2
f
n
2 and s 2
N n 1
where σ² is the population variance, s² is the sample variance,
and m is the midpoint of a class.
( mf ) 2
mf
2
m f 2
N
m 2
f
n
2 and s 2
N n 1
where σ² is the population variance, s² is the sample variance,
and m is the midpoint of a class.
( mf ) 2 (535) 2
m 2
f
N
14,825
25 3376 135.04
2
N 25 25
m2 f
( mf ) 2
14,216
(832 ) 2
s2 n 50 7.5820
n 1 50 1
Chebyshev’s Theorem
Empirical Rule
Chebyshev’s Theorem
(1 – 1/k²)
1 1 1
1 2
1 2
1 1 .25 .75 or 75%
k ( 2) 4
The 1st quartile is the value of the middle term among the
observations that are less than the median, and
The 3rd quartile is the value of the middle term among the
observations that are greater than the median.
Figure: Quartiles
Quartiles and Interquartile Range
Interquartile Range:
The difference between the third and the first quartiles gives the
interquartile range; that is,
(a) Find the values of the three quartiles. Where does the total
compensation of Michael D. White (CEO of DirecTV) fall in relation
to these quartiles?
(a)
(a) Find the values of the three quartiles. Where does the age of 28
years fall in relation to the ages of the employees?
(a)
Calculating Percentiles
The (approximate) value of the k th percentile, denoted by Pk is
kn
Pk Value of the th term in a ranked data set
100
where k denotes the number of the percentile and n represents
the sample size.
Example
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
kn (60)(12)
7.20th term 7th term
100 100
Example 3-22: Solution
Percentile rank of xi
Number of values less than xi
100
Total number of values in the data set
Example
21.6 21.7 22.9 25.2 26.5 28.0 28.2 32.6 32.9 70.1 76.1 84.5
In this data set, 4 of the 12 values are less than $26.5 million.
Hence,
Example: Solution
The following data are the incomes (in thousands of dollars) for
a sample of 12 households.
Step 1. First, rank the data in increasing order and calculate the
values of the median, the first quartile, the third quartile, and the
interquartile range.
Find the points that are 1.5 x IQR below Q1 and 1.5 x IQR above
Q3.