Data Description

CHAPTER 3
NUMERICAL DESCRIPTIVE MEASURES

3.1 MEASURES OF CENTRAL TENDENCY
Usually called the average.

A single value that summarizes a set of data. It locates the centre of the values.
Three types of averages are often used as measures of central tendency:
(i) Mean
(ii) Median
(iii) Mode
3.1.1 Mean ($\bar{x}$)
- the average representation of the data
- calculated by dividing the sum of all data values in a distribution by the number of data values
- a population mean is an example of a parameter
- a population mean is denoted by $\mu$
- a sample mean is an example of a statistic
- a sample mean is denoted by $\bar{x}$
Ungrouped data
Population mean: $\mu = \dfrac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \dfrac{\sum X}{N}$
Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum x}{n}$
Grouped data
Each class interval is represented by the mid-point of the interval, $x_i$.
$\bar{x} = \dfrac{\sum f_i x_i}{n}$, where $n = \sum f_i$ = total frequency

Example 1: Dixon Corporation manufactures computer monitors. The following data are the numbers of computer monitors produced at the company for a sample of 10 days.
24 32 27 23 35 33 29 40 23 28
Calculate the mean. Comment on the value obtained.
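As a rough check on Example 1, the sample mean formula can be sketched in Python. The variable names are illustrative only and are not part of the original notes.

```python
# Daily monitor output for the 10 sampled days (Example 1).
monitors = [24, 32, 27, 23, 35, 33, 29, 40, 23, 28]

# Sample mean: sum of the observations divided by the number of observations.
sample_mean = sum(monitors) / len(monitors)

print(sample_mean)  # 29.4
```

On these figures the company produced about 29 monitors per day on average over the sampled days.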
Example 2: The following data show the prices of vehicles sold last month at KAR Company.

Selling price (RM thousand)   No. of vehicles, $f_i$   Midpoint, $x_i$   $f_i x_i$
12 and less than 15            8
15 and less than 18           23
18 and less than 21           17
21 and less than 24           18
24 and less than 27            8
27 and less than 30            4
30 and less than 33            2

Calculate the mean price of the vehicles sold last month and explain its meaning.
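The grouped-data mean for the KAR Company table can be sketched as follows. This is a minimal illustration, assuming each class is represented by its midpoint as described above. The list names are hypothetical.

```python
# Class limits (RM thousand) and frequencies from the KAR Company table.
classes = [(12, 15), (15, 18), (18, 21), (21, 24), (24, 27), (27, 30), (30, 33)]
freqs = [8, 23, 17, 18, 8, 4, 2]

# Each class is represented by its midpoint x_i.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Grouped mean: sum of f_i * x_i divided by the total frequency n.
n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 80 1605.0 20.0625
```

Under the midpoint assumption, the mean selling price is about RM 20,063 per vehicle.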
Example 3: Calculate the mean number of vehicles owned by residents and explain the meaning of the mean value obtained.

Vehicles owned, $x$   No. of households, $f$   $fx$
0                      5
1                     25
2                     10
3                      4
4                      2
5                      1
3.1.2 Median ($\tilde{x}$)
- the middle value of an ordered array of data
- 50% of the data are greater than or equal to this value and the other 50% are smaller than or equal to it
- Advantages:
  Unaffected by the magnitude of extreme values
  Often the best measure of location when the data are skewed or contain extreme values
Ungrouped data
1) Arrange the data in ascending order.
2) If the number of observations is odd, the median is the number in the middle of the ordered list. If the number of observations is even, the median is the average of the 2 middle values.
3) After arranging the data, find the position of the median.
Position of median = $\dfrac{n + 1}{2}$
Grouped data
1) To find the median, create a column for the cumulative frequency (CF).
2) Determine the position of the median.
Position of median = $\dfrac{n}{2}$
3) The median is calculated as follows:
Median, $\tilde{x} = L_m + \left[\dfrac{\frac{n}{2} - \sum f_{m-1}}{f_m}\right] C$
$L_m$ : lower boundary of the median class
$f_m$ : frequency of the median class
$C$ : class size
$\sum f_{m-1}$ : cumulative frequency of the classes before the median class
Example 1: Find the median for each of the following sets of data.

(a) 34, 23, 56, 67, 34, 23, 12, 78, 67, 89, 56, 78
(b) 1.23, 4.56, 3.45, 2.21, 3.33, 2.34, 4.56
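The ungrouped-data steps (sort, locate position $(n+1)/2$, average the two middle values when $n$ is even) can be sketched like this. The function name `median_ungrouped` is an illustrative choice, not something defined in the notes.

```python
def median_ungrouped(values):
    """Median of raw data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)            # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a whole number (1-based).
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even n: position falls between the two middle values, so average them.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

set_a = [34, 23, 56, 67, 34, 23, 12, 78, 67, 89, 56, 78]
set_b = [1.23, 4.56, 3.45, 2.21, 3.33, 2.34, 4.56]

print(median_ungrouped(set_a))  # 56.0
print(median_ungrouped(set_b))  # 3.33
```

Set (a) has 12 values, so the median is the average of the 6th and 7th ordered values. Set (b) has 7 values, so the median is the 4th ordered value.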
Example 2: The following data show the prices of vehicles sold last month at KAR Company.

Selling price (RM thousand)   No. of vehicles, $f$   Class boundary   Cumulative frequency
12 and less than 15            8
15 and less than 18           23
18 and less than 21           17
21 and less than 24           18
24 and less than 27            8
27 and less than 30            4
30 and less than 33            2

(i) Calculate the median and explain the meaning of the value obtained.
(ii) Draw an ogive and find the median from the ogive.
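For part (i), the grouped-data median formula can be sketched as below. This is only an illustration: it assumes equal class widths and that the lower class limits are also the lower class boundaries, which holds for "12 and less than 15" style classes. The function name is hypothetical.

```python
def grouped_median(lower_boundaries, class_size, freqs):
    """Median for grouped data: L_m + ((n/2 - CF_before) / f_m) * C."""
    n = sum(freqs)
    half = n / 2                        # position of the median
    cf_before = 0                       # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cf_before + f >= half:
            # First class whose cumulative frequency reaches n/2 is the median class.
            return lower + (half - cf_before) / f * class_size
        cf_before += f
    raise ValueError("frequencies must contain at least one positive value")

lower_boundaries = [12, 15, 18, 21, 24, 27, 30]
freqs = [8, 23, 17, 18, 8, 4, 2]

median_price = grouped_median(lower_boundaries, 3, freqs)
print(round(median_price, 2))  # 19.59
```

Here $n/2 = 40$, the cumulative frequencies are 8, 31, 48, ..., so the median class is "18 and less than 21" and the median is $18 + \frac{40 - 31}{17} \times 3 \approx 19.59$ (RM thousand). About half of the vehicles sold for less than this price.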
3.1.3 Mode ($\hat{x}$)
- Disadvantage: not unique (a set of data may have more than one mode)
Ungrouped data
- The value that is repeated most often in a data set

Grouped data
- To find the mode, select the class with the highest frequency
- The mode can also be found by using a histogram
Mode, $\hat{x} = L + \left[\dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)}\right] C$
$L$ : lower boundary of the class containing the mode
$f_0$ : frequency of the class containing the mode
$f_1$ : frequency of the class before the class containing the mode
$f_2$ : frequency of the class after the class containing the mode
$C$ : size of the class
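A minimal sketch of the grouped-data mode formula follows, applied to the length frequencies that appear in Example 2 of this section. It assumes the classes 2 - 4, 5 - 7, ... are converted to class boundaries 1.5, 4.5, 7.5, ... with a class size of 3, and that a missing neighbouring class counts as frequency 0. The function name is illustrative.

```python
def grouped_mode(lower_boundaries, class_size, freqs):
    """Mode for grouped data: L + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * C."""
    # Modal class: the class with the highest frequency (first one if tied).
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    f0 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                 # class before the modal class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # class after the modal class
    return lower_boundaries[i] + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * class_size

# Lower class boundaries for lengths 2 - 4, 5 - 7, 8 - 10, 11 - 13, 14 - 16, 17 - 19 (cm).
lower_boundaries = [1.5, 4.5, 7.5, 10.5, 13.5, 16.5]
freqs = [2, 7, 9, 14, 6, 2]

modal_length = grouped_mode(lower_boundaries, 3, freqs)
print(round(modal_length, 2))  # 11.65
```

The modal class is 11 - 13 cm, so $L = 10.5$, $f_0 = 14$, $f_1 = 9$, $f_2 = 6$, giving $10.5 + \frac{5}{5 + 8} \times 3 \approx 11.65$ cm. The sketch does not handle the degenerate case where $f_0 = f_1 = f_2$, in which the denominator is zero.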
Example 1: Find the mode.

(i) 3, 4, 6, 7, 4, 5, 4, 9 =
(ii) 1, 1, 2, 2, 3, 3, 4, 4 =
(iii) 2, 5, 7, 8, 7, 8, 9, 11 =
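For ungrouped data, the three sets above can be checked with Python's standard `statistics.multimode` (available from Python 3.8), which returns every value that shares the highest count. This is only a quick check, not a method prescribed by the notes.

```python
from statistics import multimode

set_i = [3, 4, 6, 7, 4, 5, 4, 9]
set_ii = [1, 1, 2, 2, 3, 3, 4, 4]
set_iii = [2, 5, 7, 8, 7, 8, 9, 11]

# multimode lists all most-frequent values in order of first appearance.
print(multimode(set_i))    # [4]
print(multimode(set_ii))   # [1, 2, 3, 4]
print(multimode(set_iii))  # [7, 8]
```

Set (ii) illustrates the stated disadvantage: every value occurs equally often, so there is no single mode (many textbooks would say the set has no mode). Set (iii) is bimodal.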
Example 2: Find the modal length.

Length (cm)   Frequency   Class boundary
2 - 4          2
5 - 7          7
8 - 10         9
11 - 13       14
14 - 16        6
17 - 19        2

Comparison of mean, median and mode:

Mean                                    Median                                  Mode
Involves all data values                Does not involve all data values        Does not involve all data values
Influenced by extreme values            Not influenced by extreme values        Might be influenced by extreme values
Only one mean for a data set            Only one median for a data set          Can be more than one mode for a data set
Applicable to quantitative data only    Applicable to quantitative data only    Applicable to quantitative and qualitative data
3.1.4 Relationship among mean, median and mode
(i) If mean = median = mode, the distribution is symmetrical (zero skewness).
(ii) If mean > median > mode, the distribution is positively skewed (skewed to the right).
(iii) If mean < median < mode, the distribution is negatively skewed (skewed to the left).
EXERCISE
The data below show the amounts of the electricity bills charged to houses at Taman Rimbun.

Amount (RM)   Frequency
50 - 69        4
70 - 89       16
90 - 109      20
110 - 139     25
140 - 149     10
150 - 179      5

Calculate the mean, median and mode for the amount of the bill. Explain the meaning of each of the values obtained.
3.2 MEASURES OF DISPERSION
A measure of dispersion measures the spread or variability of the data distribution.

Two distributions (A and B) may have the same central location but different dispersions or spreads.
If the data are widely dispersed, the central location is less representative of the data as a whole.
The central location of data with little dispersion is considered more reliable.
The central location of a widely spread distribution should therefore be used with caution in decision making.
TYPES OF MEASURES OF DISPERSION
- Variance
- Standard deviation
- Coefficient of variation
The more spread out or dispersed the data, the larger the measures of dispersion.
If the data are more concentrated or homogeneous, the measures of dispersion are smaller.
If all the observations are the same, the measures of dispersion are zero.
Thus, the variance and standard deviation are never negative; their smallest possible value is zero.
3.2.1 Variance and standard deviation

Variance: measures the fluctuation of the data values above and below their mean. Its computation results in squared units.
Standard deviation: a primary measure of variation. Its value is in the original unit of the data.
Variance and standard deviation for ungrouped data

Population variance: $\sigma^2 = \dfrac{1}{N}\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]$
Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Warning! Note that $\sum x^2$ is not the same as $(\sum x)^2$. The value of $\sum x^2$ is obtained by squaring the x values and then adding them. The value of $(\sum x)^2$ is obtained by squaring the value of $\sum x$.
EXAMPLE
1) The following are the 2005 earnings (in thousands of dollars) before taxes for all six employees of a small company.
48.50 38.40 65.50 22.60 79.80 54.60

$X$       $X^2$
48.50
38.40
65.50
22.60
79.80
54.60

Since the data are the earnings of all employees of this company, use the population formulas to compute the variance and standard deviation.
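The population shortcut formula for this example can be sketched as follows. The variable names are illustrative, and the sketch keeps $\sum X^2$ and $(\sum X)^2$ as separate quantities to make the earlier warning concrete.

```python
import math

# 2005 earnings (thousands of dollars) of all six employees: a population.
earnings = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]

N = len(earnings)
sum_x = sum(earnings)                       # sum of X
sum_x2 = sum(x * x for x in earnings)       # sum of X^2: square first, then add

# Population variance: (sum X^2 - (sum X)^2 / N) / N
pop_variance = (sum_x2 - sum_x ** 2 / N) / N
pop_sd = math.sqrt(pop_variance)

print(round(sum_x, 2), round(sum_x2, 2))    # 309.4 17977.02
print(round(pop_variance, 2), round(pop_sd, 2))  # 337.05 18.36
```

The variance is in squared units (thousands of dollars squared), while the standard deviation of about 18.36 is back in thousands of dollars.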
2) The following table, based on Forbes magazine's list of the wealthiest people in the world, gives the total wealth (in billions of dollars) of five persons (USA TODAY, March 11, 2005).

Billionaire             Total wealth (billions of dollars)
Bill Gates              46.5
Helen Walton            18.0
Michael Dell            16.0
Keith Rupert Murdoch     7.8
George Soros             7.2

Find the variance and standard deviation for these data.

$x$      $x^2$
46.5
18.0
16.0
7.8
7.2
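A sketch for this example follows. It assumes the five persons are treated as a sample from the full list (the lower-case $x$ in the worksheet suggests the sample formulas), so the divisor is $n - 1$. The result is cross-checked against `statistics.variance`, which also uses $n - 1$.

```python
import math
import statistics

# Total wealth (billions of dollars) of the five listed persons.
wealth = [46.5, 18.0, 16.0, 7.8, 7.2]

n = len(wealth)
sum_x = sum(wealth)
sum_x2 = sum(x * x for x in wealth)

# Sample variance: (sum x^2 - (sum x)^2 / n) / (n - 1)
sample_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

# Library cross-check (statistics.variance divides by n - 1 as well).
library_variance = statistics.variance(wealth)

print(round(sample_variance, 2), round(sample_sd, 2))  # 257.72 16.05
```

If the five persons were instead regarded as the whole population of interest, the divisor would be $N = 5$ and the variance would be smaller.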
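Variance and standard deviation for grouped data

Population variance: $\sigma^2 = \dfrac{1}{N}\left[\sum fX^2 - \dfrac{(\sum fX)^2}{N}\right]$
Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum fx^2 - \dfrac{(\sum fx)^2}{n}\right]$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$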
EXAMPLE
1) The following data are the frequency distribution of the daily commuting times (in minutes) from home to work for all 25 employees of a company.

Daily commuting time (minutes)   Number of employees
0 to less than 10                 4
10 to less than 20                9
20 to less than 30                6
30 to less than 40                4
40 to less than 50                2

Calculate the variance and standard deviation.

Daily commuting time (minutes)   Number of employees, $f$   $X$ (midpoint)   $fX$   $fX^2$
0 to less than 10                 4
10 to less than 20                9
20 to less than 30                6
30 to less than 40                4
40 to less than 50                2
                                                                            $\sum fX$ =   $\sum fX^2$ =
2) The following data are the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company.

Number of orders   $f$
10 - 12             4
13 - 15            12
16 - 18            20
19 - 21            14

Calculate the variance and standard deviation.

Number of orders   $f$    $x$ (midpoint)   $fx$   $fx^2$
10 - 12             4     11
13 - 15            12
16 - 18            20
19 - 21            14
                                           $\sum fx$ =   $\sum fx^2$ =
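Both grouped-data examples can be checked with one small function. This is a sketch under two assumptions: each class is represented by its midpoint, and the commuting data are a population (all 25 employees) while the 50 days of orders are treated as a sample. The function name and its `sample` flag are illustrative.

```python
import math

def grouped_variance(midpoints, freqs, sample):
    """Shortcut formula: (sum f*x^2 - (sum f*x)^2 / n) / divisor."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    divisor = n - 1 if sample else n    # n - 1 for a sample, N for a population
    return (sum_fx2 - sum_fx ** 2 / n) / divisor

# Example 1: commuting times, midpoints 5, 15, ..., 45 (population of 25 employees).
commute_var = grouped_variance([5, 15, 25, 35, 45], [4, 9, 6, 4, 2], sample=False)

# Example 2: daily orders, midpoints 11, 14, 17, 20 (sample of 50 days).
orders_var = grouped_variance([11, 14, 17, 20], [4, 12, 20, 14], sample=True)

print(round(commute_var, 2), round(math.sqrt(commute_var), 2))  # 135.04 11.62
print(round(orders_var, 2), round(math.sqrt(orders_var), 2))    # 7.58 2.75
```

For Example 1, $\sum fX = 535$ and $\sum fX^2 = 14825$. For Example 2, $\sum fx = 832$ and $\sum fx^2 = 14216$.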
3.2.2 Coefficient of variation (relative dispersion)
Large standard deviations mean large variability within the dataset.

The coefficient of variation (CV) is useful for comparing two distributions in terms of their standard deviations expressed as percentages of their means.
It can be used to compare the variation in two sets of data with different units of measurement.
CV = $\dfrac{s}{\bar{x}} \times 100\%$
The larger the CV percentage, the greater the variation.

Larger variation: less consistent and more dispersed
Smaller variation: more consistent and less dispersed
Example: If the CV of distribution A is greater than the CV of distribution B, then distribution A is more dispersed (more spread out) than distribution B. This means that distribution B is more consistent, less dispersed and more stable than distribution A.
EXAMPLE
A: $\bar{x} = 60$, $s = 5$
B: $\bar{x} = 70$, $s = 6$
Which one is more consistent?

A: CV = $\dfrac{5}{60} \times 100 = 8.33\%$
B: CV = $\dfrac{6}{70} \times 100 = 8.57\%$
A is more consistent than B.

B is more dispersed than A.
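The comparison above can be reproduced with a short sketch. The function name is an illustrative choice.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(5, 60)
cv_b = coefficient_of_variation(6, 70)

# The distribution with the smaller CV is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"

print(round(cv_a, 2), round(cv_b, 2), more_consistent)  # 8.33 8.57 A
```

Note that B has the larger mean as well as the larger standard deviation, which is why the comparison is made on the relative scale rather than on the standard deviations alone.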
3.2.3 Skewness
Measure of skewness
It is based on the difference between the mean and the mode of the distribution.
If (mean - mode) is positive (+ve): the distribution is skewed to the right, or positively skewed.
If (mean - mode) is negative (-ve): the distribution is skewed to the left, or negatively skewed.
If (mean - mode) = 0: the distribution is symmetrical.
Pearson Coefficient of Skewness
To measure the degree of skewness (to determine the shape of the distribution)
PCS = $\dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
OR
PCS = $\dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

If PCS = 0: the distribution is symmetrical
If PCS is positive: the distribution is skewed to the right, or positively skewed
If PCS is negative: the distribution is skewed to the left, or negatively skewed
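The two forms of the Pearson Coefficient of Skewness can be sketched as below. The input values (mean 60, median 58, mode 54, standard deviation 5) are hypothetical and chosen only for illustration; they do not come from any example in these notes. The function names are also illustrative.

```python
def pcs_from_mode(mean, mode, sd):
    """PCS = (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def pcs_from_median(mean, median, sd):
    """PCS = 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

def describe_shape(pcs):
    """Interpret the sign of the coefficient."""
    if pcs > 0:
        return "positively skewed (skewed to the right)"
    if pcs < 0:
        return "negatively skewed (skewed to the left)"
    return "symmetrical"

# Hypothetical summary statistics, for illustration only.
pcs_a = pcs_from_mode(mean=60, mode=54, sd=5)
pcs_b = pcs_from_median(mean=60, median=58, sd=5)

print(round(pcs_a, 2), round(pcs_b, 2), describe_shape(pcs_a))
```

The two forms need not agree exactly on real data; the median form is a common choice when the mode is not unique or is poorly defined.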
EXERCISE
1. The following table shows the scores in an Efficiency Test for 100 new factory operators.

Test score             No. of operators   $fx$   $fx^2$
20 and less than 30     6
30 and less than 40    11
40 and less than 50    18
50 and less than 60    25
60 and less than 70    17
70 and less than 80    13
80 and less than 90     7
90 and less than 100    3

a) Calculate the mean and standard deviation for the scores obtained by the operators. Then, explain the meaning of the mean value obtained.
b) Calculate the median score and explain its meaning.
c) What is the score obtained by the majority of the operators? Comment on it.
d) Determine the shape of the distribution by calculating Pearson's Coefficient of Skewness.
e) Given that the mean and the variance of the scores of the senior operators were 55 and 144 respectively, which group of operators obtained more consistent scores?
2a) The data below show the number of accidents in Johor Bahru for the year 2012.
26 24 29 32 28 35 24 23 30 37 27 31
i) Calculate the mean, mode and median for the above data. Hence, determine the shape of the distribution.
2b) Carlos Company tested 20 types of engine to see how many hours they would last. The data are shown below:

Hours                 Frequency
5 to less than 10     6
10 to less than 15    4
15 to less than 20    3
20 to less than 25    2
25 to less than 30    5

i) Find the mean and standard deviation for the above distribution.
ii) Draw an ogive for the data and estimate the median value.
iii) Determine the skewness of the above data using the Pearson Coefficient of Skewness.
iv) It is found that Kenji Company also tested the same engines and obtained a mean and variance of 19 and 36 respectively. Which company has more consistent hours in testing the engines?
3a) The numbers of low birth weight infants for the past 12 years at NCB specialist are as follows:
35 37 30 41 32 24 46 52 27 49 51 32
i) Calculate the mean and mode.
ii) Calculate the standard deviation.
iii) Calculate the coefficient of skewness and comment on the shape of the distribution.
b) The following table shows the distribution of the time to repair an electronic gadget for 50 gadgets chosen at random from Samsung Repair Shop.

Repair time (minutes)   Number of gadgets
10 - 15                  5
15 - 20                  8
20 - 25                 10
25 - 30                 12
30 - 35                  9
35 - 40                  6

i) Find the mean and standard deviation for the repair time.
ii) It is reported that Sony Repair Shop has a mean repair time of 25.05 minutes with a standard deviation of 4.17 minutes. Determine which repair shop is more consistent in its repair time.
4) The following data are the numbers of cupcakes sold on weekdays in February 2013 by Balqis Shop.
48 48 58 50 35 47 75 46 39 35 56 66 33 43 65 37 60 52 67 68
i) Construct a stem-and-leaf plot.
