
Subtopics
❑ Representation of Data
▪ Histogram
▪ Cumulative Frequency Curve (Ogive – “less than”)
▪ Stem and leaf display
❑ Measures of Central tendency

▪ Mean
▪ Mode
▪ Median
Introduction to Statistics
Statistics may mean any of the following:
1) Numerical facts
Example: Total students enrolled in UTAR over
the years 2000-2003.
2) Measures based on sample data
Example: A sample mean is known as a statistic.
3) Field or discipline of study
- Concerned with scientific techniques used for
collecting, organizing, summarizing, presenting and
analyzing data; drawing valid conclusions and
making decisions based on such analysis.
Types of Statistics
Descriptive statistics
Consists of methods for organizing, summarizing, presenting and analyzing data by using tables, graphs, and summary measures.
Inferential statistics
Consists of methods that use sample results to help make decisions or predictions about a population. Also called inductive statistics.
Population Versus Samples
Population
• Consists of all elements (individuals, items or objects) whose characteristics are being studied.
• Census : Collecting information from every member of the target population.
• Statistical measures obtained from population data are called parameters.
Sample
• A portion of the population selected for study.
• Sample survey : Collecting information from a portion of the target population.
• Statistical measures obtained from sample data are called sample statistics.
Figure 1.1 Population and Sample
Textbook: Prem Mann, Introductory Statistics, 7/E
Copyright © 2010 John Wiley & Sons. All rights reserved.
Why is a Sample used?
Most of the time we cannot study the entire population, so we must use a sample as a guide because:
✔ It would take too much time to study the entire population.
✔ It would cost too much money to study the entire population.
✔ It might not be possible to identify all the members of the population.
Types of Variables
Discrete
data
Example 1.1
1. Explain whether each of the following constitutes a
population or a sample.
(a) Scores of all students in a statistics class

(b) Yield of potatoes per acre for 10 pieces of land
(c) Weekly salaries of all employees of a company
(d) Cattle owned by 100 farmers in Kedah.
(e) Numbers of computers sold during the past week at
all computer stores in Los Angeles
Example 1.2
Based on the following statements, determine whether the
data obtained is discrete or continuous data.
(a) Age of a person.
(b) Result obtained when a fair die is thrown.
(c) Time (in minutes) taken to run 100 meters.
(d) Average monthly expenditure (in RM) on household
goods.
(e) Number of robberies reported per day.
(f) Diameter (in nearest cm) of a tennis ball.
Example 1.2 (solution)
(a) Age of a person. Continuous data
(b) Result obtained when a fair die is thrown. Discrete data
(c) Time taken (in minutes) to run 100 meters. Continuous data
(d) Average monthly expenditure (in RM) on household goods. Continuous data
(e) Number of robberies reported per day. Discrete data
(f) Diameter (in nearest cm) of a tennis ball. Discrete data
Presentation Of Data
Raw data
Data recorded in the sequence in which they are collected and before they are processed or ranked.
(a) Qualitative data: collected on a qualitative variable (nonnumeric categories)
(b) Quantitative data: collected on a quantitative variable (measured numerically)
Organizing and Graphing
- Data are organized into tables and displayed in graphs and charts.
Qualitative data
a) Frequency distribution
b) Relative frequency and percentage distributions
Quantitative data
a) Stem and leaf displays
b) Frequency distribution
c) Relative frequency and percentage distributions
d) Graphing frequency distribution:
- Histogram
- Shape of histogram
e) Cumulative frequency distribution
f) Graphing cumulative frequency distribution
- Ogive or cumulative frequency curve/polygon.
Quantitative
data
17
Stem plots (Stem and
leaf diagram)
18
Stem and Leaf Plots
• A simple graph for quantitative data
• Uses the actual numerical values of each data point.
– Divide each measurement into two parts: the stem and the leaf.
– List the stems in a column, with a vertical line to their right.
– For each measurement, record the leaf portion in the same row as its matching stem.
– Order the leaves from lowest to highest in each stem.
– Provide a key to your coding.
Example 1.3
The marks of 24 students on a statistics test:
52 65 53 42 85 57 76 69 44 57 60 67
65 74 72 53 81 68 51 62 87 56 90 70
Draw a stem plot for the marks of these students.
42 44
52 53 57 57 53 51 56
65 69 60 67 65 68 62
76 74 72 70
85 81 87
90
Stem and Leaf display:
Unordered leaves
Stem | Leaf
4 | 2 4
5 | 2 3 3 7 1 7 6
6 | 5 5 8 9 2 0 7
7 | 4 2 6 0
8 | 5 1 7
9 | 0
Ordered leaves
Stem | Leaf
4 | 2 4
5 | 1 2 3 3 6 7 7
6 | 0 2 5 5 7 8 9
7 | 0 2 4 6
8 | 1 5 7
9 | 0
Key: 4|2 = 42 marks
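The steps for building a stem plot can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the helper name `stem_and_leaf` and the parameter `leaf_unit` are assumptions made for this example, and the data are the 24 marks of Example 1.3.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=10):
    """Split each value into stem (value // leaf_unit) and leaf (value % leaf_unit).

    Hypothetical helper for illustration; leaves come out ordered
    because the values are sorted first.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // leaf_unit].append(v % leaf_unit)
    return dict(stems)

# Marks of the 24 students in Example 1.3
marks = [52, 65, 53, 42, 85, 57, 76, 69, 44, 57, 60, 67,
         65, 74, 72, 53, 81, 68, 51, 62, 87, 56, 90, 70]

display = stem_and_leaf(marks)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
print("Key: 4|2 = 42 marks")
```

With `leaf_unit=100` the same helper would give two-digit leaves (printed without a leading zero), similar to the electricity-bill example.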
Example 1.4
The monthly electricity bill (in RM) paid by a sample of 24
households selected from a city :
192 235 253 302 455 157 156 549 244 257 350 447
155 154 352 353 410 148 151 502 247 356 246 244
Draw a stem plot for the data.
192 157 156 155 154 148 151

235 253 244 257 247 246 244
302 350 352 353 356
455 447 410
549 502
Unordered leaves
Stem | Leaf
1 | 92 55 54 57 48 56 51
2 | 35 53 44 47 57 46 44
3 | 52 02 53 56 50
4 | 55 10 47
5 | 49 02
Ordered leaves
Stem | Leaf
1 | 48 51 54 55 56 57 92
2 | 35 44 44 46 47 53 57
3 | 02 50 52 53 56
4 | 10 47 55
5 | 02 49
Key: 4|55 = RM 455

Example 1.5
The heights of students (to the nearest cm) in a class are given below:
152 145 153 142 155 157 156 149 144 157 150 147
155 154 152 153 151 148 151 152 147 156 146 144
Draw a stem plot for the heights of these students using a class interval of 2 cm.
Smallest value = 142
Highest value = 157.
If the class interval of 2cm is used, the following class

intervals are obtained:
142-143, 144-145, 146-147, 148-149,

150-151, 152-153, 154-155, 156-157.
142-143 : 142
144-145 : 145, 144, 144
146-147 : 147, 147, 146
148-149 : 149, 148
150-151 : 150, 151, 151
152-153 : 152, 153, 152, 153, 152
154-155 : 155, 155, 154
156-157 : 157, 156, 157, 156
Unordered leaves
Stem | Leaf
14 | 2
14 | 5 4 4
14 | 7 7 6
14 | 8 9
15 | 0 1 1
15 | 2 3 2 3 2
15 | 5 5 4
15 | 7 6 7 6
Ordered leaves
Stem | Leaf
14 | 2
14 | 4 4 5
14 | 6 7 7
14 | 8 9
15 | 0 1 1
15 | 2 2 2 3 3
15 | 4 5 5
15 | 6 6 7 7
Key: 14|2 = 142 cm
Example 1.6
For the data in Example 1.5, draw a stem and leaf
diagram using the following class intervals:
142-144, 145-147, 148-150,

151-153, 154-156, 157-159.
30
The interval 148-150 cannot be represented by the ‘stem’
14 because the ‘tens’ digit changes in this interval.
Therefore, ‘stem’ 142, 145, 148, 151, 154 and 157 are
used. The ‘leaf’ is the value that is added to the ‘stem’.
142-144 : 142, 144, 144

145-147 : 145, 147, 147, 146
148-150 : 149, 150, 148
151-153 : 152, 153, 152, 153, 151, 151, 152
154-156 : 155, 156, 155, 154, 156
157-159 : 157, 157
32
Stem Leaf
142 0 2 2
145 0 1 2 2
148 0 1 2
151 0 0 1 1 1 2 2
154 0 1 1 2 2
157 0 0
Key: 142|2 means 142+2= 144cm

33
Ungrouped data
and grouped data
34
Ungrouped data
1. Raw Data
Example:
A survey on the number of male children in 20 families:
1 4 2 0 2 3 3 2 1 4
5 2 1 2 0 1 2 3 1 2
The above data is called raw data.
35
Ungrouped data
2. Array
• An arrangement of quantitative raw data in
ascending or descending order.
Example:
Number of male children in 20 families in ascending
order:
0 0 1 1 1 1 1 2 2 2
2 2 2 2 3 3 3 4 4 5
The above data is called an array.

36
Summary of Ungrouped data and
Grouped data
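Ungrouped data (frequency table):
Number of male children | Number of families
0 | 10
1 | 12
2 | 8
3 | 6
4 | 3
5 | 1
Grouped data (grouped frequency distribution):
Height (cm) | Number of children
100 – < 105 | 10
105 – < 110 | 12
110 – < 115 | 8
115 – < 120 | 6
120 – < 125 | 3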
Relative Frequency and Percentage
The relative frequencies and percentages for a quantitative data set are obtained as follows:
Relative frequency of a category = $\frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$
Percentage = relative frequency × 100%
Grouped frequency distribution: Class limits
and Class Boundaries
• Class limits: The smallest and largest possible
measurements in each class, that is, the lower and upper
limits of each class.
• Class boundaries: The dividing lines between successive
classes.
• Class Boundary (discrete data): Given by the midpoint
of the upper limit of one class and the lower limit of the
next class.
• Class Boundary (continuous data) : Corresponds to the
upper limit of one class or the lower limit of the next
class. 40
Frequency distribution: Class width
• The difference between the two boundaries of a class
gives the class width. The class width is also called the
class size or class interval.
Class Width = Upper boundary – Lower boundary
• The class width can also be determined by finding the
difference between the lower limit of the next class and
the lower limit of the class.
Class Width =
Lower limit of the next class – Lower limit of the class
41
Frequency distribution:
Class midpoint
• The midpoint of each class is called class midpoint or
class mark. It lies half-way between the class limits or the
class boundaries.
42
Example 1.7 (a) (Discrete Data)
43
Example 1.7 (b) (Continuous Data)
The following table gives the frequency

distribution of ages for all 50 employees of a
company.
Age (years) | No. of Employees
18 to less than 30 | 12
30 to less than 42 | 19
42 to less than 54 | 14
54 to less than 66 | 5
Example 1.7 (b) (Continuous Data)
Age | Class boundaries | Class width, c | Midpoint, m | Frequency, f | Relative frequency | Percentage (%)
18 – 30 | 18 – 30 | 12 | 24 | 12 | 12/50 = 0.24 | 24
30 – 42 | 30 – 42 | 12 | 36 | 19 | 19/50 = 0.38 | 38
42 – 54 | 42 – 54 | 12 | 48 | 14 | 14/50 = 0.28 | 28
54 – 66 | 54 – 66 | 12 | 60 | 5 | 5/50 = 0.10 | 10
Total | | | | $\sum f$ = 50 | Sum = 1.00 | Sum = 100%
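As a quick check of the relative frequency and percentage columns, the calculation can be sketched in Python. This is an illustrative sketch only; the dictionary keys are just labels chosen for this example, and the frequencies are those of the 50 employees in Example 1.7 (b).

```python
# Frequencies from Example 1.7 (b); keys are informal class labels
frequencies = {"18-30": 12, "30-42": 19, "42-54": 14, "54-66": 5}

total = sum(frequencies.values())                      # sum of all frequencies
relative = {c: f / total for c, f in frequencies.items()}
percentage = {c: 100 * r for c, r in relative.items()}

for c in frequencies:
    print(f"{c}: f = {frequencies[c]}, relative = {relative[c]:.2f}, "
          f"percentage = {percentage[c]:.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100%, which is a convenient check on a hand-built table.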
Summary of Class Boundaries
Continuous grouped data, x | Equivalent continuous class boundaries
0 – 20 | 0 – 20
20 – 40 | 20 – 40
Discrete grouped data, x | Equivalent discrete class boundaries
1 – 20 | 0.5 – 20.5
21 – 40 | 20.5 – 40.5
Example 1.8
The table below shows the class boundaries, class width and
class midpoints for the grouped frequency distribution of
weekly sales in units.
Sales (units) Class boundaries Class width, Class midpoint,
Class limits c m
110 – 129
130 – 149
150 – 169
170 – 189
190 – 209
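One possible way to fill in such a table for discrete class limits is sketched below in Python. The function name `describe_classes` is hypothetical, and the sketch assumes the rules stated earlier: each boundary is the midpoint between the upper limit of one class and the lower limit of the next, the width is the difference of the boundaries, and the midpoint lies half-way between the class limits.

```python
def describe_classes(limits):
    """Return boundaries, width and midpoint for discrete class limits.

    limits: list of (lower_limit, upper_limit) in ascending order.
    Hypothetical helper for illustration only.
    """
    # Gap between the upper limit of one class and the lower limit of the next
    gap = limits[1][0] - limits[0][1]
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - gap / 2
        upper_boundary = upper + gap / 2
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_boundary, upper_boundary),
            "width": upper_boundary - lower_boundary,
            "midpoint": (lower + upper) / 2,
        })
    return rows

# Class limits for weekly sales (units) in Example 1.8
sales_limits = [(110, 129), (130, 149), (150, 169), (170, 189), (190, 209)]
rows = describe_classes(sales_limits)
for row in rows:
    print(row)
```

For continuous classes such as those in Example 1.9 the gap is zero, so the boundaries coincide with the class limits.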
47
Example 1.9
The table below shows the class boundaries, class width and
class midpoints for the grouped frequency distribution of
age of employees in years.
Age (years) Class Class Class
Class limits boundaries width, c midpoint, m
25 – < 30
30 – < 35
35 – < 40
40 – <45
45 – < 50
48
Histogram
▪ A diagrammatic presentation of a frequency
distribution.
▪ Histogram for Frequency Distribution
- All bars are of same width.
- Height of every bar is proportional to the frequency
of the corresponding class.
▪ A histogram is a graph in which class boundaries

are marked on the horizontal (x) axis and the
frequencies, relative frequencies, or percentages
are marked on the vertical (y) axis.
49
Example 1.10
The table below shows the times (in minutes) taken by 60
students to complete a model in a competition.
Time 30– 35– 40– 45– 50– 55– 60–
( minutes) <35 <40 <45 <50 <55 <60 <65
Frequency 2 17 18 13 6 1 3
Illustrate the above data with a histogram.
50
Time (minutes), Class boundaries Frequency, f
30–35 2
35–40 17
40–45 18
45–50 13
50–55 6
55–60 1
60–65 3
Note: Since the distribution is of equal class size, a 2-column table showing class boundaries and frequency is required for drawing the histogram.
[Figure: Histogram of the times (in minutes) taken by 60 students to complete a model in a competition; horizontal axis marked at the class boundaries 30, 35, 40, 45, 50, 55, 60, 65]
“Less than” Cumulative
Frequency Curve
▪ A graphical presentation of a “less than” cumulative
frequency distribution is called a “less than” ogive or
“less than” cumulative frequency curve/polygon.
▪ A “less than” ogive is a graph showing the cumulative

frequency less than the upper class boundary of a class
plotted against the upper class boundary of the class.
53
“Less than” Cumulative
Frequency Curve
An ogive for the cumulative frequency distribution can be presented in two forms:
• As a smooth curve – by drawing a smooth curve passing through the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
• As a polyline between points – by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
Example 1.11
Construct ‘less than’ cumulative frequency distribution and
state upper class boundaries.
Marks Number of students (Frequency)
40 – 49 2
50 – 59 10
60 – 69 18
70 – 79 13
80 – 89 5
90 – 99 2
Total 50
55
Discrete variable :
‘Less than’
Upper class
Class Boundaries cumulative
boundary
frequency
<39.5 0
39.5 – 49.5 < 49.5 2 (0+2)
49.5 – 59.5 < 59.5 12 (2+10)
59.5 – 69.5 < 69.5 30 (12+18)
69.5 – 79.5 < 79.5 43 (30+13)
79.5 – 89.5 < 89.5 48 (43+5)
89.5 – 99.5 < 99.5 50 (48+2)
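The running totals in the table above can be reproduced with a short Python sketch. It is illustrative only; the variable names are chosen for this example, and the data are the marks distribution of Example 1.11 (discrete classes, so 0.5 is added to each upper class limit).

```python
from itertools import accumulate

# Example 1.11: upper class limits and frequencies
upper_limits = [49, 59, 69, 79, 89, 99]
frequencies = [2, 10, 18, 13, 5, 2]

# Discrete data: upper class boundary = upper limit + 0.5
upper_boundaries = [u + 0.5 for u in upper_limits]
cumulative = list(accumulate(frequencies))

# Points for the "less than" ogive, starting at the lowest boundary with 0
points = [(39.5, 0)] + list(zip(upper_boundaries, cumulative))
for boundary, cf in points:
    print(f"< {boundary}: {cf}")
```

Plotting `points` and joining them gives the polygon form of the ogive.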
56
Example 1.12
Construct ‘less than’ cumulative frequency distribution and
state upper class boundaries for the completion times taken
by all 120 workers to complete a standard task in a factory.
Completion time (minutes) | Number of workers
10 – less than 12 | 9
12 – less than 14 | 29
14 – less than 16 | 42
16 – less than 18 | 26
18 – less than 20 | 14
Continuous variable :
Class
Boundaries Upper class ‘Less than’
boundary cumulative frequency
< 10 0
10 - 12 < 12 9 (0+9)
12 - 14 < 14 38 (9+29)
14 - 16 <16 80 (38+42)
16 - 18 <18 106 (80+26)
18 - 20 < 20 120 (106+14)
58
Example 1.13
Construct a “less than” cumulative frequency distribution
with upper class boundaries and draw a cumulative
frequency curve based on the following information .
The weights of 20 students (in nearest kg).
Weight (kg)        | 60 – 62 | 63 – 65 | 66 – 68 | 69 – 71 | 72 – 74
Number of students | 3 | 4 | 5 | 6 | 2
Estimate from the ogive,
(i) the total number of students of weight less than 67 kg.
(ii) the value of x, if 20 % of the students were of weight x kg or more.
(iii) the number of students of weight 64 kg or more.
(iv) the minimum weight of the heavier 10% of the students in a group.
59
The table for “less than” cumulative frequency distribution
and the upper class boundaries.
Class Upper class Cumulative
Boundaries boundary Frequency
<59.5 0
59.5 – 62.5 <62.5 3
62.5 – 65.5 <65.5 7
65.5 – 68.5 <68.5 12
68.5 – 71.5 <71.5 18
71.5 – 74.5 <74.5 20
[Figure: “Less than” ogive for the weight of 20 students (nearest kg). Vertical axis: cumulative frequency (0 to 20); horizontal axis: weight (kg), marked at 59.5, 62.5, 65.5, 68.5, 71.5, 74.5, 77.5. Readings marked on the curve: (i) 9, (ii) 70, (iv) 71.5 kg]
From the ogive:
(i) Number of students with weight less than 67 kg = 9
(ii) 20% of the students weigh x kg or more, so 80% weigh less than x kg.
Position of x = $\frac{80}{100} \times 20$ = 16th
x = 70
(iii) 5 students weigh less than 64 kg; therefore 20 − 5 = 15 students weigh 64 kg or more.
(iv) The heaviest 10% of the students: $\frac{10}{100} \times 20$ = 2 students (the last 2 students).
Position on cumulative frequency = 20 − 2 = 18th
The 2 heaviest students weigh at least 71.5 kg.
Example 1.14
The table below shows the bonuses given out to 250 employees of a factory
in one particular year.
Bonus (RM) Number of employees
40 ≤ x < 50 3
50 ≤ x < 60 8
60 ≤ x < 80 27
80 ≤ x < 100 75
100 ≤ x < 120 79
120 ≤ x < 150 44
150 ≤ x < 200 14
i.Construct a “less than” cumulative percentage distribution with upper
class boundaries and draw a cumulative percentage ogive.
ii. Estimate the number of employees receiving at least RM115 as bonus.
Ans: (ii) 80 employees
Example 1.14 (solution)
(i) [Figure: cumulative percentage ogive of bonuses (RM) for 250 employees]
(ii) The number of employees receiving at least RM115 as bonus: 80 employees
Measure Of Central Tendency
69
Measure of Central Tendency
• A measure of central location for a data set and

can be used as a summary value for that data set.
• There are three measures of central location.

They are:
1) Mean 2) Median 3) Mode
70
Mean
Mean is the average of the values. It is also known as the arithmetic mean.
▪ Mean does not necessarily correspond to one of the values in the original data.
▪ Mean is influenced by extreme values/outliers (values that are very small or very large relative to the majority of the values in a data set).
▪ Mean is not suitable for a data set that contains extreme values.
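Mean of Ungrouped data
Mean for population data: $\mu = \frac{\sum x}{N}$
Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
Where,
$\sum x$ = sum of all values, N = population size, n = sample size
Example 1.15
Find the mean of the set of numbers
{12, 18, 13, 10, 6, 23, 16}.
Solution :
Mean = $\frac{12 + 18 + 13 + 10 + 6 + 23 + 16}{7} = \frac{98}{7} = 14$
Mean of Ungrouped data with frequency
$\bar{x} = \frac{\sum fx}{\sum f}$
Where,
x = value of the observation, f = frequency of that value
Mean of Grouped data
$\bar{x} = \frac{\sum fm}{\sum f}$
Where,
m = midpoint of the class, f = frequency of the class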
Example 1.16
x 4 5 7 10 11 15 17
f 3 12 23 10 14 8 2
Calculate the mean for the sample data in the frequency

table above.
Ans: 8.9028
76
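x | f | fx
4 | 3 | 12
5 | 12 | 60
7 | 23 | 161
10 | 10 | 100
11 | 14 | 154
15 | 8 | 120
17 | 2 | 34
$\sum f = 72$, $\sum fx = 641$
Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{641}{72} = 8.9028$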
Example 1.17
Calculate the sample mean of the following grouped
frequency distribution and interpret the value.
Sales (units) Frequency, f
1–4 5
5–8 13
9 – 12 31
13 – 16 19
17 – 20 8
21 – 24 4
Ans: 11.7 units 79
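Sales (units) | Mid-point, m | f | fm
1 – 4 | 2.5 | 5 | 12.5
5 – 8 | 6.5 | 13 | 84.5
9 – 12 | 10.5 | 31 | 325.5
13 – 16 | 14.5 | 19 | 275.5
17 – 20 | 18.5 | 8 | 148.0
21 – 24 | 22.5 | 4 | 90.0
$\sum f = 80$, $\sum fm = 936$
Mean, $\bar{x} = \frac{\sum fm}{\sum f} = \frac{936}{80} = 11.7$ units
Interpretation : The average number of sales is 11.7 units.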
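The grouped-mean calculation can be checked with a small Python sketch. The function name `grouped_mean` is an assumption made for this illustration; it takes each class midpoint as the average of the class limits and weights it by the class frequency, using the data of Example 1.17.

```python
def grouped_mean(classes):
    """Mean of a grouped frequency distribution.

    classes: list of (lower_limit, upper_limit, frequency).
    Hypothetical helper for illustration only.
    """
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f

# Example 1.17: sales (units) and frequencies
sales = [(1, 4, 5), (5, 8, 13), (9, 12, 31),
         (13, 16, 19), (17, 20, 8), (21, 24, 4)]
mean_sales = grouped_mean(sales)
print(f"Mean = {mean_sales:.1f} units")
```

The same function handles an ungrouped frequency table if each value x is passed as a class with equal limits, for example `(4, 4, 3)`.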
81
Median
Median is the middle value of a data set after the
data is arranged in ascending or descending order.
- Median is not influenced by extreme values.
- Being the middle value implies that 50% of
the observations will be less than
the median and 50% of them will be more
than the median.
82
Ungrouped data:
Median position = $\frac{n+1}{2}$th observation
where n = number of observations
Grouped data (Grouped frequency distribution):
Median position = $\frac{\sum f}{2}$th observation
Example 1.18 (Median for Ungrouped data)
Find the median for the set of data shown below and
interpret the value.
75, 67, 48, 66, 89, 51, 70
Ans: 67
84
Arrange the numbers in ascending order:
48, 51, 66, 67, 70, 75, 89
Median position = $\frac{7+1}{2}$ = 4th
Median = 67
Interpretation:
About 50% of the data are less than 67 and about 50% of the data are more than 67.
Example 1.19 (Median for Ungrouped data)
Find the median for the following data :
65, 75, 20, 63, 42, 51, 39, 25
Ans: 46.5
Arrange the numbers in ascending order:
20, 25, 39, 42, 51, 63, 65, 75
Median position = $\frac{8+1}{2}$ = 4.5th
Median = $\frac{42 + 51}{2}$ = 46.5
Median for Grouped data (Multi-value grouping)
The median is the $\frac{\sum f}{2}$th observation and it can be estimated by using the following steps.
1. Find the median class.
2. Determine the total frequency before the median class.
3. Use the method of proportion to calculate the median.
In general, by proportion:
Median, $M = L_M + \left( \frac{\frac{1}{2}\sum f - \sum f_{M-1}}{f_M} \right) \times c$
where
$L_M$ = lower boundary of median class,
$\sum f_{M-1}$ = cumulative frequency before median class,
$f_M$ = frequency of median class,
c = width of median class.
Example 1.20
Find the median for the data in the following grouped
frequency distribution and interpret the value.
Number of
defectives 1-2 3-4 5-6 7-8 9-10 11-12 13-14 15-16
(units)
Number of
weeks 5 8 22 18 36 29 20 12
(Frequency)
Ans: 9.72 units

90
Number of defectives (class boundaries) | Frequency, f | Cumulative frequency, F
0.5 – 2.5 | 5 | 5
2.5 – 4.5 | 8 | 13
4.5 – 6.5 | 22 | 35
6.5 – 8.5 | 18 | 53
8.5 – 10.5 | 36 | 89
10.5 – 12.5 | 29 | 118
12.5 – 14.5 | 20 | 138
14.5 – 16.5 | 12 | 150
n = 150
Median position = $\frac{150}{2}$ = 75th observation
Median class boundaries: 8.5 – 10.5
Lower boundary of median class, $L_M$ = 8.5
Cumulative frequency before median class, $\sum f_{M-1}$ = 53
Frequency of median class, $f_M$ = 36
Width of median class, c = 2
Median, $M = 8.5 + \left( \frac{75 - 53}{36} \right) \times 2 = 9.72$ units
Interpretation:
In about 50% of the weeks the number of defectives is less than 9.72 units, and in about 50% of the weeks it is more than 9.72 units.
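The method of proportion for the grouped median can be sketched in Python as follows. The function name `grouped_median` is hypothetical; the sketch assumes the median position is half the total frequency and interpolates linearly inside the median class, using the class boundaries and frequencies of Example 1.20.

```python
def grouped_median(boundaries, frequencies):
    """Estimate the median of a grouped frequency distribution.

    boundaries: list of (lower_boundary, upper_boundary) per class.
    frequencies: matching class frequencies.
    Hypothetical helper for illustration only.
    """
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    position = total / 2              # median position
    cumulative_before = 0             # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and cumulative_before + f >= position:
            # Median class found: interpolate by proportion
            return lower + (position - cumulative_before) / f * (upper - lower)
        cumulative_before += f
    raise ValueError("median class not found")

# Example 1.20: number of defectives per week
boundaries = [(0.5, 2.5), (2.5, 4.5), (4.5, 6.5), (6.5, 8.5),
              (8.5, 10.5), (10.5, 12.5), (12.5, 14.5), (14.5, 16.5)]
frequencies = [5, 8, 22, 18, 36, 29, 20, 12]
median_defectives = grouped_median(boundaries, frequencies)
print(f"Median = {median_defectives:.2f} units")
```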
Example 1.21
Find the median for the data in the following grouped
frequency distribution.
Ans: 21.51 minutes
Total frequency, $\sum f$ = 284
Median position = $\frac{284}{2}$th = 142nd observation
Median class boundaries : 20 – 25
Lower boundary, $L_M$ = 20
Cumulative frequency before the median class, $\sum f_{M-1}$ = 123
Frequency of the median class, $f_M$ = 63
Width of the median class, c = 25 − 20 = 5
Median, $M = 20 + \left( \frac{142 - 123}{63} \right) \times 5 = 21.51$ minutes
Example 1.22
Determining Median Using Ogive
100 earthworms were collected from a garden. The length
(to the nearest mm) of the earthworms is recorded as shown
in the table below.
Length (mm)          | 95 – 109 | 110 – 124 | 125 – 139 | 140 – 154 | 155 – 169 | 170 – 184 | 185 – 199 | 200 – 214
Number of earthworms | 2 | 8 | 17 | 26 | 24 | 16 | 6 | 1
Draw a ‘less than’ cumulative frequency curve for the

information. Estimate the median length of the worms.
Ans: 153 mm
Length of earthworms (mm) Cumulative
(upper class boundary) frequency, F
< 94.5 0
< 109.5 2
< 124.5 10
< 139.5 27
< 154.5 53
< 169.5 77
< 184.5 93
< 199.5 99
< 214.5 100
Median position = $\frac{100}{2}$ = 50th
[Figure: “Less than” cumulative curve: Length of 100 earthworms]
Median = 153 mm
Mode of Ungrouped data
Mode is the value of the observation that occurs most frequently.
• Mode may not exist and if it does, it may not be unique.
• Example : Mode of ungrouped data
• Raw data: 2, 4, 9, 8, 8, 5, 3
– The mode is 8, which occurs twice (unimodal)
• Raw data: 2, 2, 9, 8, 8, 5, 3
– Two modes: 8 and 2 (bimodal)
• Raw data: 2, 4, 9, 8, 5, 3
– No mode (each value occurs only once).
Mode of Grouped data
An estimate of the mode can be obtained from
the modal class (The class which has the largest
standard frequency).
There are 2 methods:

1) Using Calculation
2) Using histogram
Mode = $L_m + \left( \frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)} \right) \times c$
where
$f_m$ = frequency of the modal class,
$f_b$ = frequency before the modal class,
$f_a$ = frequency after the modal class,
$L_m$ = lower boundary of the modal class,
c = width of the modal class.
Example 1.23
The marks obtained by 134 students in an examination
are recorded in the following table.
Marks 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 22 18 22 24 14 14 20
(a) Draw a histogram to represent the above
information.
(b) Estimate the mode
(i) from the histogram;
(ii) using the formula.
(c) Interpret the mode.
106
(a)
Marks
Frequency , f
(Class boundaries)
19.5 – 29.5 22
29.5 – 39.5 18
39.5 – 49.5 22
49.5 – 59.5 24
59.5 – 69.5 14
69.5 – 79.5 14
79.5 – 89.5 20
107
[Figure: Histogram of marks obtained by 134 students in an examination; horizontal axis marked at the class boundaries 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
(b)(i)Estimated mode = 52.5 marks
(b)(ii) Calculation Method
Modal class boundaries = 49.5 – 59.5
Lower boundary of modal class, Lm = 49.5
Frequency of modal class, fm = 24
Frequency before modal class, fb = 22
Frequency after modal class, fa = 14
Width of modal class, c = 59.5 – 49.5 =10
109
c) Interpretation
Most of the 134 students scored about 52 marks in the

examination.
110
Example 1.24
The frequency table below shows the mass of mangoes (in gm) collected from the farm of Mr Nazri Adam during the mango season. Class a – b denotes the interval a ≤ mass < b.
Mass (gm)         | 100 – 125 | 125 – 150 | 150 – 175 | 175 – 200 | 200 – 225
Number of mangoes | 28 | 75 | 42 | 26 | 10
Calculate the mode of the mass of the mangoes.
Ans: 139.688 gm
Modal class boundaries = 125 – 150

Lower boundary of modal class, Lm = 125
Frequency density of modal class, = 75
Frequency density before modal class, = 28
Frequency density after modal class, = 42
Width of modal class, c = 150 – 125 = 25
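The values listed above can be combined with the usual interpolation formula for the mode of grouped data with equal class widths, Mode = Lm + (fm − fb) / ((fm − fb) + (fm − fa)) × c. The Python sketch below is illustrative only: the function name `grouped_mode` is an assumption, and the formula is taken to be the one intended by the slides because it reproduces the stated answer of 139.688 gm.

```python
def grouped_mode(lower_boundary, width, f_modal, f_before, f_after):
    """Estimate the mode from the modal class (equal class widths assumed).

    Hypothetical helper for illustration only.
    """
    d_before = f_modal - f_before   # excess over the class before
    d_after = f_modal - f_after     # excess over the class after
    return lower_boundary + d_before / (d_before + d_after) * width

# Example 1.24: modal class 125 - 150
mode_mass = grouped_mode(lower_boundary=125, width=25,
                         f_modal=75, f_before=28, f_after=42)
print(f"Mode = {mode_mass:.3f} gm")
```

If the class widths were unequal, frequency densities would have to be used in place of the raw frequencies.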
112
Subtopics
❑ Measures of Dispersion
▪ Range
▪ Variance
▪ Standard Deviation
❑ Measures of Position
▪ Quartiles
❑ Box Plots
2
Measures Of Dispersion
• Types of measurement which provide
information on the spread or variability of a
set of data.
• There are
1) Range
2) Variance
3) Standard Deviation
3
EXAMPLE : The following are two data sets on the ages of all workers in each of two small companies.
Company 1 : 47 38 35 40 36 45 39
Company 2 : 70 33 18 52 27
The mean age of the workers in both companies is the same, 40 years. If we are not given the ages of the individual workers and are only told that the mean ages are the same, we may deduce that the workers in the two companies have a similar age distribution. But, as we can observe, the variation in the workers’ ages is very different for the two companies. As illustrated in the diagram, the ages of the workers in the second company have a much larger variation than the ages of the workers in the first company.
[Figure: dot diagrams of the workers’ ages]
Company 1 : 35 36 38 39 40 45 47
Company 2 : 18 27 33 52 70
Conclusion:
Two data sets can have the same measures of central tendency, and yet they can still be very different in the variability of their values. A measure of dispersion is used to describe such a difference quantitatively.
Range for Ungrouped data
Range = Largest observation − Smallest observation
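A short Python sketch makes the point of the two-company example concrete: both data sets have the same mean, but their ranges differ widely. The variable names are chosen for this illustration only.

```python
from statistics import mean

# Ages of all workers in the two companies
company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

summary = {}
for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    # Range = largest observation - smallest observation
    summary[name] = {"mean": mean(ages), "range": max(ages) - min(ages)}
    print(name, summary[name])
```

Both means are 40 years, while the ranges are 12 years for Company 1 and 52 years for Company 2.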
6
Range
Disadvantages :
1) Range is not a good measure of dispersion for a data set that contains outliers.
2) Its calculation is based on two values only: the largest and the smallest. All other values in a data set are ignored. Thus, range is not a very satisfactory measure of dispersion.
Standard Deviation
The value of standard deviation tells how closely
the values of a data set are clustered around the
mean.
Population standard deviation = $\sigma$
Sample standard deviation = s
• Standard deviation for ungrouped data

• Standard deviation for grouped data
8
Variance
Variance is the square of standard
deviation.
Population variance = $\sigma^2$
Sample variance = $s^2$
• Variance for ungrouped data

• Variance for grouped data
9
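Population Variance and Standard Deviation For Ungrouped Data
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2}{N} - \left( \frac{\sum x}{N} \right)^2$
$\sigma = \sqrt{\sigma^2}$
Where,
$\mu$ = population mean of data
N = total number of observations (population size)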
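Sample Variance and Standard Deviation For Ungrouped Data
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$
Where,
$\bar{x}$ = sample mean of data
n = total number of observations (sample size)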
Example 1.25
Find the variance and the standard deviation
for the following set of sample data.
{ 4, 5 , 6 , 7 , 8 , 9 ,10}
12
Example 1.25 (using original formula)
Solution
x | $x - \bar{x}$ | $(x - \bar{x})^2$
4 | -3 | 9
5 | -2 | 4
6 | -1 | 1
7 | 0 | 0
8 | 1 | 1
9 | 2 | 4
10 | 3 | 9
$\sum x = 49$, $\sum (x - \bar{x})^2 = 28$
Mean, $\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7$
Sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{28}{6} = 4.667$
Sample standard deviation, $s = \sqrt{s^2} = \sqrt{4.667} = 2.160$
Example 1.25 (using alternative formula)
Solution
x | $x^2$
4 | 16
5 | 25
6 | 36
7 | 49
8 | 64
9 | 81
10 | 100
$\sum x = 49$, $\sum x^2 = 371$
Sample variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{371 - \frac{(49)^2}{7}}{6} = 4.667$
Sample standard deviation, $s = \sqrt{4.667} = 2.160$
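Both versions of the sample variance formula can be checked against Python's standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor. The sketch below is illustrative; the variable names are chosen for this example and the data are those of Example 1.25.

```python
from math import sqrt
from statistics import stdev, variance

data = [4, 5, 6, 7, 8, 9, 10]
n = len(data)
x_bar = sum(data) / n

# Original (definitional) formula: sum of squared deviations over n - 1
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# Alternative (shortcut) formula: (sum x^2 - (sum x)^2 / n) over n - 1
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(f"s^2 (definition) = {s2_definition:.3f}")
print(f"s^2 (shortcut)   = {s2_shortcut:.3f}")
print(f"s                = {sqrt(s2_definition):.3f}")
print(f"library check    = {variance(data):.3f}, {stdev(data):.3f}")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead of n − 1.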
16
Example 1.26
Find the standard deviation for the
population data.
3, 5, 6, 4, 6, 5, 6, 8, 5
Number of observations, N = 9
x | $x^2$
3 | 9
5 | 25
6 | 36
4 | 16
6 | 36
5 | 25
6 | 36
8 | 64
5 | 25
$\sum x = 48$, $\sum x^2 = 272$
Standard deviation,
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left( \frac{\sum x}{N} \right)^2} = \sqrt{\frac{272}{9} - \left( \frac{48}{9} \right)^2} = 1.333$
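Population Variance For Grouped Data (Frequency Distribution)
$\sigma^2 = \frac{\sum f (m - \mu)^2}{\sum f} = \frac{\sum f m^2}{\sum f} - \left( \frac{\sum fm}{\sum f} \right)^2$
where, $\mu$ = population mean of the data
m = midpoint of the class
f = frequency of the class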
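Sample Variance and Standard Deviation For Grouped Data
$s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1} = \frac{\sum f m^2 - \frac{(\sum fm)^2}{n}}{n - 1}$
$s = \sqrt{s^2}$
Where,
m = midpoint of the class
$\bar{x}$ = mean of the data
f = frequency of the class
$n = \sum f$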
Example 1.27(a)
The grouped frequency distribution below shows
the number of sales made by all the salesperson
of a company in one particular month. Find the
mean and standard deviation.
Sales (units) Frequency
0–9 5
10 – 19 13
20 – 29 23
30 – 39 31
40 – 49 16
21
Example 1.27(a) Solution
Sales (units) | Mid-point, m | Frequency, f | fm | $fm^2$
0 – 9 | 4.5 | 5 | 22.5 | 101.25
10 – 19 | 14.5 | 13 | 188.5 | 2733.25
20 – 29 | 24.5 | 23 | 563.5 | 13805.75
30 – 39 | 34.5 | 31 | 1069.5 | 36897.75
40 – 49 | 44.5 | 16 | 712 | 31684
$\sum f = 88$, $\sum fm = 2556$, $\sum fm^2 = 85222$
Mean, $\mu = \frac{\sum fm}{\sum f} = \frac{2556}{88} = 29.05$ units
Standard deviation,
$\sigma = \sqrt{\frac{\sum fm^2}{\sum f} - \left( \frac{\sum fm}{\sum f} \right)^2} = \sqrt{\frac{85222}{88} - \left( \frac{2556}{88} \right)^2} = 11.17$ units
Example 1.27(b)
The following data give the frequency distribution of the
number of orders received each day during a sample
period of 50 days at the office of a mail-order company.
Number of Orders Number of Days

10−12 4
13−15 12
16−18 20
19−21 14
Calculate the variance and standard deviation.
Example 1.27(b) Solution
Number of Orders | Class mid-point, m | Number of Days, f | fm | $fm^2$
10 – 12 | 11 | 4 | 44 | 484
13 – 15 | 14 | 12 | 168 | 2352
16 – 18 | 17 | 20 | 340 | 5780
19 – 21 | 20 | 14 | 280 | 5600
$\sum f = 50$, $\sum fm = 832$, $\sum fm^2 = 14216$
$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{49} = 7.582$
$s = \sqrt{s^2} = \sqrt{7.582} = 2.754$
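The grouped sample variance can be verified with a brief Python sketch. The function name `grouped_sample_variance` is an assumption for this illustration; it applies the shortcut formula with midpoints m and frequencies f to the data of Example 1.27(b).

```python
from math import sqrt

def grouped_sample_variance(midpoints, frequencies):
    """Sample variance of grouped data via the shortcut formula.

    Hypothetical helper for illustration only.
    """
    n = sum(frequencies)
    sum_fm = sum(f * m for m, f in zip(midpoints, frequencies))
    sum_fm2 = sum(f * m * m for m, f in zip(midpoints, frequencies))
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)

# Example 1.27(b): class midpoints and number of days
midpoints = [11, 14, 17, 20]
days = [4, 12, 20, 14]

s2 = grouped_sample_variance(midpoints, days)
s = sqrt(s2)
print(f"s^2 = {s2:.3f}, s = {s:.3f}")
```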
Formula List
27
Data Distribution:
Symmetry and Skewness
• If a distribution is represented by a histogram
or a frequency curve, we can see the general
shape of its distribution and the relationship
between the mean, median and mode.
• There are 3 general shapes:

1) Symmetrical distribution
2) Positively skewed distribution
3) Negatively skewed distribution
28
Symmetrical Distribution
(bell shaped)
• A common example is the normal distribution
Value of averages: Mode = Median = Mean
29
Positively Skewed Distribution
(Skewed to the right)
Value of averages: Mode < Median < Mean

30
Negatively Skewed Distribution
(Skewed to the left)
Value of averages: Mode > Median >Mean
31
Measures of
Location
32
What is measure of position ?
• A measure of position determines the position of a
single value in relation to other values in a sample
or a population data set.
• The three commonly used measures of position are

quartiles, percentiles, and percentile rank.
33
Quartiles
Quartiles divide a set of data (arranged in
ascending or descending order) into 4 equal
parts.
34
Inter-quartile Range and Semi-Inter-
quartile Range (or Quartile deviation)
Interquartile Range (IQR)
= Third Quartile − First Quartile
= Q3 − Q 1
Semi-interquartile range (or quartile deviation)

1
= ( Q3 − Q1 )
2
35
Example 1.28 (Quartiles For Ungrouped Data)
(a) Find the values of the three quartiles. Where does the number of
car thefts of 40,197 fall in relation to these quartiles?
(b) Find the interquartile range. 36
(a) Rank the data in increasing order.
11,669 13,435 14,413 18,103 18,215 21,088 26,343 29,920 33,956 40,197 40,769 42,082
Position of median = $\frac{12 + 1}{2}$ = 6.5th
The first six values are less than the median and the last six values are greater than the median.
$Q_1 = \frac{14{,}413 + 18{,}103}{2} = 16{,}258$
$Q_2 = \frac{21{,}088 + 26{,}343}{2} = 23{,}715.5$ (also the median)
$Q_3 = \frac{33{,}956 + 40{,}197}{2} = 37{,}076.5$
By looking at the position of 40,197, this value lies in the top 25% of the car thefts.
(b) The interquartile range is given by the difference between the values of the third and the first quartiles.
Thus,
IQR = Interquartile Range
= Q3 − Q1
= 37,076.5 − 16,258
= 20,818.5 car thefts
Example 1.29
The following are the ages of nine employees of
an insurance company:
47 28 39 51 33 37 59 24 33
(a) Find the values of the three quartiles. Where

does the age of 28 fall in relation to the ages of
these employees?
(b) Find the interquartile range.
(a) Rank the data in increasing order.
24 28 33 33 37 39 47 51 59
Position of median = $\frac{9 + 1}{2}$ = 5th
The first four values are less than the median and the last four values are greater than the median.
$Q_1 = \frac{28 + 33}{2} = 30.5$
$Q_2 = 37$ (also the median)
$Q_3 = \frac{47 + 51}{2} = 49$
Thus the values of the three quartiles are
Q1 = 30.5 years Q2 = 37 years Q3 = 49 years
The age of 28 falls in the lowest 25% of the ages.

(b) The interquartile range is
IQR = Interquartile range

= Q3 − Q1
= 49 − 30.5
=18.5 years
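The quartile method used in Examples 1.28 and 1.29 (find the median, then take the median of the values below it and the median of the values above it) can be sketched in Python. The function name `quartiles` is hypothetical, and other textbooks and software packages use slightly different quartile conventions, so results from library routines may differ.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method.

    For an odd number of values the median itself is left out of both halves.
    Hypothetical helper for illustration only.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

# Example 1.29: ages of nine employees
ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```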
41
Quartiles For Ungrouped frequency distribution (single-value grouping)
1. Construct cumulative frequency.
2. Median position = $\frac{n + 1}{2}$; locate the median.
3. Q1 = Middle value of the first 50%.
4. Q3 = Middle value of the second 50%.
Example 1.30
Number of fishes 0 1 2 3 4 5
Frequency 1 5 8 7 3 1
The above data shows the number of fishes
reared in each of 25 houses along Green Road.
Find the median and semi inter-quartile range for
the data.
Ans: Q1 = 1.5; Q2 = 2; Q3 = 3; Semi-IQR = 0.75
43
Number of Cumulative
Frequency, f
fishes, x frequency, F
0 1 1
1 5 6
2 8 14
3 7 21
4 3 24
5 1 25
44
Median position = (1/2)(25 + 1) = 13th
Median = 2 fishes

There are 12 values before the median.
Q1 position = (1/2)(12 + 1) = 6.5th
First quartile, Q1 = (1/2)(x_6 + x_7)
= (1/2)(1 + 2)
= 1.5 fishes

Q3 is the 6.5th value after the median.
Q3 position = 13 + 6.5 = 19.5th
Third quartile, Q3 = (1/2)(x_19 + x_20)
= (1/2)(3 + 3)
= 3 fishes

Semi-interquartile range = (1/2)(Q3 − Q1)
= (1/2)(3 − 1.5)
= 0.75 fishes

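The position-based calculation in Example 1.30 can be checked with a short Python sketch. It is an illustration only: the helper name `value_at` is hypothetical, and a position ending in .5 is taken to mean the average of the two neighbouring observations, as in the worked solution.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4, 5]      # number of fishes, x
freqs = [1, 5, 8, 7, 3, 1]       # frequency, f
cum = list(accumulate(freqs))    # cumulative frequency, F


def value_at(position):
    """Observation at a 1-based position; an x.5 position averages two neighbours."""
    def kth(k):
        # first x whose cumulative frequency reaches k
        return next(v for v, F in zip(values, cum) if F >= k)

    lo = int(position)
    if position == lo:
        return kth(lo)
    return (kth(lo) + kth(lo + 1)) / 2


n = cum[-1]                       # 25 houses
below = n // 2                    # 12 values before the median
median = value_at((n + 1) / 2)    # 13th value
q1_pos = (below + 1) / 2          # 6.5th value
q3_pos = (n - below) + q1_pos     # 13 + 6.5 = 19.5th value
q1, q3 = value_at(q1_pos), value_at(q3_pos)
semi_iqr = (q3 - q1) / 2
print(median, q1, q3, semi_iqr)
```

The results agree with the stated answer: median 2, Q1 = 1.5, Q3 = 3 and semi-interquartile range 0.75 fishes.
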
Quartiles for Grouped Frequency
Distribution
For a grouped frequency distribution with total frequency Σf:
First quartile, Q1 = ((1/4) Σf)th value
Third quartile, Q3 = ((3/4) Σf)th value

Quartiles For Grouped Frequency
Distribution
Determine the class boundaries and compute the
cumulative frequency for each class. Locate the
classes that contain the quartiles by computing
their positions. Determine the quartiles using
formulae or graphically.
(a) Median position = (1/2) Σf
(b) Q1 position = (1/4) Σf
(c) Q3 position = (3/4) Σf

Formula For First Quartile
1 
4 (  f ) −  f Q1 −1 
Q1 = LQ1 +  c
 fQ1 
 
LQ1 = lower class boundary of the first quartile class
 fQ −1 = cumulative frequency before the first quartile
1
class
fQ = frequency of the first quartile class
1
c = width of the first quartile class

51
Formula For Third Quartile
1 
4 (  f ) −  f Q3 −1 
Q3 = LQ3 +  c
 fQ3 
 
LQ3 = lower class boundary of the third quartile class
 fQ −1 = cumulative frequency before the third quartile
3
class
fQ = frequency of the third quartile class
3
c = width of the third quartile class

52
Example 1.31
The following table shows the height distribution
for a group of students. Find the first quartile,
third quartile and semi-interquartile range.
Height (cm)    150–155   155–160   160–165   165–170   170–175   175–180
Frequency      15        32        68        52        24        12

Height (cm), class boundaries    Frequency    Cumulative frequency
150 – 155                        15           15
155 – 160                        32           47
160 – 165                        68           115
165 – 170                        52           167
170 – 175                        24           191
175 – 180                        12           203

Q1 position = (203/4)th = 50.75th
Q1 class boundaries = 160 – 165.
Lower boundary of 1st quartile class, $L_{Q_1}$ = 160
Cumulative frequency before 1st quartile class, $\sum f_{Q_1 - 1}$ = 47
Frequency of 1st quartile class, $f_{Q_1}$ = 68
Width of 1st quartile class, c = 165 − 160 = 5

1 
 4 (  f ) −  fQ1 −1 
1st quartile, Q1 = LQ +  c
1
 fQ1 
 
1 
 4 ( 203) − 47 
= 160 +   ( 5)
 68 
 
= 160.28 cm
56
Q3 position = (3/4)(203)th = 152.25th
Q3 class boundaries = 165 – 170.
Lower boundary of 3rd quartile class, $L_{Q_3}$ = 165
Cumulative frequency before 3rd quartile class, $\sum f_{Q_3 - 1}$ = 115
Frequency of 3rd quartile class, $f_{Q_3}$ = 52
Width of 3rd quartile class, c = 5

1 
 4 (  f ) −  fQ3 −1 
3rd quartile, Q3 = LQ3 +  c
 fQ3 
 
3 
4 (203) − 115 
= 165 +   (5)
 52 
 
= 168.58cm
58
Semi-interquartile range
= (1/2)(Q3 − Q1)
= (1/2)(168.58 − 160.28)
= 4.15 cm

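The interpolation formulas for Q1 and Q3 differ only in the fraction of the total frequency, so both can be sketched with one Python function. This is an illustrative sketch, not the slides' own code: the name `grouped_quantile` and its parameters are hypothetical, and it assumes the classes are given by consecutive class boundaries.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Estimate the value at `fraction` of the total frequency by interpolation.

    boundaries: class boundaries (one more entry than freqs)
    freqs:      class frequencies
    """
    position = fraction * sum(freqs)          # e.g. (1/4) * total for Q1
    cum_before = 0                            # cumulative frequency before the class
    for i, f in enumerate(freqs):
        if f > 0 and cum_before + f >= position:
            lower = boundaries[i]                      # lower class boundary
            width = boundaries[i + 1] - boundaries[i]  # class width, c
            return lower + (position - cum_before) / f * width
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


boundaries = [150, 155, 160, 165, 170, 175, 180]   # height (cm)
freqs = [15, 32, 68, 52, 24, 12]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q3 = grouped_quantile(boundaries, freqs, 0.75)
semi_iqr = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(semi_iqr, 2))
```

Rounded to two decimal places, the sketch gives 160.28 cm, 168.58 cm and 4.15 cm, matching Example 1.31. Calling it with a fraction of 0.5 would estimate the median in the same way.
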
Find Quartiles graphically
(use ogive)
The median and quartiles can be read directly from
the cumulative frequency curve (ogive).

Example 1.32
The table below shows the distribution of the mass of
babies (in kg) for babies born in a hospital from January
to June. Draw an ogive to show the frequency
distribution. From your ogive, find the first quartile and
third quartile of the mass of the babies.
Mass (kg)           0.0–1.0   1.0–2.0   2.0–3.0   3.0–4.0   4.0–5.0   5.0–6.0
Number of babies    12        233       442       185       96        32

Mass (kg), upper boundary    Cumulative frequency
< 0.0                        0
< 1.0                        12
< 2.0                        245
< 3.0                        687
< 4.0                        872
< 5.0                        968
< 6.0                        1000

[Figure: “Less than” ogive of the mass of babies. Horizontal axis: Mass (kg), 1 to 6. Vertical axis: Number of babies, 0 to 1000. The 250th and 750th positions are read across to the curve, giving Q1 = 2.0 kg and Q3 = 3.3 kg.]

Q1 position = (1/4)(1000) = 250th
Q3 position = (3/4)(1000) = 750th
From the ogive:
First quartile, Q1 = 2 kg
Third quartile, Q3 = 3.3 kg

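Reading a quartile off the ogive can be imitated numerically. The sketch below is an assumption-laden illustration: it treats the ogive as a polyline (straight segments between the plotted points), so its readings may differ slightly from those taken from a smooth hand-drawn curve. The function name `read_ogive` is hypothetical.

```python
from bisect import bisect_left

upper_boundaries = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # mass (kg)
cumulative = [0, 12, 245, 687, 872, 968, 1000]            # "less than" cumulative frequency


def read_ogive(position):
    """Mass at which the polyline ogive reaches `position` (0 <= position <= total)."""
    i = bisect_left(cumulative, position)   # first plotted point at or above the position
    if i == 0:
        return upper_boundaries[0]
    x0, x1 = upper_boundaries[i - 1], upper_boundaries[i]
    y0, y1 = cumulative[i - 1], cumulative[i]
    # linear interpolation along the segment that crosses the position
    return x0 + (position - y0) / (y1 - y0) * (x1 - x0)


total = cumulative[-1]
q1 = read_ogive(0.25 * total)   # 250th baby
q3 = read_ogive(0.75 * total)   # 750th baby
print(round(q1, 1), round(q3, 1))
```

To one decimal place, the interpolated readings are 2.0 kg and 3.3 kg, in line with the values read from the ogive in Example 1.32.
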
Box-plot
A box-plot shows the spread of a distribution
and helps to detect outliers by using the 5-number
summary:
1) smallest value,
2) largest value,
3) first quartile,
4) third quartile and
5) median.
It can be displayed horizontally or vertically.

Box-plot
[Figure: box-plot displayed horizontally on a scale from 0 to 60, labelled from left to right: smallest value, 1st quartile, median, 3rd quartile, largest value.]

Box-plot
[Figure: box-plots displayed vertically.]

Box-plot
• The ‘box’ runs from Q1 to Q3 and contains the middle 50% of the data.
• One ‘whisker’ runs from the box to the smallest value and another from the box to the largest value.
The ‘whiskers’ display the range of the data.

Constructing a Box-Plot
• Calculate Q1, the median, Q3 and IQR.
• Draw a horizontal line to represent the scale of
measurement.
• Draw a box using Q1, the median (m) and Q3.
• Isolate outliers by calculating:
Lower fence: Q1 − 1.5 IQR
Upper fence: Q3 + 1.5 IQR
• Measurements beyond the upper or lower fence
are outliers and are marked (*).
• Draw “whiskers” connecting the box to the largest and
smallest measurements that are NOT outliers.
[Figure: box with Q1, m and Q3 marked, whiskers on both sides, and an outlier marked *.]

Box-plot
Boxplot for 3 types of distribution :
1) Symmetrical distribution
2) Positively skewed distribution
3) Negatively skewed distribution

Box-plot For Symmetrical Distribution
‘whisker’ : same length
Median : centre of the box

Box-plot For Positively Skewed Distribution
‘whisker’ : left side shorter than right side
Median : nearer to 1st quartile

Box-plot For Negatively Skewed Distribution
‘whisker’ : left side longer than right side
Median : nearer to 3rd quartile

Use of box-plot to identify ‘outliers’
Sometimes, values which are unusually small or large occur in a set of data.
The unusual values occur probably because of an error in recording the data.
‘Outliers’ : points which lie more than 1.5 times the interquartile range above the 3rd quartile or below the 1st quartile.

Outliers
[Figure: box-plot with a boundary drawn 1.5 (Q3 − Q1) below Q1 and another 1.5 (Q3 − Q1) above Q3. Each whisker ends at the last value inside its boundary; points beyond a boundary are marked * as ‘outliers’.]

Example 1.33
Given the amount of sodium in 8 brands of cheese:
260 290 300 320 330 340 340 520
Q1 = 295, m = 325, Q3 = 340
Draw the boxplot based on the data.
IQR = 340 − 295 = 45
Lower fence = 295 − 1.5(45) = 227.5
Upper fence = 340 + 1.5(45) = 407.5
Outlier : x = 520

Box plot : Amount of sodium in 8 brands of cheese.
[Figure: box-plot with Q1, m and Q3 marked and the outlier marked *.]

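The fence calculation in Example 1.33 can be sketched in Python as below. This is a minimal illustration: the function name `box_plot_summary` and the dictionary keys are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which reproduces the Q1, m and Q3 quoted in the example.

```python
def box_plot_summary(values):
    """Quartiles, fences, whisker ends and outliers for a box-plot (assumed convention)."""
    data = sorted(values)
    n = len(data)

    def median(xs):
        mid = len(xs) // 2
        return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

    q1 = median(data[: n // 2])              # median of the lower half
    q3 = median(data[n // 2 + n % 2 :])      # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),   # extreme values that are not outliers
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


sodium = [260, 290, 300, 320, 330, 340, 340, 520]
summary = box_plot_summary(sodium)
print(summary)
```

For the sodium data the fences are 227.5 and 407.5, so 520 is the only outlier. The whiskers run from 260 to 340, which means the right whisker has zero length because Q3 is also 340.
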
Example 1.34
The following data shows a summary of the marks for
Mathematics and Science for students in a class.
Subjects       Minimum   Maximum   Median   First quartile   Third quartile
Mathematics    10        90        60       45               70
Science        35        85        60       48               72
Draw two boxplots for this data and comment on the distribution of marks for Mathematics and Science.

Box plot: Marks for Mathematics and Science for students in a class.
[Figure: two horizontal box-plots, one for Mathematics and one for Science, on a scale from 0 to 100.]
The distribution of marks for Mathematics is skewed to the left whereas the distribution of marks for Science is symmetrical.
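
The comments on Example 1.34 follow from comparing whisker lengths and the position of the median inside the box. A rough Python sketch of that comparison is given below. The function name `describe_shape` is hypothetical, and the sketch assumes the whiskers reach the minimum and maximum (no outliers are removed). It is a rule of thumb based on the three box-plot shapes described earlier, not a formal test of skewness.

```python
def describe_shape(minimum, q1, median, q3, maximum):
    """Classify a distribution from its 5-number summary (rule of thumb only)."""
    left_whisker = q1 - minimum
    right_whisker = maximum - q3
    left_box = median - q1      # distance from Q1 to the median
    right_box = q3 - median     # distance from the median to Q3

    if left_whisker == right_whisker and left_box == right_box:
        return "symmetrical"
    if left_whisker >= right_whisker and left_box >= right_box:
        return "negatively skewed (skewed to the left)"
    if left_whisker <= right_whisker and left_box <= right_box:
        return "positively skewed (skewed to the right)"
    return "no clear pattern"


maths = describe_shape(10, 45, 60, 70, 90)
science = describe_shape(35, 48, 60, 72, 85)
print(maths)
print(science)
```

Mathematics has a longer left whisker (35 against 20) and a median nearer to Q3, so it is classed as skewed to the left. Science has equal whiskers (13 each) and a centred median, so it is classed as symmetrical.
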
Example 1.35
The following stem plot shows the maximum
temperature for each day from 1st August to 23rd August
in a town. Draw a boxplot and use your boxplot to
identify the ‘outliers’.
Stem | Leaf
7 | 6 7
7 | 0 2 2 3
6 | 5 7 8 8 8 9 9
6 | 2 3 3 4 4 4 4 4
5 | 9
5 | 1
Key : 5 | 9 means 59°F

Number of observations, n = 23
Median position = 24/2 = 12th
Median, Q2 = 67°F
Q1 position = 12/2 = 6th (among the values < median)
First quartile, Q1 = 64°F

Q3 position = 6th (among the values > median)
Third quartile, Q3 = 70°F
Upper boundary = Q3 + 1.5(Q3 – Q1)
= 70 + 1.5(70 – 64) = 79°F
Lower boundary = Q1 – 1.5(Q3 – Q1)
= 64 – 1.5(70 – 64) = 55°F
Therefore, the outlier is 51°F.

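n = 23, Q3 = 70°F, Upper boundary = 79°F
Q1 = 64°F, IQR = 6°F, Lower boundary = 55°F
Box plot: Maximum temperature for each day from 1st
August to 23rd August in a town
[Figure: horizontal box-plot on a scale from 50 to 90, Temperature (°F). Marked values: boundaries at 55 and 79, whisker ends at 59 and 77, Q1 at 64 and the median at 67. The outlier is marked beyond the lower boundary.]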
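
The steps of Example 1.35 can be retraced in Python by expanding the stem plot into individual temperatures. This sketch is illustrative only: the variable names are hypothetical, and the position arithmetic assumes, as in this example, that n is odd and that each half contains an odd number of values.

```python
rows = [  # (stem, leaves) as listed in the stem plot, top to bottom
    (7, [6, 7]),
    (7, [0, 2, 2, 3]),
    (6, [5, 7, 8, 8, 8, 9, 9]),
    (6, [2, 3, 3, 4, 4, 4, 4, 4]),
    (5, [9]),
    (5, [1]),
]

# Key: 5 | 9 means 59, so each value is 10 * stem + leaf
temps = sorted(10 * stem + leaf for stem, leaves in rows for leaf in leaves)

n = len(temps)
median_pos = (n + 1) // 2                 # 12th value
median = temps[median_pos - 1]
lower_half = temps[: median_pos - 1]      # the 11 values before the median
upper_half = temps[median_pos:]           # the 11 values after the median
q_pos = (len(lower_half) + 1) // 2        # 6th value within each half
q1 = lower_half[q_pos - 1]
q3 = upper_half[q_pos - 1]

lower = q1 - 1.5 * (q3 - q1)              # lower boundary
upper = q3 + 1.5 * (q3 - q1)              # upper boundary
outliers = [t for t in temps if t < lower or t > upper]
print(n, median, q1, q3, lower, upper, outliers)
```

The output confirms n = 23, a median of 67°F, Q1 = 64°F, Q3 = 70°F, boundaries of 55°F and 79°F, and a single outlier at 51°F.
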

4th Industrial Revolution (4th IR)
✓ The fourth industrial revolution is the fusion of the real world with the virtual world. The digital revolution is marked by technology that takes advantage of Big Data and Artificial Intelligence (AI) to nurture automatic learning systems.
✓ Artificial intelligence uses data, mathematics and statistics to create intelligent machines.
✓ Big Data is the collection and analysis of data sets that are complex in terms of volume and variety, and in some cases the velocity at which they are collected.

4th Industrial Revolution (4th IR)
✓ Statistical methods are applied to analyse large and complex data sets with intelligent algorithms to spot patterns, understand the relationships between data, predict future outcomes and make decisions.
✓ The links below provide an overview of big data and the role of statisticians in understanding and advancing big data.
• https://www.researchgate.net/publication/284045063_Statistical_Perspectives_on_Big_Data
• http://higherlogicdownload.s3.amazonaws.com/AMSTAT/UploadedImages/49ecf7cf-cb26-4c1b-8380-3dea3b7d8a9d/BigDataOnePager.pdf