I Am Sharing 'Topic 1 A ' With You 230824 191909
❑ Representation of Data
▪ Histogram
▪ Cumulative Frequency Curve (Ogive – “less than”)
▪ Stem and leaf display
2
Introduction to Statistics
Statistics may mean any of the following:
1) Numerical facts
Example: Total students enrolled in UTAR over
the years 2000-2003.
2) Measures based on sample data
Example: A sample mean is known as a statistic.
3) Field or discipline of study
- Concerned with scientific techniques used for
collecting, organizing, summarizing, presenting and
analyzing data; drawing valid conclusions and
making decisions based on such analysis.
3
Types of Statistics
Descriptive statistics
Inferential statistics
4
Population Versus Samples
Population
• Consists of all elements (individuals, items or objects) whose characteristics are being studied.
• Census : Collecting information from every member of the target population.
• Statistical measures obtained from population data are called parameters.
Sample
• A portion of the population selected for study.
• Sample survey : Collecting information from a portion of the target population.
• Statistical measures obtained from sample data are called sample statistics.
5
Text Book
7
Types of Variables
Discrete
data
Example 1.1
1. Explain whether each of the following constitutes a
population or a sample.
11
Example 1.2
Based on the following statements, determine whether the
data obtained is discrete or continuous data.
(a) Age of a person.
(b) Result obtained when a fair die is thrown.
(c) Time (in minutes) taken to run 100 meters.
(d) Average monthly expenditure (in RM) on household
goods.
(e) Number of robberies reported per day.
(f) Diameter (in nearest cm) of a tennis ball.
12
Example 1.2 (solution)
(a) Age of a person. Continuous data
(d) Average monthly expenditure (in RM) on household
goods. Continuous data
14
Presentation Of Data
Raw data
Data recorded in the sequence in which they are collected
and before they are processed or ranked
15
Organizing and Graphing
-Data organized into tables, and displayed in graphs and charts
Qualitative data
a) Frequency distribution
b) Relative frequency and percentage distributions
Quantitative data
a) Stem and leaf displays
b) Frequency distribution
c) Relative frequency and percentage distributions
d) Graphing frequency distribution:
- Histogram
- Shape of histogram
e) Cumulative frequency distribution
f) Graphing cumulative frequency distribution
- Ogive or cumulative frequency curve/polygon.
16
Quantitative
data
17
Stem plots (Stem and
leaf diagram)
18
Stem and Leaf Plots
• A simple graph for quantitative data
• Uses the actual numerical values of each data point.
52 65 53 42 85 57 76 69 44 57 60 67
65 74 72 53 81 68 51 62 87 56 90 70
20
Example 1.3 (solution)
52 65 53 42 85 57 76 69 44 57 60 67
65 74 72 53 81 68 51 62 87 56 90 70
42 44
52 53 57 57 53 51 56
65 69 60 67 65 68 62
76 74 72 70
85 81 87
90
21
Example 1.3 (solution)
Stem and Leaf display:
Unordered:
Stem | Leaf
4 | 2 4
5 | 2 3 3 7 1 7 6
6 | 5 5 8 9 2 0 7
7 | 4 2 6 0
8 | 5 1 7
9 | 0
Ordered:
Stem | Leaf
4 | 2 4
5 | 1 2 3 3 6 7 7
6 | 0 2 5 5 7 8 9
7 | 0 2 4 6
8 | 1 5 7
9 | 0
Key: 4|2 = 42 marks
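The ordered display can be reproduced with a short Python sketch. The function name `stem_and_leaf` and the parameter `stem_unit` are illustrative choices, not part of the slides; the sketch assumes whole-number data where the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=10):
    """Return an ordered stem-and-leaf display as {stem: [sorted leaves]}."""
    display = defaultdict(list)
    for v in sorted(values):
        # divmod splits 52 into stem 5 and leaf 2 when stem_unit is 10
        stem, leaf = divmod(v, stem_unit)
        display[stem].append(leaf)
    return dict(display)

# Marks from Example 1.3
marks = [52, 65, 53, 42, 85, 57, 76, 69, 44, 57, 60, 67,
         65, 74, 72, 53, 81, 68, 51, 62, 87, 56, 90, 70]

plot = stem_and_leaf(marks)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Calling the same function with `stem_unit=100` would give two-digit leaves, as in the electricity bill example.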
22
Example 1.4
The monthly electricity bill (in RM) paid by a sample of 24
households selected from a city :
192 235 253 302 455 157 156 549 244 257 350 447
155 154 352 353 410 148 151 502 247 356 246 244
23
Example 1.4 (solution)
192 235 253 302 455 157 156 549 244 257 350 447
155 154 352 353 410 148 151 502 247 356 246 244
24
Example 1.4 (solution)
Stem and Leaf display:
Unordered:
Stem | Leaf
1 | 92 55 54 57 48 56 51
2 | 35 53 44 47 57 46 44
3 | 52 02 53 56 50
4 | 55 10 47
5 | 49 02
Ordered:
Stem | Leaf
1 | 48 51 54 55 56 57 92
2 | 35 44 44 46 47 53 57
3 | 02 50 52 53 56
4 | 10 47 55
5 | 02 49
152 145 153 142 155 157 156 149 144 157 150 147
155 154 152 153 151 148 151 152 147 156 146 144
26
Example 1.5 (solution)
Smallest value = 142
Highest value = 157.
27
Example 1.5 (solution)
152 145 153 142 155 157 156 149 144 157 150 147
155 154 152 153 151 148 151 152 147 156 146 144
142-143 : 142
144-145 : 145, 144, 144
146-147 : 147, 147, 146
148-149 : 149, 148
150-151 : 150, 151, 151
152-153 : 152, 153, 152, 153, 152
154-155 : 155, 155, 154
156-157 : 157, 156, 157, 156
28
Example 1.5 (solution)
Stem and Leaf display:
Unordered:
Stem | Leaf
14 | 2
14 | 5 4 4
14 | 7 7 6
14 | 8 9
15 | 0 1 1
15 | 2 3 2 3 2
15 | 5 5 4
15 | 7 6 7 6
Ordered:
Stem | Leaf
14 | 2
14 | 4 4 5
14 | 6 7 7
14 | 8 9
15 | 0 1 1
15 | 2 2 2 3 3
15 | 4 5 5
15 | 6 6 7 7
Key: 14|2 = 142 cm
Example 1.6
For the data in Example 1.5, draw a stem and leaf
diagram using the following class intervals:
30
Example 1.6 (solution)
The interval 148-150 cannot be represented by the ‘stem’
14 because the ‘tens’ digit changes in this interval.
Therefore, ‘stem’ 142, 145, 148, 151, 154 and 157 are
used. The ‘leaf’ is the value that is added to the ‘stem’.
31
Example 1.6 (solution)
152 145 153 142 155 157 156 149 144 157 150 147
155 154 152 153 151 148 151 152 147 156 146 144
32
Example 1.6 (solution)
Stem and Leaf display:
Stem Leaf
142 0 2 2
145 0 1 2 2
148 0 1 2
151 0 0 1 1 1 2 2
154 0 1 1 2 2
157 0 0
34
Ungrouped data
1. Raw Data
Example:
A survey on the number of male children in 20 families:
1 4 2 0 2 3 3 2 1 4
5 2 1 2 0 1 2 3 1 2
35
Ungrouped data
2. Array
• An arrangement of quantitative raw data in
ascending or descending order.
Example:
Number of male children in 20 families in ascending
order:
0 0 1 1 1 1 1 2 2 2
2 2 2 2 3 3 3 4 4 5
37
Number of male children    Number of families
0                          10
1                          12
2                          8
3                          6
4                          3
5                          1

Height (cm)     Number of children
100 - < 105     10
105 - < 110     12
110 - < 115     8
115 - < 120     6
120 - < 125     3
Relative Frequency and Percentage
The relative frequencies and percentages for a
quantitative data set are obtained as follows:
Relative frequency of a category
$= \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$
Percentage = (Relative frequency) × 100%
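As a minimal sketch of the calculation, the Python below divides each frequency by the total and converts the result to a percentage. The function name `relative_frequencies` is an illustrative choice; the frequencies 12, 19, 14 and 5 (total 50) are those of Example 1.7 (b).

```python
def relative_frequencies(freqs):
    """Divide each class frequency by the sum of all frequencies."""
    total = sum(freqs)
    return [f / total for f in freqs]

freqs = [12, 19, 14, 5]          # class frequencies, sum = 50
rel = relative_frequencies(freqs)
pct = [round(100 * r, 1) for r in rel]

for f, r, p in zip(freqs, rel, pct):
    print(f, round(r, 2), p)
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on the table.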
39
Grouped frequency distribution: Class limits
and Class Boundaries
• Class limits: The smallest and largest possible
measurements in each class, that is, the lower and upper
limits of each class.
• Class boundaries: The dividing lines between successive
classes.
• Class Boundary (discrete data): Given by the midpoint
of the upper limit of one class and the lower limit of the
next class.
• Class Boundary (continuous data) : Corresponds to the
upper limit of one class or the lower limit of the next
class.
Frequency distribution: Class width
• The difference between the two boundaries of a class
gives the class width. The class width is also called the
class size or class interval.
Class Width = Upper boundary – Lower boundary
• The class width can also be determined by finding the
difference between the lower limit of the next class and
the lower limit of the class.
Class Width =
Lower limit of the next class – Lower limit of the class
41
Frequency distribution:
Class midpoint
• The midpoint of each class is called class midpoint or
class mark. It lies half-way between the class limits or the
class boundaries.
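The definitions of class boundaries, class width and class midpoint can be sketched in Python as follows. The function `class_summary` and its `gap` parameter are illustrative: `gap` is assumed to be the difference between the upper limit of one class and the lower limit of the next (1 for discrete classes such as 110 – 129 and 130 – 149, 0 for continuous classes such as 25 – < 30).

```python
def class_summary(lower_limit, upper_limit, gap=0):
    """Return (lower boundary, upper boundary, width, midpoint) of a class."""
    half = gap / 2
    lower_boundary = lower_limit - half      # midpoint with the previous upper limit
    upper_boundary = upper_limit + half      # midpoint with the next lower limit
    width = upper_boundary - lower_boundary  # class width = upper - lower boundary
    midpoint = (lower_boundary + upper_boundary) / 2
    return lower_boundary, upper_boundary, width, midpoint

discrete = class_summary(110, 129, gap=1)    # discrete class 110 - 129
continuous = class_summary(25, 30)           # continuous class 25 - < 30
print(discrete)
print(continuous)
```

For the discrete class the width of 20 also equals the difference between consecutive lower limits (130 − 110), in line with the second class-width rule.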
42
Example 1.7 (a) (Discrete Data)
43
Example 1.7 (b) (Continuous Data)
Class limits   Class boundaries   Class width   Class midpoint   Frequency, f   Relative frequency   Percentage (%)
18 – 30        18 – 30            12            24               12             12/50 = 0.24         24
30 – 42        30 – 42            12            36               19             19/50 = 0.38         38
42 – 54        42 – 54            12            48               14             14/50 = 0.28         28
54 – 66        54 – 66            12            60               5              5/50 = 0.10          10
                                                                 $\sum f = 50$  Sum = 1.00           Sum = 100%
45
Summary of Class Boundaries
Continuous grouped data, x    Equivalent to    Continuous class boundaries
0 − 20                                         0 – 20
20 − 40                                        20 – 40
Discrete grouped data, x      Equivalent to    Discrete class boundaries
1 − 20                                         0.5 – 20.5
21 − 40                                        20.5 – 40.5
Example 1.8
The table below shows the class boundaries, class width and
class midpoints for the grouped frequency distribution of
weekly sales in units.
Sales (units), class limits    Class boundaries    Class width, c    Class midpoint, m
110 – 129
130 – 149
150 – 169
170 – 189
190 – 209
47
Example 1.9
The table below shows the class boundaries, class width and
class midpoints for the grouped frequency distribution of
age of employees in years.
Age (years), class limits    Class boundaries    Class width, c    Class midpoint, m
25 – < 30
30 – < 35
35 – < 40
40 – < 45
45 – < 50
48
Histogram
▪ A diagrammatic presentation of a frequency
distribution.
▪ Histogram for Frequency Distribution
- All bars are of same width.
- Height of every bar is proportional to the frequency
of the corresponding class.
50
Example 1.10 (solution)
Time (minutes), Class boundaries Frequency, f
30–35 2
35–40 17
40–45 18
45–50 13
50–55 6
55–60 1
60–65 3
Note: Since the distribution is of equal class size, a 2-column
table showing class boundaries and frequency is
required for drawing the histogram.
Example 1.10 (solution)
Histogram: Times (in minutes) taken by 60 students to
complete a model in a competition
[Figure: histogram; horizontal axis marked at the class boundaries 30, 35, 40, 45, 50, 55, 60, 65]
“Less than” Cumulative
Frequency Curve
▪ A graphical presentation of a “less than” cumulative
frequency distribution is called a “less than” ogive or
“less than” cumulative frequency curve/polygon.
53
An ogive for the cumulative frequency distribution can be
presented in two forms:
Class boundaries    Upper class boundary    ‘Less than’ cumulative frequency
                    < 39.5                  0
39.5 – 49.5         < 49.5                  2 (0+2)
49.5 – 59.5         < 59.5                  12 (2+10)
59.5 – 69.5         < 69.5                  30 (12+18)
69.5 – 79.5         < 79.5                  43 (30+13)
79.5 – 89.5         < 89.5                  48 (43+5)
89.5 – 99.5         < 99.5                  50 (48+2)
56
Example 1.12
Construct ‘less than’ cumulative frequency distribution and
state upper class boundaries for the completion times taken
by all 120 workers to complete a standard task in a factory.
57
Example 1.12 (solution)
Continuous variable :
Class boundaries    Upper class boundary    ‘Less than’ cumulative frequency
                    < 10                    0
10 - 12             < 12                    9 (0+9)
12 - 14             < 14                    38 (9+29)
14 - 16             < 16                    80 (38+42)
16 - 18             < 18                    106 (80+26)
18 - 20             < 20                    120 (106+14)
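The running totals in the table can be generated with a few lines of Python. This is a sketch only: the function name `less_than_cumulative` is illustrative, and the class frequencies 9, 29, 42, 26 and 14 are read off the bracketed sums in the table above.

```python
from itertools import accumulate

def less_than_cumulative(boundaries, freqs):
    """Pair each class boundary with the number of observations below it."""
    # Nothing lies below the first boundary, so the running total starts at 0.
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))

boundaries = [10, 12, 14, 16, 18, 20]   # lower boundary of first class, then upper boundaries
freqs = [9, 29, 42, 26, 14]             # class frequencies, total 120

table = less_than_cumulative(boundaries, freqs)
for upper, cf in table:
    print(f"< {upper}: {cf}")
```

The last cumulative frequency must equal the total number of observations (120 workers here).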
58
Example 1.13
Construct a “less than” cumulative frequency distribution
with upper class boundaries and draw a cumulative
frequency curve based on the following information .
The weights of 20 students (in nearest kg).
Weight (kg)           60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Number of students    3         4         5         6         2
Estimate from the ogive,
(i) the total number of students of weight less than 67 kg.
(ii) the value of x, if 20 % of the students were of weight x kg or more.
(iii) the number of students of weight 64 kg or more.
(iv) the minimum weight of the heavier 10% of the students in a group.
59
Example 1.13 (solution)
The table for “less than” cumulative frequency distribution
and the upper class boundaries.
Class boundaries    Upper class boundary    Cumulative frequency
                    < 59.5                  0
59.5 – 62.5         < 62.5                  3
62.5 – 65.5         < 65.5                  7
65.5 – 68.5         < 68.5                  12
68.5 – 71.5         < 71.5                  18
71.5 – 74.5         < 74.5                  20
Example 1.13 (solution)
“Less than” ogive for the weight of 20 students (nearest kg)
[Figure: ogive. Vertical axis: cumulative frequency (0 to 20). Horizontal axis: weight (kg), marked at 59.5, 62.5, 65.5, 68.5, 71.5, 74.5, 77.5. Readings marked for parts (i) 9, (ii) 70, (iii) and (iv) 71.5 kg.]
62
Example 1.13 (solution)
From the ogive:
( i ) Number of students with weight less than 67 kg = 9
( ii ) Position of x = $\frac{80}{100} \times 20$ = 16th
x = 70
( iii ) 5 students weigh less than 64 kg, therefore
20 − 5 = 15 students weigh more than or equal to 64 kg.
( iv ) 10% of the students have the heaviest weight.
$\frac{10}{100} \times 20$ = 2 students ( last 2 students )
Position on cumulative frequency = 20 − 2 = 18th
2 students have at least 71.5 kg.
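Readings from an ogive can be approximated numerically by joining the plotted points with straight lines. The sketch below assumes linear interpolation between upper class boundaries, so its results can differ slightly from values read off a hand-drawn smooth curve: it gives about 9.5 for part (i) and 70.5 for part (ii), against the slide readings of 9 and 70. The function names are illustrative.

```python
from bisect import bisect_right

BOUNDS = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]   # class boundaries (kg)
CUM = [0, 3, 7, 12, 18, 20]                     # "less than" cumulative frequencies

def cumulative_below(x, bounds=BOUNDS, cum=CUM):
    """Estimate how many observations are below x (straight-line ogive)."""
    if x <= bounds[0]:
        return float(cum[0])
    if x >= bounds[-1]:
        return float(cum[-1])
    i = bisect_right(bounds, x) - 1              # class containing x
    frac = (x - bounds[i]) / (bounds[i + 1] - bounds[i])
    return cum[i] + frac * (cum[i + 1] - cum[i])

def value_at_position(k, bounds=BOUNDS, cum=CUM):
    """Estimate the value below which k observations fall."""
    for i in range(len(cum) - 1):
        if cum[i + 1] > cum[i] and cum[i] <= k <= cum[i + 1]:
            frac = (k - cum[i]) / (cum[i + 1] - cum[i])
            return bounds[i] + frac * (bounds[i + 1] - bounds[i])
    raise ValueError("position is outside the range of the ogive")

print(cumulative_below(64))       # part (iii): students below 64 kg
print(value_at_position(18))      # part (iv): weight at the 18th position
```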
Example 1.14
The table below shows the bonuses given out to 250 employees of a factory
in one particular year.
Bonus (RM) Number of employees
40 ≤ x < 50 3
50 ≤ x < 60 8
60 ≤ x < 80 27
80 ≤ x < 100 75
100 ≤ x < 120 79
120 ≤ x < 150 44
150 ≤ x < 200 14
i. Construct a “less than” cumulative percentage distribution with upper
class boundaries and draw a cumulative percentage ogive.
ii. Estimate the number of employees receiving at least RM115 as bonus.
Ans: (ii) 80 employees
Example 1.14 (solution)
(i) [Figure: Bonuses (RM) for 250 employees]
Measure Of Central Tendency
Mean
Mean is the average of values. It is also known as
arithmetic mean.
71
Mean of Ungrouped data
$\bar{x} = \frac{\sum x}{n}$
Where,
$\sum x$ = sum of all values
n = number of values
72
Example 1.15
Find the mean of the set of the numbers
{ 12,18, 13, 10, 6, 23, 16}.
Solution :
$\bar{x} = \frac{12 + 18 + 13 + 10 + 6 + 23 + 16}{7} = \frac{98}{7} = 14$
73
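Mean of Ungrouped data with frequency
$\bar{x} = \frac{\sum fx}{\sum f}$
Where,
x = value of the observation
f = frequency of that value
Mean of Grouped data
$\bar{x} = \frac{\sum fm}{\sum f}$
Where,
m = midpoint of the class
f = frequency of the class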
75
Example 1.16
x 4 5 7 10 11 15 17
f 3 12 23 10 14 8 2
Ans: 8.9028
76
Example 1.16 (solution)
x f fx
4 3 12
5 12 60
7 23 161
10 10 100
11 14 154
15 8 120
17 2 34
$\sum f = 72$, $\sum fx = 641$
Example 1.16 (solution)
Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{641}{72} = 8.9028$
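The mean of a frequency table is the sum of f × x divided by the sum of f. A minimal Python sketch, with the illustrative function name `frequency_mean`, reproduces the answer of Example 1.16; the same function handles grouped data when class midpoints are passed as the values, as in the sales data of Example 1.17.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(freqs)
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / total_f

# Example 1.16: single values x with frequencies f
x = [4, 5, 7, 10, 11, 15, 17]
f = [3, 12, 23, 10, 14, 8, 2]
mean_116 = frequency_mean(x, f)

# Example 1.17: class midpoints m stand in for the values
midpoints = [2.5, 6.5, 10.5, 14.5, 18.5, 22.5]
sales_freqs = [5, 13, 31, 19, 8, 4]
mean_117 = frequency_mean(midpoints, sales_freqs)

print(round(mean_116, 4), round(mean_117, 1))
```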
78
Example 1.17
Calculate the sample mean of the following grouped
frequency distribution and interpret the value.
Sales (units) Frequency, f
1–4 5
5–8 13
9 – 12 31
13 – 16 19
17 – 20 8
21 – 24 4
Ans: 11.7 units
Example 1.17 (solution)
Sales (units)    Mid-point, m    f    fm
1–4 2.5 5 12.5
5–8 6.5 13 84.5
9 – 12 10.5 31 325.5
13 – 16 14.5 19 275.5
17 – 20 18.5 8 148.0
21 – 24 22.5 4 90.0
$\sum f = 80$, $\sum fm = 936$
Mean, $\bar{x} = \frac{\sum fm}{\sum f} = \frac{936}{80} = 11.7$ units
82
Ungrouped data:
Median position =
Median position =
83
Example 1.18 (Median for Ungrouped data)
Find the median for the set of data shown below and
interpret the value.
Ans: 67
84
Example 1.18 (solution)
75, 67, 48, 66, 89, 51, 70
Arranged in ascending order: 48, 51, 66, 67, 70, 75, 89
Median position = $\frac{7 + 1}{2}$ = 4th
Median = 67
Interpretation:
About 50% of the data is less than 67 and about 50% of
the data is more than 67.
85
Example 1.19 (Median for Ungrouped data)
Find the median for the following data :
Ans: 46.5
Example 1.19 (solution)
Arrange the numbers in ascending order, that is
87
Median for Grouped data (Multi-value grouping)
The median is the $\left(\frac{1}{2}\sum f\right)$th observation.
Median, $M = L_M + \left(\frac{\frac{1}{2}\sum f - f_{M-1}}{f_M}\right) c$
89
Example 1.20
Find the median for the data in the following grouped
frequency distribution and interpret the value.
Number of defectives (units)    1-2   3-4   5-6   7-8   9-10   11-12   13-14   15-16
Number of weeks (Frequency)     5     8     22    18    36     29      20      12
92
Example 1.20 (solution)
Interpretation:
In about 50% of the weeks the number of defectives is less
than 9.72 units, and in about 50% of the weeks the number
of defectives is more than 9.72 units.
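The value 9.72 can be checked with a short Python sketch of the grouped-median calculation. It assumes the lower class boundaries 0.5, 2.5, ..., 14.5 and class width 2 that follow from the discrete class limits 1-2, 3-4, ..., 15-16; the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a grouped distribution by interpolation in the median class."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("the distribution has no observations")
    position = total / 2                 # median position = (1/2) * sum of f
    cumulative = 0                       # cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= position:
            # L_M + ((position - cumulative before class) / f_M) * c
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")

lower_boundaries = [0.5, 2.5, 4.5, 6.5, 8.5, 10.5, 12.5, 14.5]
weeks = [5, 8, 22, 18, 36, 29, 20, 12]   # total 150 weeks

median_defectives = grouped_median(lower_boundaries, 2, weeks)
print(round(median_defectives, 2))
```

Here the median position is the 75th observation, which falls in the class 8.5 – 10.5 because 53 observations lie below 8.5 and 89 lie below 10.5.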
Example 1.21
Find the median for the data in the following grouped
frequency distribution.
=123
Frequency of the median class, fM = 63
Width of the median class, c = 25 - 20 = 5
96
Example 1.21 (solution)
97
Example 1.22
Determining Median Using Ogive
100 earthworms were collected from a garden. The length
(to the nearest mm) of the earthworms is recorded as shown
in the table below.
Length (mm)             95-109   110-124   125-139   140-154   155-169   170-184   185-199   200-214
Number of earthworms    2        8         17        26        24        16        6         1
Median position = $\frac{1}{2}\sum f = \frac{1}{2}(100)$ = 50th
Example 1.22 (solution)
“Less than” cumulative curve: Length of 100 earthworms
Median = 153 mm
101
Mode of Ungrouped data
Mode is the value of observation that occurs most frequently.
103
104
fm = frequency of the modal class
fb = frequency before the modal class
fa = frequency after the modal class
Lm = lower boundary of the modal class,
c = width of the modal class.
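The formula that uses these symbols did not survive on the slide. The sketch below assumes the commonly used interpolation formula, Mode = Lm + (fm − fb) / ((fm − fb) + (fm − fa)) × c, which is an assumption and may not be the exact form the slides intend. The function name `grouped_mode` is illustrative, and the data are the examination marks of Example 1.23 with class boundaries 19.5 – 29.5, ..., 79.5 – 89.5.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Estimate the mode of a grouped distribution (assumed standard formula)."""
    i = freqs.index(max(freqs))                      # first class with the highest frequency
    fm = freqs[i]                                    # frequency of the modal class
    fb = freqs[i - 1] if i > 0 else 0                # frequency before the modal class
    fa = freqs[i + 1] if i < len(freqs) - 1 else 0   # frequency after the modal class
    denominator = (fm - fb) + (fm - fa)
    if denominator == 0:
        # Flat neighbourhood: fall back to the class midpoint.
        return lower_boundaries[i] + width / 2
    return lower_boundaries[i] + (fm - fb) / denominator * width

lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
marks_freqs = [22, 18, 22, 24, 14, 14, 20]

mode_marks = grouped_mode(lower_boundaries, 10, marks_freqs)
print(round(mode_marks, 2))
```

Under this assumed formula the modal class is 49.5 – 59.5 with fm = 24, fb = 22 and fa = 14.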
105
Example 1.23
The marks obtained by 134 students in an examination
are recorded in the following table.
Marks 20-29 30-39 40-49 50-59 60-69 70-79 80-89
Frequency 22 18 22 24 14 14 20
(a) Draw a histogram to represent the above
information.
(b) Estimate the mode
(i) from the histogram;
(ii) using the formula.
(c) Interpret the mode.
106
Example 1.23 (solution)
(a)
Marks (Class boundaries)    Frequency, f
19.5 – 29.5 22
29.5 – 39.5 18
39.5 – 49.5 22
49.5 – 59.5 24
59.5 – 69.5 14
69.5 – 79.5 14
79.5 – 89.5 20
107
Example 1.23 (solution)
Histogram: Marks obtained by 134 students in an examination
109
Example 1.23 (solution)
c) Interpretation
110
Example 1.24
The frequency table below shows the mass of mangoes
(in gm) collected from the farm of Mr Nazri Adam
during the mango season. Class a – b shows the
interval a ≤ mass < b.
Mass (gm)            100 - 125   125 - 150   150 - 175   175 - 200   200 - 225
Number of mangoes    28          75          42          26          10
112
Subtopics
❑ Measures of Dispersion
▪ Range
▪ Variance
▪ Standard Deviation
❑ Measures of Position
▪ Quartiles
❑ Box Plots
2
Measures Of Dispersion
• Types of measurement which provide
information on the spread or variability of a
set of data.
• There are
1) Range
2) Variance
3) Standard Deviation
3
Measures Of Dispersion
EXAMPLE : The following are two data sets on the ages of all
workers in each of two small companies.
Company 1 : 47 38 35 40 36 45 39
Company 2 : 70 33 18 52 27
The mean age of workers in both companies is the same, 40
years. If we are not given the ages of the individual workers
in these two companies and are only told that the mean ages
are the same, we may deduce that the workers in the two
companies have similar age distributions.
But, as we can observe, the variation in the workers’ ages is
very different for the two companies. As illustrated in the diagram,
the ages of the workers in the second company have a much
larger variation than the ages of the workers in the first company.
Measures Of Dispersion
Company 1 : 35 36 38 39 40 45 47
Company 2 : 18 27 33 52 70
Conclusion:
Two data sets can have the same measures of central tendency,
and yet they can still be very different in the variability of their values.
A measure of dispersion is used to describe such differences
quantitatively.
Range for Ungrouped data
Range = Largest observation − Smallest observation
6
Range
Disadvantage :
1) Range is not a good measure of dispersion
for a data set that contains outliers.
2) Its calculation is based on two values only;
the largest and the smallest. All other
values in a data set are ignored. Thus,
range is not a very satisfactory measure of
dispersion.
7
Standard Deviation
The value of standard deviation tells how closely
the values of a data set are clustered around the
mean.
Population standard deviation = $\sigma$
Sample standard deviation = s
8
Variance
Variance is the square of standard
deviation.
Population variance = $\sigma^2$
Sample variance = $s^2$
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Where,
$\mu$ = population mean of data
N = total number of observations (population size)
10
Sample Variance and Standard Deviation
For Ungrouped Data
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{s^2}$
Where,
$\bar{x}$ = sample mean of data
n = total number of observations (sample size)
11
Example 1.25
Find the variance and the standard deviation
for the following set of sample data.
{ 4, 5 , 6 , 7 , 8 , 9 ,10}
12
Example 1.25 (using original formula)
Solution
x     $x - \bar{x}$    $(x - \bar{x})^2$
4     -3               9
5     -2               4
6     -1               1
7     0                0
8     1                1
9     2                4
10    3                9
$\sum x = 49$          $\sum (x - \bar{x})^2 = 28$
13
Example 1.25 Solution
Mean, $\bar{x} = \frac{\sum x}{n} = \frac{49}{7} = 7$
Sample variance, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{28}{6} = 4.667$
14
Example 1.25 (using alternative formula)
Solution
x     $x^2$
4     16
5     25
6     36
7     49
8     64
9     81
10    100
$\sum x = 49$    $\sum x^2 = 371$
Example 1.25 Solution
Sample variance, $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
$= \frac{371 - \frac{(49)^2}{7}}{6}$
= 4.667
Sample standard deviation, $s = \sqrt{4.667} = 2.160$
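Both routes to the sample variance can be checked against each other in Python. The function names below are illustrative; the first follows the original (deviation) formula and the second the alternative (sum of squares) formula, using the sample data of Example 1.25.

```python
from math import sqrt

def sample_variance(data):
    """Original formula: sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_variance_alternative(data):
    """Alternative formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

data = [4, 5, 6, 7, 8, 9, 10]
s2 = sample_variance(data)
s2_alt = sample_variance_alternative(data)
s = sqrt(s2)

print(round(s2, 3), round(s2_alt, 3), round(s, 3))
```

The two formulas are algebraically equivalent, so any difference between the results would point to an arithmetic slip.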
16
Example 1.26
Find the standard deviation for the
population data.
3, 5, 6, 4, 6, 5, 6, 8, 5
17
Example 1.26 Solution
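Number of observations, N = 9
x     $x^2$
3     9
5     25
6     36
4     16
6     36
5     25
6     36
8     64
5     25
$\sum x = 48$    $\sum x^2 = 272$
Standard deviation,
$\sigma = \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2}$
$= \sqrt{\frac{272}{9} - \left(\frac{48}{9}\right)^2}$
= 1.333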
Population Variance For Grouped
Data (Frequency Distribution)
19
Sample Variance and Standard
Deviation For Grouped Data
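$s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1} = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}$, $s = \sqrt{s^2}$
Where,
m = midpoint of the class
$\bar{x}$ = mean of the data
f = frequency of the class
n = $\sum f$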
Example 1.27(a)
The grouped frequency distribution below shows
the number of sales made by all the salesperson
of a company in one particular month. Find the
mean and standard deviation.
Sales (units) Frequency
0–9 5
10 – 19 13
20 – 29 23
30 – 39 31
40 – 49 16
21
Example 1.27(a) Solution
Sales (units)    Mid-point, m    Frequency, f    fm        $fm^2$
0 - 9            4.5             5               22.5      101.25
10 - 19          14.5            13              188.5     2733.25
20 - 29          24.5            23              563.5     13805.75
30 - 39          34.5            31              1069.5    36897.75
40 - 49          44.5            16              712       31684
Mean, $\mu = \frac{\sum fm}{\sum f}$
$= \frac{2556}{88}$
= 29.05 units
Example 1.27(a) Solution
Standard deviation,
$\sigma = \sqrt{\frac{\sum fm^2}{\sum f} - \left(\frac{\sum fm}{\sum f}\right)^2}$
$= \sqrt{\frac{85222}{88} - \left(\frac{2556}{88}\right)^2}$
= 11.17 units
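A brief Python sketch of the grouped population calculation is given below, using the midpoints and frequencies of Example 1.27(a). The function name `grouped_population_stats` is illustrative; it treats the table as a whole population, so it divides by the total frequency rather than by n − 1.

```python
from math import sqrt

def grouped_population_stats(midpoints, freqs):
    """Return (mean, standard deviation) for grouped population data."""
    total_f = sum(freqs)
    sum_fm = sum(f * m for m, f in zip(midpoints, freqs))
    sum_fm2 = sum(f * m * m for m, f in zip(midpoints, freqs))
    mean = sum_fm / total_f
    # sigma = sqrt(sum(f m^2) / sum(f) - (sum(f m) / sum(f))^2)
    sd = sqrt(sum_fm2 / total_f - mean ** 2)
    return mean, sd

midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]
freqs = [5, 13, 23, 31, 16]

mean_sales, sd_sales = grouped_population_stats(midpoints, freqs)
print(round(mean_sales, 2), round(sd_sales, 2))
```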
Example 1.27(b)
The following data give the frequency distribution of the
number of orders received each day during a sample
period of 50 days at the office of a mail-order company.
$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{49} = 7.582$
$s = \sqrt{s^2} = \sqrt{7.582} = 2.754$
Formula List
27
Data Distribution:
Symmetry and Skewness
• If a distribution is represented by a histogram
or a frequency curve, we can see the general
shape of its distribution and the relationship
between the mean, median and mode.
29
Positively Skewed Distribution
(Skewed to the right)
31
Measures of
Location
32
What is a measure of position?
• A measure of position determines the position of a
single value in relation to other values in a sample
or a population data set.
33
Quartiles
Quartiles divide a set of data (arranged in
ascending or descending order) into 4 equal
parts.
34
Inter-quartile Range and Semi-Inter-quartile Range (or Quartile deviation)
Interquartile Range (IQR)
= Third Quartile − First Quartile
= $Q_3 - Q_1$
Semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1)$
(a) Find the values of the three quartiles. Where does the number of
car thefts of 40,197 fall in relation to these quartiles?
(b) Find the interquartile range.
Example 1.28 Solution
38
Example 1.29
The following are the ages of nine employees of
an insurance company:
47 28 39 51 33 37 59 24 33
39
Example 1.29 Solution
Arranged in ascending order: 24 28 33 33 37 39 47 51 59
$Q_1 = \frac{28 + 33}{2} = 30.5$
$Q_2 = 37$ (also the median)
$Q_3 = \frac{47 + 51}{2} = 49$
Thus the values of the three quartiles are
Q1 = 30.5 years Q2 = 37 years Q3 = 49 years
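The quartiles above are consistent with taking the median of the lower half and the median of the upper half of the ordered data, leaving out the middle value when the number of observations is odd. The Python sketch below implements that rule as an assumption; other quartile conventions exist and can give slightly different values. The function names are illustrative.

```python
def median_of(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]                              # values below the median position
    upper = s[half + 1:] if n % 2 else s[half:]   # values above the median position
    return median_of(lower), median_of(s), median_of(upper)

ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3, "IQR =", q3 - q1)
```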
41
Quartiles For Ungrouped frequency
distribution (single-value grouping)
1. Construct cumulative frequency.
2. Median position = $\frac{n + 1}{2}$, locate the median.
42
Example 1.30
Number of fishes 0 1 2 3 4 5
Frequency 1 5 8 7 3 1
The above data shows the number of fishes
reared in each of 25 houses along Green Road.
Find the median and semi inter-quartile range for
the data.
43
Example 1.30 Solution
Number of Cumulative
Frequency, f
fishes, x frequency, F
0 1 1
1 5 6
2 8 14
3 7 21
4 3 24
5 1 25
44
Example 1.30 Solution
Median position = $\frac{1}{2}(25 + 1)$ = 13th
Median = 2 fishes
First quartile, $Q_1 = \frac{1}{2}(x_6 + x_7) = \frac{1}{2}(1 + 2)$ = 1.5 fishes
Third quartile, $Q_3 = \frac{1}{2}(x_{19} + x_{20}) = \frac{1}{2}(3 + 3)$ = 3 fishes
Semi-interquartile range = $\frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(3 - 1.5)$ = 0.75 fishes
Quartiles for Grouped Frequency
Distribution
For a grouped frequency distribution with total
frequency $\sum f$:
First quartile, $Q_1$ = $\left(\frac{1}{4}\sum f\right)$th value
Third quartile, $Q_3$ = $\left(\frac{3}{4}\sum f\right)$th value
49
Quartiles For Grouped Frequency
Distribution
Determine the class boundaries and compute the
cumulative frequency for each class. Locate the
classes that contain the quartiles by computing
their positions. Determine the quartiles using
formulae or graphically.
(a) Median position = $\frac{1}{2}\sum f$
(b) $Q_1$ position = $\frac{1}{4}\sum f$
(c) $Q_3$ position = $\frac{3}{4}\sum f$
50
Formula For First Quartile
$Q_1 = L_{Q_1} + \left(\frac{\frac{1}{4}\sum f - f_{Q_1-1}}{f_{Q_1}}\right) c$
$L_{Q_1}$ = lower class boundary of the first quartile class
$f_{Q_1-1}$ = cumulative frequency before the first quartile class
$f_{Q_1}$ = frequency of the first quartile class
$f_{Q_3}$ = frequency of the third quartile class
53
Example 1.31 Solution
Class boundaries    Frequency, f    Cumulative frequency
150 – 155           15              15
155 – 160           32              47
160 – 165           68              115
165 – 170           52              167
170 – 175           24              191
175 – 180           12              203
54
Example 1.31 Solution
$Q_1$ position = $\frac{203}{4}$th = 50.75th
$Q_1$ class boundaries = 160 – 165.
Lower boundary of 1st quartile class, $L_{Q_1}$ = 160
55
Example 1.31 Solution
1st quartile, $Q_1 = L_{Q_1} + \left(\frac{\frac{1}{4}\sum f - f_{Q_1-1}}{f_{Q_1}}\right) c$
$= 160 + \left(\frac{\frac{1}{4}(203) - 47}{68}\right)(5)$
= 160.28 cm
56
Example 1.31 Solution
$Q_3$ position = $\frac{3}{4}(203)$th = 152.25th
$Q_3$ class boundaries = 165 – 170.
57
Example 1.31 Solution
3rd quartile, $Q_3 = L_{Q_3} + \left(\frac{\frac{3}{4}\sum f - f_{Q_3-1}}{f_{Q_3}}\right) c$
$= 165 + \left(\frac{\frac{3}{4}(203) - 115}{52}\right)(5)$
= 168.58 cm
58
Example 1.31 Solution
Semi-interquartile range
$= \frac{1}{2}(Q_3 - Q_1)$
$= \frac{1}{2}(168.58 - 160.28)$
= 4.15 cm
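The quartile calculations of Example 1.31 can be reproduced with one general function. In this Python sketch, `grouped_quantile` is an illustrative name and `fraction` is 1/4 for the first quartile and 3/4 for the third; it assumes equal class widths and uses the class boundaries and frequencies from the solution table.

```python
def grouped_quantile(lower_boundaries, width, freqs, fraction):
    """Interpolate the value at position fraction * sum(f) in grouped data."""
    position = fraction * sum(freqs)
    cumulative = 0                        # cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= position:
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile class not found")

lower_boundaries = [150, 155, 160, 165, 170, 175]
freqs = [15, 32, 68, 52, 24, 12]          # total frequency 203

q1 = grouped_quantile(lower_boundaries, 5, freqs, 0.25)
q3 = grouped_quantile(lower_boundaries, 5, freqs, 0.75)
semi_iqr = (q3 - q1) / 2

print(round(q1, 2), round(q3, 2), round(semi_iqr, 2))
```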
59
Find Quartiles graphically
(use ogive)
Median and Quartiles can be determined
directly from cumulative frequency curve
(ogive).
60
Example 1.32
The table below shows the distribution of the mass of
babies (in kg) for babies born in a hospital from January
to June. Draw an ogive to show the frequency
distribution. From your ogive, find the first quartile and
third quartile of the mass of the babies.
61
Example 1.32 Solution
[Figure: ogive. Vertical axis: cumulative frequency (100 to 900); horizontal axis: mass (kg), 1 to 6. The 250th value gives Q1 = 2.0 kg and the 750th value gives Q3 = 3.3 kg.]
63
Example 1.32 Solution
$Q_1$ position = $\frac{1}{4}(1000)$ = 250th
$Q_3$ position = $\frac{3}{4}(1000)$ = 750th
From the ogive:
First quartile, Q1 = 2kg
Third quartile, Q3 = 3.3kg
64
Box-plot
A box-plot shows the spread of a distribution
and helps to detect outliers by using the 5-number
summary:
1) smallest value,
2) largest value,
3) first quartile ,
4) third quartile and
5) median.
[Figure: box-plot on a scale from 0 to 60; a box-plot can also be displayed vertically.]
Constructing a Box-Plot
• Calculate Q1, the median, Q3 and IQR.
• Draw a horizontal line to represent the scale of
measurement.
• Draw a box using Q1, the median, Q3.
Q1 m Q3
69
Constructing a Box-Plot
• Isolate outliers by calculating:
Lower fence: Q1 – 1.5 IQR
Upper fence: Q3+1.5 IQR
• Measurements beyond the upper or lower fence
are outliers and are marked (*).
*
Q1 m Q3
70
Constructing a Box-Plot
• Draw “whiskers” connecting the largest and
smallest measurements that are NOT outliers
to the box.
*
Q1 m Q3
71
Box-plot
Boxplot for 3 types of distribution :
1) Symmetrical distribution
72
Box-plot For Symmetrical
Distribution
[Figure: box-plot of a symmetrical distribution, labelled with Q1, Q3, the boundary on each side, the last value inside each boundary, and an ‘outlier’ (*) beyond each boundary.]
77
Example 1.33
Given the amount of sodium in 8 brands of cheese:
260 290 300 320 330 340 340 520
78
Example 1.33 Solution
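[Figure: box-plot of the sodium data on a scale starting at 200, showing Q1, the median m, Q3 and one outlier marked (*).]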
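The steps for constructing a box-plot can be sketched in Python for the sodium data of Example 1.33. The function name `box_plot_summary` is illustrative, and the quartiles are computed as medians of the lower and upper halves of the ordered data, which is an assumption; a different quartile convention would shift Q1 and the fences slightly, although 520 remains well beyond the upper fence.

```python
from statistics import median

def box_plot_summary(data):
    """Quartiles, fences, whisker ends and outliers for a box-plot."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower_half = s[:half]
    upper_half = s[half + 1:] if n % 2 else s[half:]
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr          # lower fence: Q1 - 1.5 IQR
    upper_fence = q3 + 1.5 * iqr          # upper fence: Q3 + 1.5 IQR
    inside = [x for x in s if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median(s), "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),   # extreme values that are not outliers
        "outliers": [x for x in s if x < lower_fence or x > upper_fence],
    }

sodium = [260, 290, 300, 320, 330, 340, 340, 520]
summary = box_plot_summary(sodium)
print(summary)
```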
79
Example 1.34
The following data shows a summary of the marks for
Mathematics and Science for students in a class.
Mathematics 10 90 60 45 70
Science 35 85 60 48 72
[Figure: box-plots of the Mathematics and Science marks on a scale from 0 to 100]
Number of observations, n = 23
Median, Q2 = 67 °F
83
Example 1.35 Solution
[Figure: box-plot of temperature (°F) on a scale from 50 to 90, with the values 55, 59, 64, 67, 77 and 79 marked.]
4th Industrial Revolution (4th IR)
✓ The fourth industrial revolution is the
fusion of the real world with the
virtual world. The digital revolution is
marked by technology that takes
advantage of Big Data and Artificial
Intelligence (AI) to nurture automatic
learning systems.