
EDUCATIONAL STATISTICS

Prof. D. D. Agyei
Mr. Williams Osei
Department of Mathematics and ICT Education
University of Cape Coast

2/16/23 1
UNIT 3
DATA REPRESENTATION: ORGANIZING NUMERICAL
DATA

Sessions in the Unit
1 Organizing numerical data: Ordered Array

2 Organizing numerical data: Stem and Leaf

3 Organizing numerical data: Box and Whisker

4 Organizing numerical data: Frequency distributions


5 Organizing numerical data: Histogram, Frequency Polygon and Ogive


Session 1

Organizing Numerical Data: Ordered Array
Session Objectives
By the end of the session, you should be able to
a) explain what an ordered array is,
b) construct an ordered array, and
c) state strengths and limitations of ordered arrays.

What is an Ordered Array?
An ordered array is a sequence of data, in rank order, from the
smallest value to the largest value.
An ordered array therefore:
a) shows range (minimum to maximum values).
b) provides some signals about variability within the range.
c) may help identify outliers (unusual observations).

What is an Ordered Array?
Example
Given below are the marks (out of 25) obtained by 20 students in a Mathematics test.
18, 16, 12, 10, 5, 5, 4, 19, 20, 10, 12, 12, 15, 15, 15, 8, 8, 8, 8, 16

The raw data, when put in ascending or descending order of magnitude, is called an array or arrayed data.
4, 5, 5, 8, 8, 8, 8, 10, 10, 12, 12, 12, 15, 15, 15, 16, 16, 18, 19, 20

Ordered Array of Mathematics Scores of Class 6A
 4   5   5   8   8   8
 8  10  10  12  12  12
15  15  15  16  16  18
19  20
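The ordering in this example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the course material.

```python
# Marks (out of 25) for the 20 students, as listed in the example.
marks = [18, 16, 12, 10, 5, 5, 4, 19, 20, 10,
         12, 12, 15, 15, 15, 8, 8, 8, 8, 16]

# An ordered array is the data sorted from the smallest to the largest value.
ordered_array = sorted(marks)

# The ends of the ordered array give the minimum, the maximum and the range.
minimum = ordered_array[0]
maximum = ordered_array[-1]
data_range = maximum - minimum

print(ordered_array)
print("min:", minimum, "max:", maximum, "range:", data_range)
```

Reading the first and last entries of the sorted list is what makes the range visible at a glance in an ordered array.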
Constructing an Ordered Array
There are two simple steps to take in the construction of an ordered array.
These steps are described below.
We will use the following example to illustrate the steps.

The Students’ record department of the University of Cape Coast surveyed the ages of
some students on campus during registration. Study the data set and hence use an
ordered array to represent it.

Ages of regular students 18, 17, 21, 16, 18, 18, 20, 19, 20, 19, 17, 22, 32, 27, 22, 38,
42.
Ages of sandwich students 18, 45, 19, 33, 21, 20, 23, 28, 32, 19, 41, 18.

Step 1:
Order or rank the data from lowest to highest.

In this situation, we will have:

Ages of regular students
16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22, 22, 27, 32, 38, 42.
Ages of sandwich students
18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45.

Constructing an Ordered Array (Cont’d)
Step 2:
Create a table with columns and rows to represent the data. Remember to label it appropriately.

Age of Surveyed UCC Students

Regular Students
16  17  17  18  18  18
19  19  20  20  21  22
22  27  32  38  42

Sandwich Students
18  18  19  19  20  21
23  28  32  33  41  45
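The two steps can be sketched in Python as follows. The helper name `ordered_array_rows` and the choice of six values per row are assumptions made for illustration; any convenient number of columns would do.

```python
def ordered_array_rows(values, per_row=6):
    """Step 1: sort the values. Step 2: split them into table rows."""
    ordered = sorted(values)
    return [ordered[i:i + per_row] for i in range(0, len(ordered), per_row)]

regular = [18, 17, 21, 16, 18, 18, 20, 19, 20, 19, 17, 22, 32, 27, 22, 38, 42]
sandwich = [18, 45, 19, 33, 21, 20, 23, 28, 32, 19, 41, 18]

# Print one labelled table per group of students.
for label, ages in (("Regular Students", regular), ("Sandwich Students", sandwich)):
    print(label)
    for row in ordered_array_rows(ages):
        print("  " + "  ".join(f"{age:2d}" for age in row))
```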

Strengths of Ordered Array

• It provides a way to visualize data.

• It allows you to see things in the data you might otherwise not see.

• It helps you look at your data in new ways.

Limitations of Ordered Array

• If the data set is large, the ordered array is less useful.

• The order in which data was originally collected is lost.

• For a large data set, the ordered array occupies much space.

Session Review
• In this session, you have learnt about ordered
arrays.
• Basically we said it is about arranging values
in a particular order in columns and rows.

Trial Question
In ordered array plots, the order in which data
was originally collected is maintained.
True or False


Session 2

Organizing Numerical Data: Stem and Leaf Plot
Session Objectives
By the end of the session, you should be able to
a) read a stem and leaf plot,
b) construct a stem and leaf plot,
c) explain the strengths and limitations of stem and leaf plot,
d) explain the uses of stem and leaf plot.

Nature of Stem and Leaf Plot
A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Using the scores of ten students in a mathematics quiz, we can illustrate the scores in a stem and leaf plot as below.

Stem "1" Leaf "5" means 15
Stem "1" Leaf "6" means 16
Stem "2" Leaf "3" means 23

Reading Stem and Leaf Plot
The ages of people at a school’s speech and prize giving day are represented on the diagram.

• Start with the key. It will guide you on how to read the other values.
• The key on this plot shows that the stem is the tens place and the leaf is the ones place.

Reading Stem and Leaf Plot
Finding the Maximum and Minimum Values
The oldest person at the speech and prize giving day is 66 years old.
The youngest person at the event was 01, or 1 year old.

Reading Stem and Leaf Plot

Finding the Mode

With the numbers ordered on the leaf side of the plot, we can also see that there are 4 children that are 4 years old. This represents the mode because it is the age that appears the most.

Reading Stem and Leaf Plot

Finding the Median

We can also easily get the median by finding the middle of the leaves. Here we can see that the median is 28 years old. So, half of the guests are younger than 28 and half are older than 28.

Constructing a Stem and Leaf Plot
There are three simple steps involved in the construction of a stem and
leaf plot. These three steps are described below.

We will use the following data set to illustrate the steps.

Here is a set of data showing the science mock examination scores of students in Mr Abu’s class.
56, 78, 82, 82, 90, 94, 93, 67, 67, 69, 74, 77, 92, 88, 81, 83, 84, 77, 72


Step 1
Organize and order (lowest to highest) the data into groups.

In this situation, we will group the test scores by tens.

50s: 56
60s: 67, 67, 69
70s: 72, 74, 77, 77, 78
80s: 81, 82, 82, 83, 84, 88
90s: 90, 92, 93, 94

Constructing a Stem and Leaf Plot (Cont’d)
Step 2
Create the plot with the stems and the leaves identified.
For our example, we will create the plot with the stems as the tens and the
leaves as the ones. The stems will be 5, 6, 7, 8 and 9

Constructing a Stem and Leaf Plot (Cont’d)
Step 3
Add a key to the bottom of the stem and leaf plot.
This is to ensure the right interpretation of the plot.
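The three steps can be sketched in Python for the scores from Mr Abu’s class. This is a minimal illustration that assumes two-digit whole-number scores; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} with the tens digit as stem and the ones digit as leaf."""
    plot = defaultdict(list)
    for value in sorted(values):          # Step 1: order the data
        stem, leaf = divmod(value, 10)    # Step 2: split into stem and leaf
        plot[stem].append(leaf)
    return dict(plot)

scores = [56, 78, 82, 82, 90, 94, 93, 67, 67,
          69, 74, 77, 92, 88, 81, 83, 84, 77, 72]

plot = stem_and_leaf(scores)
print("Stem | Leaf")
for stem in range(min(plot), max(plot) + 1):
    # Every stem in the range is printed so that gaps in the data stay visible.
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>4} | {leaves}")
print("Key: 5 | 6 means 56")              # Step 3: add a key
```

The printed rows show the stems 5 to 9 on the left with their ordered leaves branching out to the right.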

Strengths of stem and leaf plot

• A stem and leaf plot can be constructed quickly using pencil and paper.
• In a stem and leaf plot, the original and specific data values of a data set can be seen
and identified.
• A stem and leaf plot allows you to clearly see the shape of the distribution of a data
set.
• In a stem and leaf plot, extreme values, data clusters and gaps are easily visible.
• A stem and leaf plot can be used to determine the range, mode and
median of a data set quickly.

Limitations of stem and leaf plot

• A stem and leaf plot is not very informative for a very small data set.

• Constructing stem and leaf plots for very large data sets is tedious.

• The order in which data is originally collected is lost in a stem and leaf plot.

Uses of Stem and Leaf Plot

• Stem and leaf plots can be used by teachers to see the distribution of
students’ scores on a test.

• In the district education offices, stem and leaf plots can be used to see
the distribution of enrolment of students in the various schools in the
district

Session Review
In this session, we discussed stem-and-leaf plots.
Basically, stem-and-leaf plots organize data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Trial Question
A stem-and-leaf plot can only display numerical data.
True or False?

Session 3

Organizing Numerical Data: Box and Whisker Plot
Session Objectives
By the end of the session, you should be able to
a) describe the nature of box and whisker plot,
b) construct a box and whisker plot,
c) explain the strengths and limitations of box and whisker plot,
d) state at least two uses of box and whisker plots in education.

Nature of Box and Whisker Plot
The Box-and-Whisker Plot is a graphical display of the five number summary.
• Minimum

• First Quartile (Q1)

• Median (Q2)

• Third Quartile (Q3)

• Maximum

Quartile Measures
Locating Quartiles
Find a quartile by determining the value in the appropriate
position in the ranked data, where

First quartile position: Q1 = (n+1)/4 ranked value

Second quartile position: Q2 = (n+1)/2 ranked value

Third quartile position: Q3 = 3(n+1)/4 ranked value

where n is the number of observed values


Quartile Measures
Guidelines
• Rule 1: If the result is a whole number, then the quartile is equal to that ranked value.

• Rule 2: If the result is a fractional half (2.5, 3.5, etc.), then the quartile is equal to the average of the corresponding ranked values.

• Rule 3: If the result is neither a whole number nor a fractional half, round the result to the nearest integer and select that ranked value.
Quartile Measures
Locating the First Quartile
• Example: Find the first quartile

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

First, note that n = 9.

Q1 is in the (9+1)/4 = 2.5 ranked position of the ranked data, so use the value halfway between the 2nd and 3rd ranked values, 12 and 13,
so Q1 = 12.5
Q1 and Q3 are measures of non-central location
Q2 = median, a measure of central tendency
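The position formulas and the three rules can be sketched in Python as follows. The function name `quartile` and the restriction to at least three values are assumptions of this sketch. Note that statistical software often uses other quartile conventions, so its results may differ from these rules.

```python
def quartile(ordered, k):
    """Return quartile k (1, 2 or 3) of an ordered array using Rules 1 to 3."""
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch assumes at least three values")
    # The position is k(n+1)/4; split it into a whole part and a number of quarters.
    whole, quarters = divmod(k * (n + 1), 4)
    if quarters == 0:
        # Rule 1: whole-number position, take that ranked value.
        return ordered[whole - 1]
    if quarters == 2:
        # Rule 2: fractional half, average the two neighbouring ranked values.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rule 3: round to the nearest position (x.25 rounds down, x.75 rounds up).
    position = whole if quarters == 1 else whole + 1
    return ordered[position - 1]

sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(sample, k) for k in (1, 2, 3))
print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3, "IQR =", q3 - q1)
```

For the sample data the positions are 2.5, 5 and 7.5, so Rule 2 applies to Q1 and Q3 and Rule 1 applies to Q2.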
Measures of Variation
Interquartile Range
Example:
[Figure: box-and-whisker diagram marking X minimum, Q1, Median (Q2), Q3 and X maximum, with 25% of the data in each of the four sections]

Interquartile range = Q3 – Q1 = 57 – 30 = 27
Strengths of box and whisker plot
• A box and whisker plot can show whether a data set is symmetric or
skewed.
• The shape of distribution of a data set can be seen on a box and whisker
plot.
• Box and whisker plots allow for multiple sets of data to be displayed in a
single graph.
• They allow for comparison of data from different categories.
• A box and whisker plot shows the variability of a data set.

Limitations of box and whisker plot

• A box and whisker plot does not show frequency.

• A box and whisker plot does not display the individual data values.

Uses of Box and Whisker Plots in Education
• A teacher can use box and whisker plots to compare the performance of
students in a particular subject from different classes.
• The district education office can compare the performances of students
of particular schools over the years.
• A teacher can use box and whisker plots to analyse the effect of a
methodology on students by comparing their scores before and after the
intervention.

Session Review
• The Box-and-Whisker Plot is mainly a
graphical display of the five number
summary. Remember them?
Write the five number summary in full
1) Min- …………………
2) Q1- …………………
3) Med- …………………
4) Q3- …………………
5) Max- …………………


Session 4

Organizing Numerical Data: Frequency Distributions
Session Objectives
By the end of the session, you should be able to

a) describe the nature of frequency distributions,

b) describe the features of frequency distribution tables,

c) construct frequency distribution tables.

Nature of Frequency Distributions
A frequency distribution is any arrangement of data that shows the frequency of
occurrence of different values of the variable or the frequency of occurrence of values
falling within arbitrarily defined ranges of the variable. The frequency distribution
could either be ungrouped or grouped.
Example of a frequency distribution table
The following marks were obtained by a group of 40 students in a Statistics examination.
76 88 93 75 70 93 73 62 69 75
71 80 52 76 66 54 73 80 79 89
83 62 53 79 69 56 81 75 71 72
52 65 49 80 67 59 88 87 91 82

Ungrouped Frequency Distributions
Score  Frequency    Score  Frequency    Score  Frequency
93     2            79     2            66     1
91     1            76     2            65     1
89     1            75     3            62     2
88     2            73     2            59     1
87     1            72     1            56     1
83     1            71     2            54     1
82     1            70     1            53     1
81     1            69     2            52     2
80     3            67     1            49     1
                                        Total  40
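An ungrouped frequency distribution for the 40 Statistics marks can be tallied with Python's `collections.Counter`. This is a minimal sketch with illustrative variable names. Each distinct score should appear once, and the frequencies should sum to 40.

```python
from collections import Counter

# The 40 Statistics examination marks from the example.
marks = [76, 88, 93, 75, 70, 93, 73, 62, 69, 75,
         71, 80, 52, 76, 66, 54, 73, 80, 79, 89,
         83, 62, 53, 79, 69, 56, 81, 75, 71, 72,
         52, 65, 49, 80, 67, 59, 88, 87, 91, 82]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(marks)

print("Score  Frequency")
for score in sorted(frequency, reverse=True):   # highest score at the top
    print(f"{score:>5}  {frequency[score]:>9}")
print(f"Total  {sum(frequency.values()):>9}")
```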
Grouped Frequency Distributions
Table 2.7

Features of a Grouped Frequency Distribution
• Class. This is the group of scores as shown in column 1 of Table 2.7.
• Class interval. The range within which a group of scores lie. It has a
number at the beginning and at the end. In Table 2.7 the first class from
the top has the interval, 91-95.
When all the class intervals have the same range (i.e. difference
between the two values), the distribution is referred to as equal class
interval distribution but where there are differences in the range of the
intervals, the distribution is referred to as unequal class interval
distribution.

Features of a Grouped Frequency Distribution (Cont’d)
• Open-ended classes. These are classes with a value at one end, either at
the beginning or the end and a description at the other end. These
intervals are put either at the top or bottom of the table. Using the forty
scores above a top class can be “90 and above” or “Above 90” and the
bottom one can be “45 and below” or “Below 46”.
• Class limits. These are the end points of a class interval. The smaller
number is the lower limit and the bigger number is the upper limit. In
Table 2.7, using the bottom class of 46-50, the lower limit is 46 and the
upper limit is 50.

Features of a Grouped Frequency Distribution (Cont’d)
• Class mark. The midpoint of each class interval. It is obtained by
adding the two class limits and dividing the result by 2. To get the class
mark for the class 86-90, 86 is added to 90 to obtain 176. 176 ÷ 2 gives
88.
• Class boundaries. These are the exact or real limits of a class interval.
The lower class boundaries are obtained by subtracting 0.5 from the
lower class limit. The upper class boundaries are obtained by adding 0.5
to the upper class limits. A class interval with limits of 91 – 95 produces
class boundaries of 90.5 - 95.5.
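The class mark and class boundary rules can be written as two small Python functions. This is a sketch only. The function names are hypothetical, and the `unit` parameter (the gap between consecutive scores, 1 for whole-number marks) is an assumption added for illustration.

```python
def class_mark(lower_limit, upper_limit):
    """Midpoint of a class: add the two class limits and divide by 2."""
    return (lower_limit + upper_limit) / 2

def class_boundaries(lower_limit, upper_limit, unit=1):
    """Real limits: subtract half a unit from the lower limit, add half a unit to the upper."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

print(class_mark(86, 90))        # class mark of the class 86-90
print(class_boundaries(91, 95))  # class boundaries of the class 91-95
```

For whole-number scores the half unit is 0.5, which matches the rule stated above.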


• Class size/class width. This is the number of distinct or discrete scores within a
class interval. It is obtained by finding the difference between successive lower
class limits or upper class limits in cases of equal class intervals. Simply, just count
the number of scores within an interval. For example, 46-50 will give us 46, 47,
48, 49, 50, giving us 5 numbers. The class size is then 5.

• Frequency. This is the number of scores from the given data that fall within
a class interval. It is obtained through tallying (i.e. using strokes, ///,
to represent the scores). To make counting easier, the strokes are often bound into
bundles of 5.

Features of a Grouped Frequency Distribution (Cont’d)
• Cumulative frequency. This is the successive sum of the frequencies
starting from the frequency of the bottom class. The frequency for
each class is added to the cumulative frequency below it and then
recorded for the particular class.
• Cumulative percentage frequency. This is obtained by expressing
each cumulative frequency as a percentage. The cumulative
frequency of the class is divided by the total frequency and the result
multiplied by 100. For class 86-90, the cumulative percentage
frequency is :

Features of a Grouped Frequency Distribution (Cont’d)
• Relative frequency. This is obtained by dividing each frequency by
the total frequency. The total relative frequency must always add up
to 1.0.

• Cumulative relative frequency. This is the successive sum of the
relative frequencies starting from the relative frequency of the
bottom class. The relative frequency for each class is added to the
cumulative relative frequency of the class below it and then recorded
for the particular class.
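The features described above can be computed together for the 40 Statistics marks. Table 2.7 is not reproduced in this text, so the sketch below assumes equal classes of width 5 running from 46-50 at the bottom to 91-95 at the top, which are the classes mentioned in the feature descriptions. All variable names are illustrative.

```python
marks = [76, 88, 93, 75, 70, 93, 73, 62, 69, 75,
         71, 80, 52, 76, 66, 54, 73, 80, 79, 89,
         83, 62, 53, 79, 69, 56, 81, 75, 71, 72,
         52, 65, 49, 80, 67, 59, 88, 87, 91, 82]

# Assumed class limits: 46-50, 51-55, ..., 91-95 (bottom class first).
classes = [(lower, lower + 4) for lower in range(46, 95, 5)]

total = len(marks)
rows = []
cumulative = 0
for lower, upper in classes:
    frequency = sum(lower <= m <= upper for m in marks)
    cumulative += frequency            # running sum from the bottom class
    rows.append({
        "class": f"{lower}-{upper}",
        "frequency": frequency,
        "cumulative": cumulative,
        "cumulative_percent": 100 * cumulative / total,
        "relative": frequency / total,
        "cumulative_relative": cumulative / total,
    })

# Print with the highest class at the top.
print("Class    f   cf   cf%    rf    crf")
for row in reversed(rows):
    print(f"{row['class']:>5}  {row['frequency']:>3}  {row['cumulative']:>3}  "
          f"{row['cumulative_percent']:>5.1f}  {row['relative']:.3f}  "
          f"{row['cumulative_relative']:.3f}")
```

Under these assumed classes, the cumulative frequency of the top class equals the total frequency and its cumulative relative frequency equals 1.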

Constructing a Grouped Frequency Distribution Table
Example:
A manufacturer of insulation randomly selects 20 winter days and records the
daily high temperature. Construct a grouped frequency distribution for the
data.
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27.

• First, sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Constructing a Grouped Frequency Distribution Table (Cont’d)
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 10 (46/5 = 9.2, then round up)
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes

Constructing a Grouped Frequency Distribution Table (Cont’d)
• So the grouped frequency distribution table would look like this.

Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100
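The construction steps for the temperature example can be sketched in Python. Two choices here are assumptions made so that the sketch reproduces the classes in the table: the width is rounded up to the next whole number, and the first boundary is taken as the multiple of the width at or below the minimum.

```python
import math

temperatures = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
                32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

number_of_classes = 5
data_range = max(temperatures) - min(temperatures)       # 58 - 12
width = math.ceil(data_range / number_of_classes)        # 46 / 5, rounded up
start = (min(temperatures) // width) * width              # assumed starting boundary

boundaries = [start + i * width for i in range(number_of_classes + 1)]
class_pairs = list(zip(boundaries, boundaries[1:]))
midpoints = [(low + high) / 2 for low, high in class_pairs]

# "10 but less than 20" means low <= t < high, so the classes do not overlap.
frequencies = [sum(low <= t < high for t in temperatures) for low, high in class_pairs]

total = len(temperatures)
for (low, high), f in zip(class_pairs, frequencies):
    print(f"{low} but less than {high}: {f}  {f / total:.2f}  {100 * f / total:.0f}")
```

Using a half-open interval for each class keeps the classes mutually exclusive, so every temperature is counted exactly once.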

Points to note in constructing a frequency distribution table
In constructing frequency distribution tables, there are a number of important points to
note. These points are described below.

1. In Education, the highest class intervals or classes are at the top so this convention
must be followed in constructing the frequency distribution table.
2. Use mutually exclusive classes. Make sure that an observation falls into one and
only one class. Classes must not overlap at the class limits. For example, 70 – 80 and
80 – 90 overlap at the class limit 80.
3. There should be no class with a zero frequency. If this occurs, it is recommended
that the class size is changed. Preferably increase the class size.

Points to note in constructing a frequency distribution table (Cont’d)

4. Open-ended classes should be avoided. These classes have only the
lower limit if it is the class at the top, or the upper limit if it is the class
at the bottom. For example, 51 and above, 20 and below.
5. Aim at classes with equal sizes or width. This facilitates easy
interpretation of the information from the frequency distribution table.
6. The number of classes should not be too small (i.e. not less than 5) and
not too large (i.e. not more than 20). Where the number of classes is
less than 5, class size should be reduced but when the number of
classes is more than 20, the class size should be increased.

Session Review
You should now be able to
a) describe the nature of frequency
distributions,
b) describe the features of frequency
distribution tables,
c) construct frequency distribution tables.


Session 5

Organizing Numerical Data: Histogram, Frequency Polygon and Ogive
Session Objectives
By the end of the session, you should be able to
a) construct a histogram,
b) explain the importance of histograms in educational practice,
c) construct a frequency polygon,
d) explain the importance of frequency polygons in educational
practice,
e) construct ogives,
f) explain the importance of ogives in educational practice.

Histogram
• A graph of the data in a frequency distribution is called a histogram.
• The class boundaries (or class midpoints) are shown on the horizontal axis.
• The vertical axis is either frequency, relative frequency, or percentage.
• Bars of the appropriate heights are used to represent the number of observations within each class.

Example - Histogram
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

Constructing a Histogram
The manual construction of a histogram involves four steps. Using a graph sheet,
pencil and ruler, follow the steps below to draw a histogram.
Step 1
Draw two axes, a vertical and horizontal. Label the vertical axis by frequency
and the horizontal axis by scores or classes.
Step 2
Select an appropriate scale on the vertical axis considering the highest or largest
value as well as the lowest or smallest value. When using a graph sheet, the
scale should be such that the bars are neither too tall nor too short.

Constructing a Histogram (Cont’d)

Step 3
Use class midpoints/marks or class boundaries or class limits to label the points
on the horizontal axis. It is always recommended that the label begins with the
point 0. There are however situations where the lowest score is far from 0. In
such circumstances, part of the horizontal axis is shrunk or moved towards the
vertical axis to reduce extra unused space at the beginning of the graph.
Step 4
Draw bars of equal width representing the classes from a frequency distribution
table with corresponding heights as the frequencies. There should be no spaces
between the bars.

Importance of Histogram in Educational Practice
It gives a pictorial description of the raw data, providing
information about the nature of the data.
For example, by observing raw scores in a table, it is difficult
to get information about the level of performance of a class.
However, if the data is presented in a histogram as shown on
slide 59, a better picture of the level of performance can be
seen.

Importance of Histogram in Educational Practice (Cont’d)
• It gives the direction of academic performance (i.e. skewness).

[Figure: histogram skewed to the left]
This histogram is skewed to the left, implying that group performance tends to be high.

[Figure: histogram skewed to the right]
This histogram is skewed to the right, implying that group performance tends to be low.

Importance of Histogram in Educational Practice (Cont’d)
It provides an estimate of the most typical score. This is the intersection of
the two diagonals of the tallest bar. It can be estimated as 32.

Frequency Polygons

• A frequency polygon uses data from ratio or interval scales and depends
on frequency distributions. It uses the classes and the frequencies from
the frequency distribution table.

• A percentage polygon is formed by having the midpoint of each class
represent the data in that class and then connecting the sequence of
midpoints at their respective class percentages.

Example – Frequency Polygon
Class                  Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

Constructing a Frequency Polygon
The construction of a frequency polygon involves five steps. These steps are
described below. Where software such as Microsoft Excel or SPSS is available, it
should be used; otherwise, follow the steps below, using a graph sheet to draw the
frequency polygon.
Step 1
Draw two axes, a vertical and a horizontal one. Label the vertical axis by frequency
and the horizontal axis by scores or classes.
Step 2
Select an appropriate scale on the vertical axis considering the highest or largest
value and the lowest or smallest value. When using a graph sheet, the scale should
be such that the polygon is neither too pointed nor too short.

Constructing a Frequency Polygon (Cont’d)
Step 3
Use class midpoints or class marks or class boundaries or class limits to label the
points on the horizontal axis.
Step 4
Plot at the midpoint of each class the relevant heights as the frequencies. Join the
plotted points with straight lines.
Step 5
Create two classes, one to the left and one to the right, and plot the midpoints of
these classes on the horizontal axis. Extend the line in Step 4 to meet the horizontal
axis at these midpoints to complete the polygon.
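Steps 4 and 5 amount to computing a list of points to plot. The sketch below does this for the temperature frequency distribution used in the example. The variable names are illustrative, and the drawing itself is left to a graph sheet or to whichever plotting software is available.

```python
# Class boundaries and frequencies from the example frequency distribution.
boundaries = [10, 20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

width = boundaries[1] - boundaries[0]
midpoints = [(low + high) / 2 for low, high in zip(boundaries, boundaries[1:])]

# Step 4: one point per class at (class midpoint, frequency).
# Step 5: add an empty class on each side so the polygon closes on the horizontal axis.
polygon_x = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
polygon_y = [0] + frequencies + [0]
points = list(zip(polygon_x, polygon_y))

for x, y in points:
    print(f"midpoint {x:>5.1f}  frequency {y}")
```

Joining these points in order with straight lines gives the frequency polygon. The first and last points lie on the horizontal axis because the two added classes have zero frequency.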

Importance of Frequency Polygons for Educational Practice
• It gives a pictorial description of the raw data, providing information about the
nature of the data. Raw data alone is difficult to study. If the raw data is
transformed into a polygon, a visual impression is created and that makes
information about the data easier to grasp.
• It gives the direction of performance (i.e. skewness). Consider three classes, A, B and C.
[Figure: frequency polygons for classes A, B and C]
A: Positive skewness (skewed to the right). Tends to score low marks.
B: Normal.
C: Negative skewness (skewed to the left). Tends to score high marks.

Importance of Frequency Polygons for Educational Practice (Cont’d)
• It provides an estimate of the most typical score. This is the point on the horizontal axis where the highest point of the polygon is located. The most typical score is used as a summary to represent the total group performance.

• It is used to compare the performance of groups. For example, the performance in a class test for Forms 1A and 1B can be shown as follows.

THANK
YOU!