
Chapter 2:

Summarizing Data

2.1 Introduction

 Raw data - Data recorded in the sequence in which they are collected and
before they are processed or ranked.

 Array data - Raw data that is arranged in ascending or descending order.

Example 1:

Here is a list of questions asked in a large statistics class and the “raw data” given by
one of the students:

a) What is your sex (m=male, f=female)?


Answer (raw data): m

b) How many hours did you sleep last night?


Answer: 5 hours

c) What is your height in inches?


Answer: 67 inches

d) What’s the fastest you’ve ever driven a car (mph)?


Answer: 110 mph

Example 2:

Quantitative raw data

Qualitative raw data

 These data are also called ungrouped data.

2.2 Organizing and Graphing Qualitative Data

2.2.1 Frequency Distributions/ Table


2.2.2 Relative Frequency and Percentage Distribution
2.2.3 Graphical Presentation of Qualitative Data

2.2.1 Frequency Distributions / Table

 A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.
 It exhibits how the frequencies are distributed over the various categories.
 Also called a frequency distribution table or simply a frequency table.
 The number of elements that belong to a certain category is called the
frequency of that category.

2.2.2 Relative Frequency and Percentage Distribution

 A relative frequency distribution is a listing of all categories along with their
relative frequencies (given as proportions or percentages).
 It is commonplace to give the frequency and relative frequency distribution
together.
 Calculating relative frequency and percentage of a category

Relative Frequency of a category = (Frequency of that category) / (Sum of all frequencies)

Percentage = (Relative Frequency) × 100

Example 3:

A sample of UUM staff-owned vehicles produced by Proton was identified and the
make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja,
St = Satria, P = Perdana, Sv = Savvy):

W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv

Construct a frequency distribution table for these data with their relative frequency
and percentage.

Solution:

Category    Frequency    Relative Frequency    Percentage (%)
Wira        19           19/50 = 0.38          0.38*100 = 38
Iswara      8            0.16                  16
Perdana     4            0.08                  8
Waja        10           0.20                  20
Satria      5            0.10                  10
Savvy       4            0.08                  8
Total       50           1.00                  100
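
The table above can also be produced by counting the codes directly. The following is a minimal sketch in Python, assuming the sample of Example 3 is typed in as a string; the variable names (sample, names, table) are illustrative only.

```python
from collections import Counter

# Sample from Example 3, one vehicle code per entry, row by row.
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

names = {"W": "Wira", "Is": "Iswara", "P": "Perdana",
         "Wj": "Waja", "St": "Satria", "Sv": "Savvy"}

counts = Counter(sample)      # frequency of each category
total = sum(counts.values())  # sum of all frequencies

# Rows of (category, frequency, relative frequency, percentage).
table = [(names[code], f, f / total, 100 * f / total)
         for code, f in counts.items()]

for category, f, rel, pct in table:
    print(f"{category:<8} {f:>3} {rel:>6.2f} {pct:>5.0f}")
print(f"{'Total':<8} {total:>3} {1:>6.2f} {100:>5.0f}")
```

The rows are printed in the order in which each code first appears in the sample, so the order may differ from the table.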

2.2.3 Graphical Presentation of Qualitative Data

1. Bar Graphs

 A graph made of bars whose heights represent the frequencies of respective
categories.
 Such a graph is most helpful when you have many categories to represent.
 Notice that a gap is inserted between each of the bars.
 The types are the simple (vertical) bar chart, horizontal bar chart, component bar chart and
multiple bar chart.

Simple/ Vertical Bar Chart

 To construct a vertical bar chart, mark the various categories on the
horizontal axis and mark the frequencies on the vertical axis.

Figure 2.1
Horizontal Bar Chart

 To construct a horizontal bar chart, mark the various categories on the
vertical axis and mark the frequencies on the horizontal axis.

Example 4: Refer to Example 3.

Figure 2.3: UUM staff-owned vehicles produced by Proton (horizontal bar chart; vertical axis:
types of vehicle; horizontal axis: frequency, 0 to 20)

 Another example of a horizontal bar chart: Figure 2.4

Figure 2.4: Number of students at Diversity College who are immigrants, by last country of
permanent residence

Component Bar Chart

 In a component bar chart, all categories are placed in one bar and every bar
is divided into components.
 The height of each component should tally with the frequency it represents.

Example 5:

Suppose we want to illustrate the information below, representing the number of
people participating in the activities offered by an outdoor pursuits centre during
June of three consecutive years.

            2004    2005    2006
Climbing    21      34      36
Caving      10      12      21
Walking     75      85      100
Sailing     36      36      40
Total       142     167     197

Solution:

Figure 2.5

Multiple Bar Chart

 In a multiple bar chart, the bars that represent the categories
are gathered in groups.
 The height of each bar represents the frequency of its category.
 Useful for making comparisons (two or more values).

Example 6: Refer to Example 5.

Figure 2.6: Activities Breakdown (June), a multiple bar chart of the number of participants in
Climbing, Caving, Walking and Sailing by year (2004, 2005, 2006)

 Another example of horizontal bar chart: Figure 2.7

Figure 2.7: Preferred snack choices of students at UUM

2. Pie Chart

 A circle divided into portions that represent the relative frequencies or
percentages of a population or a sample belonging to different categories.
 An alternative to the bar chart and useful for summarizing a single categorical
variable if there are not too many categories.
 The chart makes it easy to compare relative sizes of each class/category.
 The whole pie represents the total sample or population. The pie is divided
into different portions that represent the different categories.
 To construct a pie chart, we multiply 360° by the relative frequency for each
category to obtain the degree measure or size of the angle for the
corresponding category.

Example 7 (Table 2.6 and Figure 2.8):

Table 2.6 Figure 2.8

Example 8 (Table 2.7 and Figure 2.9):

Movie Genre        Frequency    Relative Frequency    Angle Size
Comedy             54           0.27                  360*0.27 = 97.2°
Action             36           0.18                  360*0.18 = 64.8°
Romance            28           0.14                  360*0.14 = 50.4°
Drama              28           0.14                  360*0.14 = 50.4°
Horror             22           0.11                  360*0.11 = 39.6°
Foreign            16           0.08                  360*0.08 = 28.8°
Science Fiction    16           0.08                  360*0.08 = 28.8°
Total              200          1.00                  360°
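
The angle calculation in Table 2.7 can be checked with a short sketch. It assumes only the frequencies from the table; the names frequencies and angles are illustrative.

```python
# Frequencies from Table 2.7 (movie genres).
frequencies = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}

total = sum(frequencies.values())

# Angle of each slice = 360 degrees * relative frequency of the category.
angles = {genre: 360 * f / total for genre, f in frequencies.items()}

for genre, angle in angles.items():
    print(f"{genre:<16} {frequencies[genre] / total:.2f} {angle:6.1f}")
```

Because the relative frequencies sum to 1, the angles always sum to 360°, which is a quick check on the arithmetic.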

Figure 2.9

2.3 Organizing and Graphing Quantitative Data

2.3.1 Stem and Leaf Display


2.3.2 Frequency Distribution
2.3.3 Relative Frequency and Percentage Distributions.
2.3.4 Graphing Grouped Data
2.3.5 Shapes of Histogram
2.3.6 Cumulative Frequency Distributions.

2.3.1 Stem-and-Leaf Display

 In a stem-and-leaf display of quantitative data, each value is divided into two
portions, a stem and a leaf. Then the leaves for each stem are shown
separately in a display.
 It shows the pattern of the data.
 It can reveal which values are repeated frequently.

Example 10:

25 12 9 10 5 12 23 7

36 3 11 12 31 28 37 6
14 41 38 44 13 22 18 19

Solution:

0 | 3 5 6 7 9
1 | 0 1 2 2 2 3 4 8 9
2 | 2 3 5 8
3 | 1 6 7 8
4 | 1 4
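
The display for Example 10 can be reproduced with a small sketch. It assumes non-negative whole numbers, with the tens as the stem and the units digit as the leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

# Data from Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 3, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens (stem)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 25 -> stem 2, leaf 5
        groups[stem].append(leaf)
    return dict(groups)

display = stem_and_leaf(data)
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem} | {leaves}")
```

Sorting the values first puts the leaves of each stem in ascending order, as in the solution above.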

2.3.2 Frequency Distribution

 A frequency distribution for quantitative data lists all the classes and the
number of values that belong to each class.
 Data presented in form of frequency distribution are called grouped data.

 The class boundary is given by the midpoint of the upper limit of one class
and the lower limit of the next class. Also called real class limit.
 To find the midpoint of the upper limit of the first class and the lower limit of
the second class, we divide the sum of these two limits by 2.

e.g.: (400 + 401) / 2 = 400.5

 Class Width (class size)

Class width = Upper boundary – Lower boundary

e.g. : Width of the first class = 600.5 – 400.5 = 200

 Class Midpoint or Mark

Class midpoint or mark = (Lower limit + Upper limit) / 2

e.g.: Midpoint of the 1st class = (401 + 600) / 2 = 500.5
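
The boundary, width and midpoint calculations can be combined in one short sketch. It assumes whole-number class limits with a gap of 1 between the upper limit of one class and the lower limit of the next (such as 400 and 401); the function name class_summary and its gap parameter are hypothetical.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, width, midpoint) of a class.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1, e.g. 400 and 401).
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, width, midpoint

# The class 401 - 600 used in the examples above.
print(class_summary(401, 600))
```

For the class 401 – 600 this gives the boundaries 400.5 and 600.5, a width of 200 and a midpoint of 500.5, the same values as in the worked examples.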

Constructing Frequency Distribution Tables


Figure 2.9

1. To decide the number of classes, we use Sturges' formula, which is

c = 1 + 3.3 log n

where c is the number of classes and n is the number of observations in the data set.

2. Class width,

i > (Largest value - Smallest value) / (Number of classes), that is, i > Range / c

This class width is rounded to a convenient number.

3. Lower Limit of the First Class or the Starting Point

 Use the smallest value in the data set.

Example 11:

The following data give the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2004 season.

Solution:

i) Number of classes, c = 1 + 3.3 log 30
= 1 + 3.3(1.48)
= 5.88 ≈ 6 classes

ii) Class width,

i > (242 - 135) / 6
> 17.8
≈ 18

iii) Starting Point = 135

Table 2.10 Frequency Distribution for Data of Table 2.9

Total Home Runs    Tally          f
135 – 152          |||| ||||      10
153 – 170          ||             2
171 – 188          ||||           5
189 – 206          |||| |         6
207 – 224          |||            3
225 – 242          ||||           4
                                  Σf = 30
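
Steps (i) to (iii) can be sketched in code as follows. The sketch assumes whole-number data, a base-10 logarithm, and that "rounded to a convenient number" means the smallest whole number greater than Range / c; the function names sturges_classes and class_limits are hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes from c = 1 + 3.3 log n (base-10 log), rounded up."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_limits(smallest, largest, n):
    """Class limits for whole-number data, starting at the smallest value."""
    c = sturges_classes(n)
    # Smallest whole number strictly greater than Range / c (assumed rounding).
    width = math.floor((largest - smallest) / c) + 1
    return [(smallest + k * width, smallest + (k + 1) * width - 1)
            for k in range(c)]

# n = 30 teams, smallest value 135, largest value 242 (Example 11).
limits = class_limits(135, 242, 30)
for lower, upper in limits:
    print(f"{lower} - {upper}")
```

With these inputs the sketch gives 6 classes of width 18, running from 135 – 152 to 225 – 242, as in Table 2.10. Counting the frequencies would additionally need the raw data of Table 2.9.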

2.3.3 Relative Frequency and Percentage Distributions

Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies) = f / Σf

Percentage = (Relative frequency) × 100

Example 12 (Refer to Example 11)

Table 2.11: Relative Frequency and Percentage Distributions

Total Home Runs    Class Boundaries            Relative Frequency    %
135 – 152          134.5 to less than 152.5    0.3333                33.33
153 – 170          152.5 to less than 170.5    0.0667                6.67
171 – 188          170.5 to less than 188.5    0.1667                16.67
189 – 206          188.5 to less than 206.5    0.2                   20
207 – 224          206.5 to less than 224.5    0.1                   10
225 – 242          224.5 to less than 242.5    0.1333                13.33
Sum                                            1.0                   100

2.3.4 Graphing Grouped Data

1. Histograms

 A histogram is a graph in which the class boundaries are marked on the
horizontal axis and either the frequencies, relative frequencies, or
percentages are marked on the vertical axis. The frequencies, relative
frequencies or percentages are represented by the heights of the bars.
 In a histogram, the bars are drawn adjacent to each other and there is a space
between the y-axis and the first bar.

Example 13 (Refer to Example 11)

Figure 2.10: Frequency histogram for Table 2.10 (class boundaries on the horizontal axis:
134.5, 152.5, 170.5, 188.5, 206.5, 224.5, 242.5)

2. Polygon

 A graph formed by joining the midpoints of the tops of successive bars in a
histogram with straight lines is called a polygon.

Example 13

Figure 2.11: Frequency polygon for Table 2.10 (horizontal axis: total home runs; vertical
axis: frequency)
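
The points joined by a frequency polygon can be listed with a short sketch. It takes the class limits and frequencies of Table 2.10 and places each point above the class midpoint at the height of the class frequency; the name vertices is illustrative.

```python
# Class limits and frequencies from Table 2.10.
classes = [(135, 152), (153, 170), (171, 188),
           (189, 206), (207, 224), (225, 242)]
frequencies = [10, 2, 5, 6, 3, 4]

# Each vertex sits above the class midpoint at the height of the frequency.
vertices = [((lower + upper) / 2, f)
            for (lower, upper), f in zip(classes, frequencies)]

for midpoint, f in vertices:
    print(f"{midpoint:6.1f} {f:3d}")
```

The midpoint of the top of a bar lies at the class midpoint, so the first point is (143.5, 10) and the last is (233.5, 4).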

 For a very large data set, as the number of classes is increased (and the width
of classes is decreased), the frequency polygon eventually becomes a smooth
curve called a frequency distribution curve or simply a frequency curve.

Figure 2.12: Frequency distribution curve

2.3.5 Shape of Histogram

 The shape of a histogram is described in the same way as the shape of its polygon.
 The most common shapes are:
(i) Symmetric

Figure 2.13 & 2.14: Symmetric histograms

(ii) Right skewed and (iii) Left skewed

Figure 2.15 & 2.16: Right skewed and Left skewed

2.3.6 Cumulative Frequency Distributions

 A cumulative frequency distribution gives the total number of values that
fall below the upper boundary of each class.

Example 14: Using the frequency distribution of Table 2.11,

Total Home Runs    Class Boundaries            Cumulative Frequency
135 – 152          134.5 to less than 152.5    10
153 – 170          152.5 to less than 170.5    10+2 = 12
171 – 188          170.5 to less than 188.5    10+2+5 = 17
189 – 206          188.5 to less than 206.5    10+2+5+6 = 23
207 – 224          206.5 to less than 224.5    10+2+5+6+3 = 26
225 – 242          224.5 to less than 242.5    10+2+5+6+3+4 = 30
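
The running totals in the last column can be computed with a short sketch, using the class frequencies 10, 2, 5, 6, 3, 4 and the upper class boundaries; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies for the home run data.
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]
frequencies = [10, 2, 5, 6, 3, 4]

# Each cumulative frequency is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for boundary, F in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {F}")
```

The last cumulative frequency always equals the total number of observations, here 30.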

Ogive

 An ogive is a curve drawn for the cumulative frequency distribution by joining
with straight lines the dots marked above the upper boundaries of classes at
heights equal to the cumulative frequencies of respective classes.
 Two types of ogive:
(i) ogive less than
(ii) ogive greater than
 First, build a table of cumulative frequencies.

Example 15: (Ogive Less Than)

Earnings (RM)    Number of students (f)
30 – 39          5
40 – 49          6
50 – 59          6
60 – 69          3
70 – 79          3
80 – 89          7
Total            30

Earnings (RM)     Cumulative Frequency (F)
Less than 29.5    0
Less than 39.5    5
Less than 49.5    11
Less than 59.5    17
Less than 69.5    20
Less than 79.5    23
Less than 89.5    30

Figure 2.17: Ogive less than (horizontal axis: earnings, marked at the boundaries 29.5 to 89.5;
vertical axis: cumulative frequency)

Example 16 : (Ogive Greater Than)

Earnings (RM)    Number of students (f)
30 – 39          5
40 – 49          6
50 – 59          6
60 – 69          3
70 – 79          3
80 – 89          7
Total            30

Earnings (RM)     Cumulative Frequency (F)
More than 29.5    30
More than 39.5    25
More than 49.5    19
More than 59.5    13
More than 69.5    10
More than 79.5    7
More than 89.5    0

Figure 2.18: Ogive greater than (horizontal axis: earnings, marked at the boundaries 29.5 to
89.5; vertical axis: cumulative frequency)
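
The cumulative frequencies for both types of ogive in Examples 15 and 16 can be obtained from the same class frequencies. The following is a minimal sketch; the names less_than and greater_than are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from Examples 15 and 16.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [5, 6, 6, 3, 3, 7]
total = sum(frequencies)

# "Less than" ogive: running total of the classes below each boundary.
less_than = [0] + list(accumulate(frequencies))

# "Greater than" ogive: the values not below a boundary lie above it.
greater_than = [total - F for F in less_than]

for b, lt, gt in zip(boundaries, less_than, greater_than):
    print(f"{b:5.1f} {lt:3d} {gt:3d}")
```

At every boundary the two cumulative frequencies add up to the total of 30, so one ogive rises from 0 to 30 while the other falls from 30 to 0.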
