
Chapter 2

Descriptive Statistics

Part I
Learning Objectives

• Organise quantitative and qualitative data in appropriate tables and graphs.

• Use the tabulating and graphing functions in Excel.


Why tables and graphs?

Data: we would like to study the gender distribution.

[Figure: the gender data summarised as a graph and as a table]
INFOGRAPHICS

• An infographic is a collection of imagery, data visualisations such as pie charts and bar graphs, and minimal text that gives an easy-to-understand overview of a topic.
2.1 Some Definitions

• Raw data
Data in the sequence in which they are collected, before they are processed or ranked.
• Array
An arrangement of numerical raw data in ascending or descending order of magnitude.
• Ungrouped data
Contains information on each member of a sample or population individually.
• Grouped data
Data presented in classes or intervals.
Example 2.1

Time taken by 5 adults to commute to work (in minutes) is as follows:


55, 75, 90, 30, 105 This is a _________________

The time arranged in increasing order is as follows:


30, 55, 75, 90, 105 This is an ______________

The times can be further grouped into intervals of 30 minutes as follows:

This is a ________
Time (minutes) | 0-30 | 31-60 | 61-90 | ≥ 91
Frequency | 1 | 1 | 2 | 1
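The three forms in Example 2.1 (raw data, array, grouped data) can be produced with a few lines of Python. This is a minimal sketch, assuming the times are recorded in whole minutes so that the integer class limits 0-30, 31-60, 61-90 and ≥ 91 leave no gaps; the helper name `group` is hypothetical.

```python
# Example 2.1: raw data -> array -> grouped data (a sketch, not Excel output).
raw = [55, 75, 90, 30, 105]        # raw data, in the order collected (minutes)
array = sorted(raw)                # array: ascending order of magnitude

# Class limits from the table; None marks the open-ended last class (>= 91).
# Assumes whole-minute times, so integer limits leave no gaps between classes.
classes = [(0, 30), (31, 60), (61, 90), (91, None)]


def group(values, classes):
    """Hypothetical helper: count the values whose class limits contain them."""
    table = {}
    for lower, upper in classes:
        label = f">={lower}" if upper is None else f"{lower}-{upper}"
        table[label] = sum(
            1 for v in values if v >= lower and (upper is None or v <= upper)
        )
    return table


print(array)                # [30, 55, 75, 90, 105]
print(group(raw, classes))  # {'0-30': 1, '31-60': 1, '61-90': 2, '>=91': 1}
```

The frequencies add up to 5, one per adult, which is a quick check that no value fell between two classes.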
Tabulating and Graphing

Ask yourself the following questions:

What type of variables are you dealing with?


2.2 Organizing and Graphing Qualitative Data

• Frequency distribution
• Relative frequency and percentage distributions
• Graphical presentation
1. Frequency Distribution for Qualitative Data

A tabular arrangement that lists all categories and the number of elements that belong to each of the categories.
Frequency Distribution for Qualitative Data (cont'd)

Course | Frequency
Biotech | 8
Business | 6
Engineering | 4
Infotech | 3
Others | 4
Total | 25

What do you think can be done to this table to make it a little more informative?
2. Relative Frequency and Percentage Distributions

• A tabular arrangement that lists the relative frequencies and percentages for all categories.

$$\text{Relative frequency of a category} = \frac{\text{frequency of the category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{Relative frequency} \times 100\%$$


Relative Frequency Distribution for Qualitative Data (cont'd)

Course | Frequency | Percentage (%)
Biotech | 8 | 32
Business | 6 | 24
Engineering | 4 | 16
Infotech | 3 | 12
Others | 4 | 16
Total | 25 | 100
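The percentages in the course table follow directly from the two formulas: divide each frequency by the sum of all frequencies, then multiply by 100%. A minimal Python sketch using the course counts from the table:

```python
# Course frequencies from the table.
freq = {"Biotech": 8, "Business": 6, "Engineering": 4, "Infotech": 3, "Others": 4}

total = sum(freq.values())                                # sum of all frequencies = 25
rel_freq = {c: f / total for c, f in freq.items()}        # relative frequency = f / sum f
percent = {c: 100 * f / total for c, f in freq.items()}   # percentage = rel. freq. x 100%

for course, f in freq.items():
    print(f"{course:<12} {f:>3} {rel_freq[course]:>5.2f} {percent[course]:>5.1f}")
print(f"{'Total':<12} {total:>3} {sum(rel_freq.values()):>5.2f} {sum(percent.values()):>5.1f}")
```

Multiplying before dividing (`100 * f / total`) keeps the percentages exact for these counts, so the column adds up to exactly 100.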

What do you think can be done to this table to make it a little more informative?
Graphical Presentation of Qualitative Data

1. Bar Chart (bar graph)

A graph made of bars whose heights represent the frequencies of the respective categories.
Graphical Presentation of Qualitative Data (cont'd)

2. Pie Chart
• A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
• Start at 12 o'clock.
• Move clockwise from the largest sector to the smallest.
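The sector angles of a pie chart can be worked out before drawing. The sketch below assumes the usual convention that each sector's angle is its relative frequency times 360°, and follows the rules on the slide: start at 12 o'clock and go clockwise from the largest sector to the smallest. `pie_sectors` is a hypothetical helper, and the counts are the course frequencies from the earlier table.

```python
freq = {"Biotech": 8, "Business": 6, "Engineering": 4, "Infotech": 3, "Others": 4}


def pie_sectors(freq):
    """Hypothetical helper: (category, start, end) angles in degrees,
    measured clockwise from 12 o'clock, largest sector first."""
    total = sum(freq.values())
    # sorted() is stable, so tied categories keep their table order.
    ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    sectors, cumulative = [], 0
    for category, f in ordered:
        start = 360 * cumulative / total   # angle already used by earlier sectors
        cumulative += f
        end = 360 * cumulative / total     # sector angle = 360 x relative frequency
        sectors.append((category, start, end))
    return sectors


for category, start, end in pie_sectors(freq):
    print(f"{category:<12} {start:6.1f} -> {end:6.1f}  ({end - start:.1f} degrees)")
```

Computing each boundary from the running count, rather than adding angles one by one, keeps the last sector ending at exactly 360°.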
Example 2.3
HANDS ON SESSION ONE

Download the course classification table from WBLE.

Use the PivotTable function in MS Excel to create a frequency table.
Organizing and Graphing Quantitative Data

1. Tabulation
• Single-valued classes
• Frequency distribution for quantitative data
• Relative frequency and percentage distributions
• Cumulative frequency distribution

Organizing and Graphing Quantitative Data (cont'd)

2. Graphing
• Histogram
• Polygon
• Ogive / cumulative frequency curve
• Stem-and-leaf displays
1. Single-Valued Classes

• Single-valued classes are used when the observations in a data set assume only a few distinct values (classes made of single values rather than intervals).
• They are useful for discrete data with only a few possible values.
Single-Valued Classes (cont'd)

› A sample of 40 randomly selected households from a city produced the following data on the number of vehicles owned:
› Construct a frequency distribution table for these data.

5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Frequency Distribution for Vehicles Owned by Households

Number of vehicles owned | Number of households
0 | 2
1 | 18
2 | 11
3 | 4
4 | 3
5 | 2
Grand Total | 40

You can create a percentage column!
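The same tally, plus the percentage column the slide suggests, can be sketched in Python with `collections.Counter`, which counts how often each distinct value occurs; each distinct value is one single-valued class.

```python
from collections import Counter

# Number of vehicles owned by 40 households (data from the slide).
vehicles = [5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
            1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
            2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
            4, 2, 1, 1, 2, 1, 1, 4, 1, 3]

counts = Counter(vehicles)   # single-valued classes: one class per distinct value
n = len(vehicles)            # 40 households

# (value, frequency, percentage) rows, ordered by number of vehicles
table = [(value, counts[value], 100 * counts[value] / n) for value in sorted(counts)]

for value, f, pct in table:
    print(f"{value:>2} {f:>4} {pct:>6.1f}%")
print(f"Grand Total {n}")
```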
2. Frequency Distribution for Quantitative Data (continuous)

• Class
› An interval that includes all the values that fall within two numbers, the lower and upper limits.

• Class limits
› The endpoints of each interval.

• Class boundary
› The dividing line between two classes. It is given by the midpoint of the upper limit of one class and the lower limit of the next higher class.
Frequency Distribution for Quantitative Data (cont'd)

• Class width / class size
• Class width is the difference between the upper and lower class boundaries.

$$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$$

• Class mark / class midpoint
• Class mark is the midpoint of the class interval.

$$\text{Class mark} = \frac{\text{Lower class limit} + \text{Upper class limit}}{2}$$
HIV Positive Cases in Malaysia by Age Group

Age group | No. of HIV +ve cases | Class boundaries | Class width | Class midpoint
2-12 | 532 | 1.5 to less than 12.5 | 11 | 7
13-19 | 1 140 | 12.5 to less than 19.5 | 7 | 16
20-29 | 27 995 | 19.5 to less than 29.5 | 10 | 24.5
30-39 | 34 770 | 29.5 to less than 39.5 | 10 | 34.5
40-49 | 12 580 | 39.5 to less than 49.5 | 10 | 44.5

• Fourth class = 30-39
• Lower limit of 3rd class = 20; upper limit of 3rd class = 29
• Lower boundary of 3rd class = (19 + 20)/2 = 19.5
• Upper boundary of 3rd class = (29 + 30)/2 = 29.5
• Width of 2nd class = 19.5 - 12.5 = 7
• Midpoint of 2nd class = (13 + 19)/2 = 16
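The boundary, width and midpoint columns can be recomputed from the class limits alone. This is a minimal sketch, assuming the gap between consecutive classes is the same everywhere (1 year here), so every boundary lies half a gap outside its class limit; for interior classes this equals the midpoint rule on the slide. `class_summary` is a hypothetical helper.

```python
# Age-group class limits from the HIV table (ages in whole years).
limits = [(2, 12), (13, 19), (20, 29), (30, 39), (40, 49)]


def class_summary(limits):
    """Hypothetical helper: boundaries, width and mark for each class.

    Assumes the gap between consecutive classes is the same everywhere
    (here 1 year), so each boundary lies half a gap outside the limit.
    """
    half_gap = (limits[1][0] - limits[0][1]) / 2   # (13 - 12) / 2 = 0.5
    rows = []
    for lower, upper in limits:
        lower_b, upper_b = lower - half_gap, upper + half_gap   # class boundaries
        width = upper_b - lower_b                              # upper - lower boundary
        mark = (lower + upper) / 2                             # class midpoint
        rows.append((f"{lower}-{upper}", lower_b, upper_b, width, mark))
    return rows


for row in class_summary(limits):
    print("{:<6} {:>5} to less than {:<5} width {:>4} mark {:>5}".format(*row))
```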
Constructing frequency distribution tables

• Determine the number of classes. This usually varies from 5 to 20, depending mainly on the number of observations in the data set.
• Find 2^k, where k is the smallest number such that 2^k is greater than the number of observations (n); use k classes.
• Determine the class interval or width (i). The classes must cover at least the distance from the smallest value (L) in the raw data up to the largest value (H).

$$\text{Approximate class width} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{H - L}{k}$$
Constructing frequency distribution tables (cont'd)

• Determine the lower limit of the first class, or the starting point.
• Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
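The steps above can be sketched in Python, applied here to the commute times from Example 2.1 purely for illustration. The function names are hypothetical, and `covers` treats each class as a half-open interval [lower, lower + width), which makes the "must cover from L up to H" condition concrete.

```python
def number_of_classes(n):
    """2^k rule: the smallest k such that 2**k is greater than n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def approximate_width(largest, smallest, k):
    """Approximate class width = (H - L) / number of classes."""
    return (largest - smallest) / k


def covers(start, width, k, largest):
    """True if k classes [start, start + width), ... reach beyond the largest value."""
    return start + k * width > largest


times = [55, 75, 90, 30, 105]                          # Example 2.1 commute times
k = number_of_classes(len(times))                      # 2**3 = 8 > 5, so k = 3
width = approximate_width(max(times), min(times), k)   # (105 - 30) / 3 = 25.0
print(k, width, covers(min(times), width, k, max(times)))   # 3 25.0 False
print(covers(min(times), 30, k, max(times)))                # True
```

With the exact width of 25, the last class stops just short of 105 and misses the largest value, which is why the approximate width is usually rounded up to a convenient number (30 here).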
Constructing frequency distribution tables (cont'd)

• A sample of birth-weights (oz) from 50 consecutive deliveries is given below. Construct a frequency distribution table.
Constructing frequency distribution tables (cont'd)

Birth-weight (oz) | Frequency
80-89 | 4
90-99 | 7
100-109 | 8
110-119 | 7
120-129 | 13
130-139 | 8
140-149 | 3
Total | 50
3. Relative Frequency and Percentage Distributions

• Relative frequency of a class

$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{Relative frequency} \times 100\%$$

• Calculate the relative frequency and percentage distributions for the birth-weight distribution.
Relative Frequency and Percentage Distributions

• Solution:
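A minimal Python sketch that applies the two formulas to the birth-weight frequency table (50 deliveries):

```python
# Birth-weight frequency table (50 deliveries).
classes = ["80-89", "90-99", "100-109", "110-119", "120-129", "130-139", "140-149"]
freq = [4, 7, 8, 7, 13, 8, 3]

total = sum(freq)                           # sum of all frequencies = 50
rel_freq = [f / total for f in freq]        # f / sum f
percent = [100 * f / total for f in freq]   # relative frequency x 100%

for label, f, r, p in zip(classes, freq, rel_freq, percent):
    print(f"{label:<8} {f:>3} {r:>5.2f} {p:>5.1f}")
```

The relative frequencies come out as 0.08, 0.14, 0.16, 0.14, 0.26, 0.16 and 0.06, i.e. 8%, 14%, 16%, 14%, 26%, 16% and 6%.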
HANDS ON SESSION TWO

Part 1 - Tabulation
1. Download the score data from WBLE.
2. Create a frequency table for the data.

Part 2 - Troubleshooting
1. Some values with decimal points fall between two groups.
2. Sort this out using the class boundary concept.
3. Is the classification appropriate? Change it if not.
4. Histogram and Polygon

• Grouped (quantitative) data can be displayed in a histogram or a polygon.
• Histogram
Three types of histogram:
• Frequency histogram
• Relative frequency histogram
• Percentage histogram
Histogram

• A frequency histogram consists of a set of rectangles having
› bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval sizes;
› areas proportional to the class frequencies.
• If the class intervals all have equal size,
› the heights of the rectangles are proportional to the class frequencies.
• Otherwise,
› the heights of the rectangles must be adjusted.
Histogram (cont'd)

• Procedure to draw a histogram:
• Mark the class boundaries of each interval on the horizontal axis.
• For each class, mark the frequency (or relative frequency or percentage) on the vertical axis.
• Draw a bar for each class so that its height represents the frequency of that class (with no gaps between the bars).
• Label the histogram.
The birth-weight data (Using Excel)
Histogram (Using Excel)

[Figure: Excel histogram of the birth-weight data; horizontal axis: Bin (90, 100, 110, 120, 130, 140, More); vertical axis: Frequency]
Polygon

• A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram. Next, we mark two more classes (with zero frequencies), one at each end, and mark their midpoints, so that the polygon starts and ends on the horizontal axis.
• Three types of polygon:
• Frequency polygon
• Relative frequency polygon
• Percentage polygon
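The points of a frequency polygon for the birth-weight data can be listed before plotting. This is a minimal sketch, assuming whole-ounce data so that class 80-89 has upper limit 89 and class mark 84.5; `polygon_points` is a hypothetical helper. A relative frequency or percentage polygon uses the same x-values with heights taken from the relative frequencies or percentages instead.

```python
# Birth-weight classes: lower limits 80, 90, ..., 140, each 10 oz wide.
lower_limits = [80, 90, 100, 110, 120, 130, 140]
freq = [4, 7, 8, 7, 13, 8, 3]
width = 10


def polygon_points(lower_limits, freq, width):
    """Hypothetical helper: (class mark, frequency) points of a frequency polygon."""
    # Class mark = (lower limit + upper limit) / 2, where upper limit = lower + width - 1.
    marks = [(lo + (lo + width - 1)) / 2 for lo in lower_limits]
    # One extra zero-frequency class at each end closes the polygon on the axis.
    marks = [marks[0] - width] + marks + [marks[-1] + width]
    heights = [0] + freq + [0]
    return list(zip(marks, heights))


print(polygon_points(lower_limits, freq, width))
```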
Histogram and Polygon (cont'd)

• Example 2.6
Reconsider the data in Example 2.4 and draw
i) the frequency histogram and frequency polygon
ii) the relative frequency histogram and relative frequency polygon
iii) the percentage histogram and percentage polygon
Histogram and Polygon (cont'd)
• Solution:

[Figure: frequency histogram and polygon (left) and percentage histogram and polygon (right) of the birth-weight data; horizontal axis: Birth-weights (oz), 80 to 150; vertical axes: Frequency and Percent]
Histogram and Polygon (cont'd)

• Solution:

[Figure: frequency histogram and polygon (left) and relative frequency histogram and polygon (right) of the birth-weight data; horizontal axis: Birth-weights (oz), marked at the class marks 74.5, 84.5, ..., 154.5; vertical axes: Frequency and Relative Frequency]
Histogram and Polygon (cont'd)

The frequency distribution below gives the weights of 35 objects, measured to the nearest kg. Draw a histogram to illustrate the data.
Histogram and Polygon (cont'd)

$$\text{Adjusted frequency} = \frac{\text{Standard class width}}{\text{Class width}} \times \text{Frequency}$$

Weight (kg) | Class width | Frequency | Height of rectangle (adjusted frequency)
6-8 | 3 | 4 | 4
9-11 | 3 | 6 | 6
12-17 | 6 | 10 | (3/6) × 10 = 5
18-20 | 3 | 3 | 3
21-29 | 9 | 12 | 4
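The adjusted heights in the table can be checked with a short Python sketch. It takes the standard class width to be 3 kg, the width most classes share (an assumption consistent with the table), and counts a class such as 12-17 as 6 kg wide because its boundaries are 11.5 and 17.5. `adjusted_heights` is a hypothetical helper.

```python
# (lower limit, upper limit, frequency) for the 35 objects weighed to the nearest kg.
classes = [(6, 8, 4), (9, 11, 6), (12, 17, 10), (18, 20, 3), (21, 29, 12)]
standard_width = 3   # assumed: the width shared by most classes


def adjusted_heights(classes, standard_width):
    """Hypothetical helper: rectangle heights for a histogram with unequal classes."""
    rows = []
    for lower, upper, f in classes:
        width = upper - lower + 1                 # boundaries: lower - 0.5 to upper + 0.5
        height = standard_width * f / width       # (standard width / class width) x frequency
        rows.append((f"{lower}-{upper}", width, f, height))
    return rows


for label, width, f, height in adjusted_heights(classes, standard_width):
    print(f"{label:<6} width {width}  frequency {f:>2}  height {height:g}")
```

Multiplying before dividing keeps the heights exact (for example 3 × 12 / 9 = 4).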
Cumulative frequency distribution

• A table that presents the total number of values that fall below the upper boundary of each class.
• It is constructed for quantitative data only.

$$\text{Cumulative relative frequency} = \frac{\text{Cumulative frequency of that class}}{\text{Sum of all frequencies in the data set}}$$

$$\text{Cumulative percentage} = \text{Cumulative relative frequency} \times 100\%$$
Cumulative frequency distribution (cont'd)

Using the birth-weight data, construct its cumulative frequency distribution, cumulative relative frequencies and cumulative percentages.
Cumulative frequency distribution (cont'd)

Birth-weights (oz) | Cumulative frequency | Cumulative relative frequency | Cumulative percentage (%)
70-79 | 0 | 0.00 | 0
80-89 | 4 | 0.08 | 8
90-99 | 11 | 0.22 | 22
100-109 | 19 | 0.38 | 38
110-119 | 26 | 0.52 | 52
120-129 | 39 | 0.78 | 78
130-139 | 47 | 0.94 | 94
140-149 | 50 | 1.00 | 100

We start with a non-existent class (70-79) and assign it a frequency of zero, so that the ogive starts at zero at the lower boundary of the first class.
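The cumulative columns are running totals of the class frequencies, which `itertools.accumulate` computes directly. A minimal sketch for the birth-weight data, also listing the ogive points (upper class boundary, cumulative frequency):

```python
from itertools import accumulate

classes = ["70-79", "80-89", "90-99", "100-109", "110-119", "120-129", "130-139", "140-149"]
freq = [0, 4, 7, 8, 7, 13, 8, 3]   # 70-79 is the added empty class

cum_freq = list(accumulate(freq))               # running total of frequencies
total = cum_freq[-1]                            # 50 deliveries
cum_rel = [c / total for c in cum_freq]         # cumulative relative frequency
cum_pct = [100 * c / total for c in cum_freq]   # cumulative percentage

# Ogive points: (upper class boundary, cumulative frequency)
upper_boundaries = [79.5 + 10 * i for i in range(len(classes))]
ogive = list(zip(upper_boundaries, cum_freq))

for label, c, r, p in zip(classes, cum_freq, cum_rel, cum_pct):
    print(f"{label:<8} {c:>3} {r:>5.2f} {p:>6.1f}")
```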
Ogive / Cumulative frequency curve

• A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper boundaries of the classes at heights equal to the cumulative frequencies of the respective classes.
› Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.

2. If cumulative relative frequencies (or cumulative percentages) are plotted instead of cumulative frequencies, the graph is called a relative cumulative frequency curve (or percentage ogive).
Cumulative frequency curve / Ogive (cont'd)

• Example 2.10
• Draw an ogive for the data in Example 2.4. From the ogive, estimate

a) the total number of deliveries with birth-weights less than 95 oz.

b) the value of X, if 20% of the deliveries had birth-weights of X oz or more.
Cumulative frequency curve / Ogive (cont'd)

[Figure: Ogive of the birth-weight data; horizontal axis: Birth-weights (oz), upper class boundaries 79.5 to 149.5; vertical axis: Cumulative Frequency, 0 to 60]
Cumulative frequency curve / Ogive (cont'd)

• Solution:
a) From the graph, there are about 8 deliveries with birth-weights less than 95 oz.
b) If 20% were X oz or more, then 80% were less than X: 80% × 50 = 40.
From the graph, reading across from a cumulative frequency of 40, X ≈ 130.5 oz.
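Reading values off the ogive can be mimicked by linear interpolation between the plotted points, which is what an ogive drawn with straight segments assumes. A minimal sketch; `ogive_at` and `ogive_inverse` are hypothetical helpers. It gives about 7.85 (roughly 8 deliveries) for part (a) and about 130.75 oz for part (b), close to the 130.5 oz read from the graph; small differences like this are expected when a chart is read by eye.

```python
# Ogive of the birth-weight data: upper class boundaries vs cumulative frequencies.
xs = [79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
ys = [0, 4, 11, 19, 26, 39, 47, 50]


def ogive_at(x):
    """Cumulative frequency at x, by linear interpolation between ogive points."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the ogive")


def ogive_inverse(y):
    """Value x at which the ogive reaches cumulative frequency y."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if y0 <= y <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0)
    raise ValueError("y lies outside the ogive")


below_95 = ogive_at(95)            # about 7.85, i.e. roughly 8 deliveries
x_value = ogive_inverse(0.8 * 50)  # cumulative frequency 40: about 130.75 oz
print(round(below_95, 2), round(x_value, 2))
```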
Stem-and-leaf displays

• Each value is divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in a display.
• Note:
• It is constructed only for quantitative data.
• It has an advantage over a frequency distribution because we do not lose information on individual observations.
Stem-and-leaf displays (cont'd)

The following are the scores of 30 college students on a statistics test. Construct a stem-and-leaf display.

75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Key: 5|0 = 50
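For two-digit scores, the stem is the tens digit and the leaf is the units digit, which `divmod(v, 10)` returns in one step. A minimal sketch; sorting the leaves within each stem (a ranked display) is a choice made here, and the leaves could also be listed in the order the data appear.

```python
from collections import defaultdict

scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]


def stem_and_leaf(values):
    """Stem = tens digit, leaf = units digit (key: 5|0 = 50); leaves are sorted."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every one of the 30 scores survives as a leaf, which is the advantage over a grouped frequency table noted above.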
Explore and Learn

HANDS ON SESSION THREE

1. You can use the score data for this exercise.
2. Create a histogram for the data.
3. Explore Excel for various types of graphs that can be used to visualize continuous data.
4. Assume that a score > 30 indicates a customer who is not satisfied. How can you show the rate of satisfied vs unsatisfied customers among females and males?
The End
Some Excel Instructions
Frequency Distribution for Qualitative Data (Excel)
Removing the space between bars to form a histogram
Example 2.8 (Excel)
Knowledge enables you to make lemonade!
