
STUDY UNIT 3

FREQUENCY DISTRIBUTION AND GRAPHICAL REPRESENTATION OF DATA

Study Outcomes

• Tabulate discrete and continuous data

• Graphically represent discrete and continuous data


Statistical process

We would like to learn from this world. How? By gathering data.

But the raw data doesn't make any sense… So then what? So we summarise the data. (We are here!)

We make conclusions about this world (population).

How do we summarise data?

Tables & graphs

[Slide figure: a block of raw observations (whole numbers from 0 to 10) is condensed into a table with the columns Class and Class midpoint, and into graphs.]

Continuous data vs. Discrete data

The graphical methods used for the two types of data differ slightly.

Continuous
65,1  59,7  23    63,6  64
51    58    47,1  49,8  28
71,8  44,6  82,6  31    57,5
83    68,3  42,7  86,9  74,7
35,2  31    50    76,4  48
26,2  46,6  72,1  34,8  81,9
47    23,9  44,3  41,9  71
52    57,1  80,6  61    79,1

Discrete
4  1  1  1  3
2  3  0  3  3
0  3  1  4  3
4  4  2  4  1
0  2  2  1  3
3  0  2  3  4
2  3  4  3  0
4  4  4  4  4

Frequency distribution of continuous data

Array: a data set that has been sorted from smallest to largest.

20 12 19 28 21 30 25 12 9 35

9 12 12 19 20 21 25 28 30 35
Frequency distribution of continuous data

Frequency table: a table with classes (intervals) of values and their corresponding frequencies

Class: represents all values within its boundaries

[ x; y )
[ x   including x
y )   excluding y

Example: [10;20) contains all observations greater than or equal to 10 (including 10) and less than 20 (excluding 20).

Frequency distribution of continuous data

Frequency (f): number of observations within a particular class

Class      f
[10;20)    15    15 observations between 10 and 20 (including 10, excluding 20)
[20;30)    12    12 observations between 20 and 30 (including 20, excluding 30)

Constructing a Frequency Table for Continuous data

• Step 1a – arrange data into an array
• Step 1b – calculate the range (R)
• Step 2 – determine the number of classes (k)
• Step 3 – calculate the class width (w)
• Step 4 – calculate the class boundaries
• Step 5 – tabulate the observations

Step 1a – order the data into an array

We will use the following data to construct our frequency table:

Statistics marks (%) of 30 students

41 32 65 48 50 53 76 81 36 46
22 66 69 63 54 78 87 79 76 71
62 57 53 28 89 65 75 79 25 70

Exercise: See how many observations you can sort in 1 minute.

Sorted array:

22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89

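Sorting by hand is a good exercise, but the same step can also be sketched in a few lines of Python. This is only an illustration; the variable names `marks` and `array` are chosen here and do not come from the slides.

```python
# Statistics marks (%) of the 30 students, in the order given on the slide.
marks = [41, 32, 65, 48, 50, 53, 76, 81, 36, 46,
         22, 66, 69, 63, 54, 78, 87, 79, 76, 71,
         62, 57, 53, 28, 89, 65, 75, 79, 25, 70]

# sorted() returns a new list ordered from smallest to largest (the array).
array = sorted(marks)

# Print the array ten values per row, like the slide.
for i in range(0, len(array), 10):
    print(*array[i:i + 10])
```
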
Step 1b – calculate the range

Range (R): the difference between the largest and the smallest observations.

22 25 28 32 36 41 46 48 50 53

53 54 57 62 63 65 65 66 69 70

71 75 76 76 78 79 79 81 87 89

R = max − min
  = 89 − 22
  = 67

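As a quick check, the range can be computed directly from the sorted marks. A minimal sketch; the name `array` is an assumption used only for this illustration.

```python
# The sorted Statistics marks (the array from Step 1a).
array = [22, 25, 28, 32, 36, 41, 46, 48, 50, 53,
         53, 54, 57, 62, 63, 65, 65, 66, 69, 70,
         71, 75, 76, 76, 78, 79, 79, 81, 87, 89]

# Range = largest observation minus smallest observation.
R = max(array) - min(array)
print(R)
```
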
Step 2 – determine the number of classes

Sturges' rule: the number of classes (k) in a frequency table can be determined by the formula k = 1 + 1.4 ln(n), where n is the number of observations in the data set and k is rounded off to the nearest whole number.

n = 30
k = 1 + 1.4 ln(n)
  = 1 + 1.4 ln(30)
  = 5.762
  ≈ 6 classes

Use Sturges' rule only as a guideline:

Too few classes – info could be lost

Too many classes – data becomes sparsely distributed

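The calculation for n = 30 can be reproduced as follows. This is a sketch using the coefficient 1.4 exactly as it appears on the slide; `k_exact` and `k` are names chosen for this illustration.

```python
import math

n = 30  # number of observations in the data set

# math.log is the natural logarithm (ln).
k_exact = 1 + 1.4 * math.log(n)

# Round off to the nearest whole number of classes.
k = round(k_exact)

print(round(k_exact, 3), k)
```
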
Step 3 – calculate the class width

Class width (w): the range of each class. Equal to the range R (step 1b) divided by the number of classes k (step 2).

w = R / k
  = 67 / 6
  = 11.167

A class width of 11.167 would be difficult to work with, so use a nice round number such as 10.

This causes the number of classes to increase from 6 to 7.

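The effect of rounding the class width can be checked numerically. A minimal sketch, assuming the first lower boundary is 20 (the value chosen in Step 4); all variable names are illustrative.

```python
import math

R = 67   # range from Step 1b
k = 6    # number of classes from Step 2

w_exact = R / k   # 11.1666..., awkward to use
w = 10            # convenient round width chosen instead

# Assumption: the first class starts at 20, just below the lowest mark (22).
first_lower = 20
highest = 89

# Classes are [lower; lower + w), so count how many are needed to reach the highest mark.
num_classes = math.floor((highest - first_lower) / w) + 1

print(round(w_exact, 3), num_classes)
```
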


Step 4 – calculate class boundaries

• Class boundaries specify which observations each class contains

• Start with the lower boundary of the first class
  • It must be less than or equal to the lowest observation

• The upper boundary = lower boundary + class width

• The lower boundary of the next class is the same as the upper boundary of the current class.
Step 4 – calculate class boundaries(cont.)

• Let's start at 20 as the first lower boundary

• Use a class width of 10

• Upper boundary = 20 + 10 = 30

First class:   [20; 30)
Second class:  [30; 40)
...
Last class:    [80; 90)

Note: Make sure that the highest observation is less than the upper boundary of the last class, because the upper boundary itself is excluded from the class.
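The boundary rules can be sketched as a short loop. This is only an illustration under the choices made above (first lower boundary 20, class width 10); the names `classes`, `lower` and `highest` are assumptions.

```python
lower = 20     # first lower boundary (must be <= lowest observation, 22)
w = 10         # class width
highest = 89   # highest observation

classes = []
# Keep adding classes until the highest observation falls inside one of them.
while lower <= highest:
    upper = lower + w          # upper boundary = lower boundary + class width
    classes.append((lower, upper))
    lower = upper              # next lower boundary = current upper boundary

for lo, hi in classes:
    print(f"[{lo}; {hi})")
```
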
Step 5 – tabulate the observations

Original raw data on page 35

Class Class Interval Class mid-point

1 [20;30)
2 [30;40)
3 [40;50)
4 [50;60)
5 [60;70)
6 [70;80)
7 [80;90)
Step 5 – tabulate the observations(cont.)

Class midpoint: the value that lies in the middle of each class interval

Class midpoint for class 1: (20 + 30) / 2 = 25

Class   Class interval   Class midpoint
1       [20;30)          25
2       [30;40)          35
3       [40;50)          45
4       [50;60)          55
5       [60;70)          65
6       [70;80)          75
7       [80;90)          85
Step 5 – tabulate the observations(cont.)

Frequency (f): number of observations within a particular class

Here the classes are Statistics marks (%) and the frequencies are numbers of students.

Class   Class interval   Class midpoint   Frequency (f)
1       [20;30)          25               3
2       [30;40)          35               2
3       [40;50)          45               3
4       [50;60)          55               5
5       [60;70)          65               6
6       [70;80)          75               8
7       [80;90)          85               3
Total                                     30

How many students scored between 20% and 30%?

The total is always equal to n (data set size).
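The tabulation step can be reproduced with a short script. A minimal sketch, not the textbook's method: it counts, for each class [lo; hi), the observations x with lo <= x < hi. The names `array`, `classes` and `table` are assumptions.

```python
# The sorted Statistics marks (the array from Step 1a).
array = [22, 25, 28, 32, 36, 41, 46, 48, 50, 53,
         53, 54, 57, 62, 63, 65, 65, 66, 69, 70,
         71, 75, 76, 76, 78, 79, 79, 81, 87, 89]

# Classes [20;30), [30;40), ..., [80;90).
classes = [(lo, lo + 10) for lo in range(20, 90, 10)]

table = []
for lo, hi in classes:
    midpoint = (lo + hi) / 2                     # middle of the class interval
    f = sum(1 for x in array if lo <= x < hi)    # lower included, upper excluded
    table.append((lo, hi, midpoint, f))

for lo, hi, midpoint, f in table:
    print(f"[{lo};{hi})  {midpoint:g}  {f}")

frequencies = [row[3] for row in table]
midpoints = [row[2] for row in table]
print("Total", sum(frequencies))
```
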
Ungrouped data vs. Grouped data

Ungrouped (raw) data


22 25 28 32 36 41 46 48 50 53

53 54 57 62 63 65 65 66 69 70

71 75 76 76 78 79 79 81 87 89

Grouped data

Class   Class interval   Class midpoint   Frequency (f)
1       [20;30)          25               3
2       [30;40)          35               2
3       [40;50)          45               3
4       [50;60)          55               5
5       [60;70)          65               6
6       [70;80)          75               8
7       [80;90)          85               3
Total                                     30
Cumulative frequency distribution

Cumulative frequency (F): The cumulative frequency of a point with value x is defined as the total number of observations in the data set with values less than or equal to x. In the table below it is read at the class boundaries as "less than", because each class excludes its upper boundary.

Class boundary   Cumulative frequency (F)
< 20             0
< 30             3
< 40             5
< 50             8
< 60             13
< 70             19
< 80             27
< 90             30

How many students scored less than 70?

How many students scored 70 or more?

The last value should be n (size of data set).
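The cumulative column is a running total of the frequencies, which also answers the two questions on the slide. A sketch using `itertools.accumulate`; the variable names are assumptions made for this illustration.

```python
from itertools import accumulate

upper_boundaries = [30, 40, 50, 60, 70, 80, 90]
f = [3, 2, 3, 5, 6, 8, 3]   # class frequencies from the frequency table

# Running total: number of observations below each upper boundary.
F = list(accumulate(f))
n = F[-1]                   # last cumulative frequency equals the data set size

less_than_70 = F[upper_boundaries.index(70)]
seventy_or_more = n - less_than_70

print(F)
print(less_than_70, seventy_or_more)
```
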
Relative frequency

Relative frequencies (r) are calculated by dividing each frequency by the total number of observations (n) in the data set from which the table was constructed.

Class   Class interval   Class midpoint   f    r
1       [20;30)          25               3    0.1
2       [30;40)          35               2    0.067
3       [40;50)          45               3    0.1
4       [50;60)          55               5    0.167
5       [60;70)          65               6    0.2
6       [70;80)          75               8    0.267
7       [80;90)          85               3    0.1
Total                                     30   1

The total should always be 1.
Percentage frequency

Percentage frequencies are calculated by multiplying the relative frequencies by 100 and writing them as percentages.

Class   Class interval   Class midpoint   f    r       Percentage frequency
1       [20;30)          25               3    0.1     10%
2       [30;40)          35               2    0.067   6.7%
3       [40;50)          45               3    0.1     10%
4       [50;60)          55               5    0.167   16.7%
5       [60;70)          65               6    0.2     20%
6       [70;80)          75               8    0.267   26.7%
7       [80;90)          85               3    0.1     10%
Total                                     30   1       100%
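Relative and percentage frequencies follow directly from the frequency column. A minimal sketch; the rounding to three decimals (and one decimal for percentages) is a presentation choice made here, and the variable names are assumptions.

```python
f = [3, 2, 3, 5, 6, 8, 3]   # class frequencies
n = sum(f)                  # total number of observations

# Relative frequency: frequency divided by n.
r = [fi / n for fi in f]

# Percentage frequency: relative frequency multiplied by 100.
pct = [100 * ri for ri in r]

for fi, ri, pi in zip(f, r, pct):
    print(fi, round(ri, 3), f"{round(pi, 1)}%")

print("Totals:", n, round(sum(r), 3), f"{round(sum(pct), 1)}%")
```
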
Relative cumulative frequencies

Relative cumulative frequencies (R) are calculated by dividing the corresponding cumulative frequencies by the total number of observations in the data set.

Class boundary   Cumulative frequency (F)   Relative cumulative frequency (R)
< 20             0                          0
< 30             3                          0.1
< 40             5                          0.167
< 50             8                          0.267
< 60             13                         0.433
< 70             19                         0.633
< 80             27                         0.9
< 90             30                         1

The last value should always be 1.
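The same division can be sketched in code. This is an illustration only; `F` holds the cumulative frequencies from the table, starting at the boundary "< 20", and the names are assumptions.

```python
# Cumulative frequencies at the boundaries < 20, < 30, ..., < 90.
F = [0, 3, 5, 8, 13, 19, 27, 30]
n = F[-1]   # the last cumulative frequency is the data set size

# Relative cumulative frequency: cumulative frequency divided by n.
R = [Fi / n for Fi in F]

print([round(x, 3) for x in R])
```
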
Graphical representation of continuous data

• Dot plots

• Histograms

• Frequency polygons (+ relative)

• Cumulative frequency polygons (+ relative)


Dot plots

• Data is plotted on a horizontal axis

• Any observations with the same values are stacked on top of one another

• The heights of the stacks are not that important

• Horizontal density tells us about the distribution of the data

• Frequencies are larger where data points are more densely packed

[Figure: dot plot; horizontal axis: Statistics Marks (%).]
Histograms

Histogram: a graphical representation of a frequency table

[Figure: Histogram of Statistics marks; x-axis: Statistics marks (%), y-axis: frequency f. Frequency table on page 58 (page 34 in older textbook).]

Note the spaces

Frequency polygons

Frequency polygon: a graph where the frequency of each class interval is plotted against the class midpoint of the corresponding class interval and then the points are joined with straight lines.

[Figure: Histogram with frequency polygon of Statistics marks; x-axis: Statistics marks (%), y-axis: frequency f. Frequency table on page 59.]

Frequency polygons
[Figure: Frequency polygon of Statistics marks; x-axis: Statistics marks (%), from 10 to 100; y-axis: frequency f, from 0 to 9.]
Relative frequency polygons

• Frequencies are all divided by n

[Figure: Frequency polygon of Statistics marks, with the relative frequency r on the vertical axis; x-axis: Statistics marks (%). Frequency table on page 61 (page 37 in older textbook).]

Cumulative frequency polygons

Cumulative frequency polygon: a line graph that displays the cumulative frequency of each class at its upper class boundary

[Figure: Cumulative frequency polygon of Statistics marks; x-axis: Statistics marks (%), from 20 to 90; y-axis: cumulative frequency F, from 0 to 30. Frequency table on page 60 (page 38 in older textbook).]

How many students scored less than 50%?

How many students scored more than 80%?

Relative cumulative frequency polygons

• Cumulative frequencies are all divided by n

[Figure: Relative cumulative frequency polygon of Statistics marks; x-axis: Statistics marks (%), from 20 to 90; y-axis: relative cumulative frequency R. Frequency table on page 63 (page 38 in older textbook).]

Homework

• Do exercises 1, 2, 3 and 5 at the end of Chapter 3.


These exercises cover continuous data.
Frequency Distribution of Discrete data

We will use the example data on page 51

Education Level                    Code
Primary School                     0
Grade 08 to 10                     1
Grade 11 to 12                     2
Tertiary Education to Graduates    3
Post Graduates                     4
Discrete Sample data

Education Levels of 40 People

(Unordered ungrouped data)

1 2 3 2 1 2 2 2

3 2 1 3 2 2 3 3

2 3 3 1 0 4 2 3

3 2 1 2 2 3 3 0

2 1 3 0 2 4 1 1
Discrete Sample data

Education Levels of 40 People

(Ordered ungrouped data)

0 0 0 1 1 1 1 1

1 1 1 2 2 2 2 2

2 2 2 2 2 2 2 2

2 2 3 3 3 3 3 3

3 3 3 3 3 3 4 4
Frequency table for Discrete data

Education level   Frequency (f)   Relative frequency (r)
0                 3               0.075
1                 8               0.2
2                 15              0.375
3                 12              0.3
4                 2               0.05
Total             40              1

The frequency total is always equal to n (sample size).

The relative frequency total is always equal to 1.
Cumulative frequency table - Discrete

Education level   Frequency (f)   Cumulative frequency (F)   Relative cumulative frequency (R)
0                 3               3                          0.075
1                 8               11                         0.275
2                 15              26                         0.65
3                 12              38                         0.95
4                 2               40                         1

The last cumulative frequency is always equal to n (sample size).

The last relative cumulative frequency is always equal to 1.
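For discrete data no classes are needed; each distinct value is simply counted. The sketch below rebuilds both discrete tables from the 40 unordered education levels using `collections.Counter`. The variable names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import accumulate

# Education levels of 40 people (unordered ungrouped data from the slide).
levels = [1, 2, 3, 2, 1, 2, 2, 2,
          3, 2, 1, 3, 2, 2, 3, 3,
          2, 3, 3, 1, 0, 4, 2, 3,
          3, 2, 1, 2, 2, 3, 3, 0,
          2, 1, 3, 0, 2, 4, 1, 1]

counts = Counter(levels)           # how often each code occurs
n = len(levels)                    # sample size

codes = sorted(counts)             # 0, 1, 2, 3, 4
f = [counts[c] for c in codes]     # frequency
r = [fi / n for fi in f]           # relative frequency
F = list(accumulate(f))            # cumulative frequency
R = [Fi / n for Fi in F]           # relative cumulative frequency

for row in zip(codes, f, r, F, R):
    print(*row)
```
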
Graphical representation of discrete data

• Dot plots

• Bar charts

• Pie charts
Dot plots for Discrete data

• Concentration occurs per category (discrete nature of data)

[Figure: dot plot of education levels; horizontal axis: Education Level (0, 1, 2, 3, 4). Frequency table on page 52 (page 40 in older textbook).]

What does this dot plot tell us?

Bar charts

• Portrays the same image of the data as the dot plot (preferred when the data set is large)

[Figure: Bar chart of education levels; x-axis: Education level (0, 1, 2, 3, 4), y-axis: frequency f. Frequency table on page 64 (page 40 in older textbook).]

Note the gaps

Pie charts

• Calculate each sector's angle by multiplying its r value by 360 degrees.

[Figure: Pie chart of the percentage frequencies of 40 education levels. Sectors: level 0: 7.50%, level 1: 20.00%, level 2: 37.50%, level 3: 30.00%, level 4: 5.00%.]
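The angle rule can be illustrated with the relative frequencies of the education levels. A minimal sketch; the dictionary names `r` and `angles` are assumptions.

```python
# Relative frequency r of each education level (from the discrete frequency table).
r = {0: 0.075, 1: 0.2, 2: 0.375, 3: 0.3, 4: 0.05}

# Sector angle in degrees = relative frequency x 360.
angles = {level: ri * 360 for level, ri in r.items()}

for level, angle in angles.items():
    print(level, round(angle, 1))

# All sectors together make up the full circle.
print(round(sum(angles.values()), 1))
```
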
Pie charts

• Each sector's size represents the percentage frequency of the corresponding category

[Figure: Pie chart of the percentage frequencies of 40 education levels. Sectors: level 0: 7.50%, level 1: 20.00%, level 2: 37.50%, level 3: 30.00%, level 4: 5.00%. Frequency table on page 66.]

What percentage of students are postgraduates?
Homework

• Do exercise 4 at the end of Chapter 3.

This exercise focusses on discrete data.
