SU3 - Chapter 3-1

STUDY UNIT 3
FREQUENCY DISTRIBUTION AND GRAPHICAL REPRESENTATION OF DATA

Study Outcomes
• Tabulate discrete and continuous data
• Graphically represent discrete and continuous data

Statistical process
• We would like to learn from this world.
• How? By gathering data.
• But the raw data doesn't make any sense…
• So then what? So we summarise the data. (We are here!)
• We make conclusions about this world (population).
How do we summarise data?
• Tables: classes [x; y) with their class midpoints and frequencies
• Graphs
[Figure: a block of raw, unsorted observations shown next to an empty frequency table and a graph placeholder]
Continuous data vs. Discrete data
The graphical methods used for the two types of data differ slightly.

Continuous                        Discrete
65.1  59.7  23    63.6  64        4  1  1  1  3
51    58    47.1  49.8  28        2  3  0  3  3
71.8  44.6  82.6  31    57.5      0  3  1  4  3
83    68.3  42.7  86.9  74.7      4  4  2  4  1
35.2  31    50    76.4  48        0  2  2  1  3
26.2  46.6  72.1  34.8  81.9      3  0  2  3  4
47    23.9  44.3  41.9  71        2  3  4  3  0
52    57.1  80.6  61    79.1      4  4  4  4  4
Frequency distribution of continuous data
Array: a data set that has been sorted from smallest to largest.
Unsorted: 20 12 19 28 21 30 25 12 9 35
Array:    9 12 12 19 20 21 25 28 30 35
Frequency table: a table with classes (intervals) of values and their corresponding frequencies.
Class: represents all values within its boundaries.
[ x; y )
• "[ x" means including x
• "y )" means excluding y
Example: [10;20) contains all observations greater than or equal to 10 (including 10) and less than 20 (excluding 20).
Frequency (f): the number of observations within a particular class.
Class     f
[10;20)   15   15 observations between 10 and 20 (including 10, excluding 20)
[20;30)   12   12 observations between 20 and 30 (including 20, excluding 30)
Constructing a Frequency Table for Continuous data
• Step 1a – arrange data into an array
• Step 1b – calculate the range (R)
• Step 2 – determine the number of classes (k)
• Step 3 – calculate the class width (w)
• Step 4 – calculate the class boundaries
• Step 5 – tabulate the observations
Step 1a – order the data into an array
We will use the following data to construct our frequency table:
Statistics marks (%) of 30 students
41 32 65 48 50 53 76 81 36 46
22 66 69 63 54 78 87 79 76 71
62 57 53 28 89 65 75 79 25 70
Exercise: See how many observations you can sort in 1 minute.
Array (the sorted data):
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
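The sorting in Step 1a can be sketched in a few lines of Python. This is only an illustration: the variable names `marks` and `array` are assumptions, and the values are the 30 marks listed above.

```python
# Statistics marks (%) of 30 students, copied from the slide above.
marks = [
    41, 32, 65, 48, 50, 53, 76, 81, 36, 46,
    22, 66, 69, 63, 54, 78, 87, 79, 76, 71,
    62, 57, 53, 28, 89, 65, 75, 79, 25, 70,
]

# sorted() returns a new list ordered from smallest to largest: the array.
array = sorted(marks)

# Print the array in rows of ten, matching the slide layout.
for start in range(0, len(array), 10):
    print(" ".join(str(x) for x in array[start:start + 10]))
```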
Step 1b – calculate the range
Range (R): the difference between the largest and the smallest observations.
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
$R = \max - \min$
$R = 89 - 22$
$R = 67$
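As a quick check of the range calculation, here is a minimal Python sketch. The names `array` and `data_range` are illustrative assumptions; the data is the sorted array from Step 1a.

```python
# Sorted statistics marks (the array from Step 1a).
array = [
    22, 25, 28, 32, 36, 41, 46, 48, 50, 53,
    53, 54, 57, 62, 63, 65, 65, 66, 69, 70,
    71, 75, 76, 76, 78, 79, 79, 81, 87, 89,
]

# Because the array is sorted, the extremes are its first and last entries.
smallest, largest = array[0], array[-1]

# Range = largest observation minus smallest observation.
data_range = largest - smallest
print(data_range)
```

On unsorted data, `max(data) - min(data)` gives the same result without sorting first.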
Step 2 – determine the number of classes
Sturges' rule: the number of classes $k$ in a frequency table can be determined by the formula $k = 1 + 1.4 \ln(n)$, where $n$ is the number of observations in the data set and $k$ is rounded off to the nearest whole number.
For $n = 30$:
$k = 1 + 1.4 \ln(30) = 5.762 \approx 6$ classes
Use Sturges' rule only as a guideline:
• Too few classes – information could be lost
• Too many classes – data becomes sparsely distributed
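The rule can be evaluated with Python's standard `math` module. This is a sketch using the formula exactly as given in this unit (coefficient 1.4 on the natural logarithm); the function names `sturges_k` and `number_of_classes` are hypothetical.

```python
import math


def sturges_k(n: int) -> float:
    """Unrounded number of classes from the formula k = 1 + 1.4 ln(n)."""
    return 1 + 1.4 * math.log(n)  # math.log is the natural logarithm


def number_of_classes(n: int) -> int:
    """Round k to the nearest whole number, as the rule prescribes."""
    return round(sturges_k(n))


print(round(sturges_k(30), 3), number_of_classes(30))
```

Since the rule is only a guideline, the result is a starting point that Step 3 may still adjust.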
Step 3 – calculate the class width
Class width (w): the width of each class interval. It is equal to the range (Step 1b) divided by the number of classes (Step 2).
$w = \frac{R}{k} = \frac{67}{6} = 11.167$
A class width of 11.167 may be difficult to use, so use a nice round number such as 10.
This causes the number of classes to increase from 6 to 7.
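The effect of rounding the class width can be checked with a short Python sketch. It assumes the first lower boundary is 20 (the value chosen in Step 4); all variable names are illustrative.

```python
import math

data_range = 67   # R from Step 1b
k = 6             # number of classes suggested in Step 2

# Exact class width: range divided by number of classes.
w_exact = data_range / k

w = 10            # the "nice round number" used instead
lowest, highest = 22, 89
first_lower = 20  # assumption: a convenient boundary not above the lowest mark

# Classes are half-open, [lower; upper), so count how many widths are
# needed before the highest observation falls strictly below an upper bound.
classes_needed = math.floor((highest - first_lower) / w) + 1

print(round(w_exact, 3), classes_needed)
```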

Step 4 – calculate class boundaries
• Class boundaries specify which observations each class contains
• Start with the lower boundary of the first class
  • It must be less than or equal to the lowest observation
• The upper boundary = lower boundary + class width
• The lower boundary of the next class is the same as the upper boundary of the current class.
Step 4 – calculate class boundaries (cont.)
• Let's start at 20 as the first lower boundary
• Use a class width of 10
• Upper boundary = 20 + 10 = 30
First class: [20; 30)
Second class: [30; 40)
…
Last class: [80; 90)
Note: Make sure that the highest observation is less than the upper boundary of the last class, because the upper boundary itself is excluded.
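The boundary rules in Step 4 can be sketched as a small Python function. The name `class_boundaries` and its parameters are hypothetical, and each class is represented as a `(lower, upper)` pair meaning the half-open interval [lower; upper).

```python
def class_boundaries(lowest, highest, start, width):
    """Build half-open classes [lower; upper) that cover all observations."""
    if start > lowest:
        raise ValueError("first lower boundary must not exceed the lowest observation")
    classes = []
    lower = start
    # Keep adding classes until the highest observation lies strictly
    # below the upper boundary of the last class.
    while lower <= highest:
        classes.append((lower, lower + width))
        lower += width  # next lower boundary = current upper boundary
    return classes


classes = class_boundaries(lowest=22, highest=89, start=20, width=10)
for lower, upper in classes:
    print(f"[{lower};{upper})")
```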
Step 5 – tabulate the observations
Original raw data on page 35
Class   Class Interval   Class mid-point
1       [20;30)
2       [30;40)
3       [40;50)
4       [50;60)
5       [60;70)
6       [70;80)
7       [80;90)
Step 5 – tabulate the observations (cont.)
Class midpoint: the value that lies in the middle of each class interval
Class midpoint for class 1: $\frac{20 + 30}{2} = 25$
Class   Class Interval   Class mid-point
1       [20;30)          25
2       [30;40)          35
3       [40;50)          45
4       [50;60)          55
5       [60;70)          65
6       [70;80)          75
7       [80;90)          85
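A class midpoint is the average of the two class boundaries. The following Python sketch computes it for every class; the list name `classes` and the `(lower, upper)` tuple representation are assumptions made for illustration.

```python
# The seven classes from Step 4, each as a (lower, upper) pair for [lower; upper).
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80), (80, 90)]

# Midpoint = (lower boundary + upper boundary) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), mid in zip(classes, midpoints):
    print(f"[{lower};{upper})  midpoint {mid:g}")
```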
Step 5 – tabulate the observations (cont.)
Frequency (f): the number of observations within a particular class
Class   Statistics mark (%)   Class mid-point   Number of students (f)
1       [20;30)               25                3
2       [30;40)               35                2
3       [40;50)               45                3
4       [50;60)               55                5
5       [60;70)               65                6
6       [70;80)               75                8
7       [80;90)               85                3
Total                                           30
How many students scored between 20% and 30%?
The total is always equal to n (the data set size).
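The tabulation in Step 5 amounts to counting, for each class, the observations that satisfy lower ≤ x < upper. Here is a minimal Python sketch of that count; the variable names are illustrative, and the data is the array from Step 1a.

```python
array = [
    22, 25, 28, 32, 36, 41, 46, 48, 50, 53,
    53, 54, 57, 62, 63, 65, 65, 66, 69, 70,
    71, 75, 76, 76, 78, 79, 79, 81, 87, 89,
]

# Classes [20;30), [30;40), ..., [80;90) with a class width of 10.
classes = [(lower, lower + 10) for lower in range(20, 90, 10)]

# Count the observations in each half-open class: lower included, upper excluded.
frequencies = [
    sum(1 for x in array if lower <= x < upper)
    for lower, upper in classes
]

for (lower, upper), f in zip(classes, frequencies):
    print(f"[{lower};{upper})  {f}")
print("Total", sum(frequencies))
```

Note that a mark of exactly 50 is counted in [50;60) and not in [40;50), which is what the half-open notation prescribes.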
Ungrouped data vs. Grouped data
Ungrouped (raw) data
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
Grouped data
Class   Class Interval   Class mid-point   Frequency (f)
1       [20;30)          25                3
2       [30;40)          35                2
3       [40;50)          45                3
4       [50;60)          55                5
5       [60;70)          65                6
6       [70;80)          75                8
7       [80;90)          85                3
Total                                      30
Cumulative frequency distribution
Cumulative frequency (F): the cumulative frequency at a class boundary x is the total number of observations in the data set with values less than x.
Class Boundary   Cumulative Frequency (F)
< 20             0
< 30             3
< 40             5
< 50             8
< 60             13
< 70             19
< 80             27
< 90             30
How many students scored less than 70?
How many students scored 70 or more?
The last value should be n (the size of the data set).
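Cumulative frequencies are running totals of the class frequencies. The sketch below uses `itertools.accumulate` from the Python standard library; the variable names are assumptions, and the frequencies are those from the frequency table in Step 5.

```python
from itertools import accumulate

frequencies = [3, 2, 3, 5, 6, 8, 3]           # f for classes [20;30) ... [80;90)
boundaries = [20, 30, 40, 50, 60, 70, 80, 90]  # all class boundaries

# No observation lies below the first boundary, so the list starts at 0;
# each later entry adds the frequency of the class ending at that boundary.
cumulative = [0] + list(accumulate(frequencies))

for boundary, F in zip(boundaries, cumulative):
    print(f"< {boundary}  {F}")

# The two questions on the slide:
n = cumulative[-1]
below_70 = cumulative[boundaries.index(70)]
at_least_70 = n - below_70
print(below_70, at_least_70)
```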
Relative frequency
Relative frequencies (r): calculated by dividing the frequencies by the total number of observations in the data set from which the table was constructed.
Class   Class Interval   Class mid-point   f    r
1       [20;30)          25                3    0.1
2       [30;40)          35                2    0.067
3       [40;50)          45                3    0.1
4       [50;60)          55                5    0.167
5       [60;70)          65                6    0.2
6       [70;80)          75                8    0.267
7       [80;90)          85                3    0.1
Total                                      30   1
The total should always be 1.
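Relative frequencies follow directly from r = f / n. A minimal Python sketch, with illustrative variable names and the frequencies from the table above:

```python
frequencies = [3, 2, 3, 5, 6, 8, 3]
n = sum(frequencies)  # total number of observations

# Relative frequency of each class: its frequency divided by n.
relative = [f / n for f in frequencies]

# Rounded to three decimals for display, as in the table.
rounded = [round(r, 3) for r in relative]
print(rounded)

# The unrounded relative frequencies add up to 1 (up to floating-point error).
print(sum(relative))
```

The rounded column can add up to slightly more or less than 1; the exact values always sum to 1.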
Percentage frequency
Percentage frequencies: calculated by multiplying the relative frequencies by 100 and writing them as percentages.
Class   Class Interval   Class mid-point   f    r       Percentage frequency
1       [20;30)          25                3    0.1     10%
2       [30;40)          35                2    0.067   6.7%
3       [40;50)          45                3    0.1     10%
4       [50;60)          55                5    0.167   16.7%
5       [60;70)          65                6    0.2     20%
6       [70;80)          75                8    0.267   26.7%
7       [80;90)          85                3    0.1     10%
Total                                      30   1       100%
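Percentage frequencies can be computed in the same way. This sketch works from the raw frequencies (100 · f / n) so that rounding happens only once; the variable names are illustrative.

```python
frequencies = [3, 2, 3, 5, 6, 8, 3]
n = sum(frequencies)

# Percentage frequency = relative frequency x 100 = 100 * f / n,
# rounded to one decimal place for display.
percentages = [round(100 * f / n, 1) for f in frequencies]

for f, p in zip(frequencies, percentages):
    print(f"{f:2d}  {p}%")
```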
Relative cumulative frequencies
Relative cumulative frequencies (R): calculated by dividing the corresponding cumulative frequencies by the total number of observations in the data set.
Class Boundary   Cumulative Frequency (F)   Relative Cumulative Frequency (R)
< 20             0                          0
< 30             3                          0.1
< 40             5                          0.167
< 50             8                          0.267
< 60             13                         0.433
< 70             19                         0.633
< 80             27                         0.9
< 90             30                         1
The last value should always be 1.
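The relative cumulative column is each cumulative frequency divided by n. A short Python sketch under the same assumptions as before (illustrative names, cumulative frequencies taken from the table above):

```python
boundaries = [20, 30, 40, 50, 60, 70, 80, 90]
cumulative = [0, 3, 5, 8, 13, 19, 27, 30]

n = cumulative[-1]  # the last cumulative frequency equals the data set size

# Relative cumulative frequency = cumulative frequency / n, shown to 3 decimals.
relative_cumulative = [round(F / n, 3) for F in cumulative]

for boundary, value in zip(boundaries, relative_cumulative):
    print(f"< {boundary}  {value}")
```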
Graphical representation of continuous data
• Dot plots
• Histograms
• Frequency polygons (+ relative)
• Cumulative frequency polygons (+ relative)

Dot plots
• Data is plotted on a horizontal axis
• Any observations with the same values are stacked on top of one another
• Heights of stacks are not that important
• Horizontal density tells us about the distribution of the data
  • Frequencies are larger where data points are more densely packed
[Figure: dot plot; horizontal axis "Statistics Marks (%)"]
Histograms
Histogram: a graphical representation of a frequency table
Frequency table on page 58 (page 34 in older text book)
[Figure: "Histogram of Statistics marks"; vertical axis f, horizontal axis Statistics marks (%). Callout: note the spaces.]

Frequency polygons
Frequency polygon: a graph where the frequency of each class interval is plotted against the class mid-point of the corresponding class interval, and the points are then joined with straight lines.
Frequency table on page 59
[Figure: "Histogram with frequency polygon of Statistics marks"; vertical axis f from 0 to 9, horizontal axis Statistics marks (%)]
[Figure: "Frequency polygon of Statistics marks"; vertical axis f from 0 to 9, horizontal axis from 10 to 100]

Relative frequency polygons
• Frequencies are all divided by n
Frequency table on page 61 (page 37 in older text book)
[Figure: "Frequency polygon of Statistics marks"; vertical axis r from 0.00 to 0.30, horizontal axis Statistics marks (%)]

Cumulative frequency polygons
Cumulative frequency polygon: a line graph that displays the cumulative frequency of each class at its upper class boundary
Frequency table on page 60 (page 38 in older text book)
[Figure: "Cumulative frequency polygon of Statistics marks"; vertical axis F from 0 to 30, horizontal axis from 20 to 90]
How many students scored less than 50%?
How many students scored more than 80%?

Relative cumulative frequency polygons
• Cumulative frequencies are all divided by n
Frequency table on page 63 (page 38 in older text book)
[Figure: "Relative cumulative frequency polygon of Statistics marks"; vertical axis R, horizontal axis from 20 to 90]

Homework
• Do exercises 1, 2, 3 and 5 at the end of Chapter 3.

These exercises cover continuous data.
Frequency Distribution of Discrete data
We will use the example data on page 51
Education Level                    Code
Primary School                     0
Grade 08 to 10                     1
Grade 11 to 12                     2
Tertiary Education to Graduates    3
Post Graduates                     4
Discrete Sample data
Education Levels of 40 People
(Unordered ungrouped data)
1 2 3 2 1 2 2 2
3 2 1 3 2 2 3 3
2 3 3 1 0 4 2 3
3 2 1 2 2 3 3 0
2 1 3 0 2 4 1 1
Discrete Sample data
Education Levels of 40 People
(Ordered ungrouped data)
0 0 0 1 1 1 1 1
1 1 1 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 3 3 3 3 3 3
3 3 3 3 3 3 4 4
Frequency table for Discrete data
Education Level   Frequency (f)   Relative frequency (r)
0                 3               0.075
1                 8               0.2
2                 15              0.375
3                 12              0.3
4                 2               0.05
Total             40              1
The total frequency is always equal to n (the sample size); the total relative frequency is always equal to 1.
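For discrete data each distinct value is its own category, so the frequency table is a simple tally. The sketch below uses `collections.Counter` from the Python standard library on the 40 education-level codes listed earlier; the variable names are illustrative.

```python
from collections import Counter

# Education levels of 40 people (unordered, ungrouped), as coded in the slide.
levels = [
    1, 2, 3, 2, 1, 2, 2, 2,
    3, 2, 1, 3, 2, 2, 3, 3,
    2, 3, 3, 1, 0, 4, 2, 3,
    3, 2, 1, 2, 2, 3, 3, 0,
    2, 1, 3, 0, 2, 4, 1, 1,
]

n = len(levels)
counts = Counter(levels)  # maps each level to its frequency f

# Build rows of (level, frequency, relative frequency) in category order.
table = [(level, counts[level], counts[level] / n) for level in sorted(counts)]

for level, f, r in table:
    print(level, f, r)
```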
Cumulative frequency table - Discrete
Education Level   Frequency (f)   Cumulative Frequency (F)   Relative Cumulative Frequency (R)
0                 3               3                          0.075
1                 8               11                         0.275
2                 15              26                         0.65
3                 12              38                         0.95
4                 2               40                         1
The last cumulative frequency is always equal to n (the sample size); the last relative cumulative frequency is always equal to 1.
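The discrete cumulative table can be reproduced with running totals. A minimal Python sketch, assuming the frequencies from the discrete frequency table and using illustrative variable names:

```python
from itertools import accumulate

levels = [0, 1, 2, 3, 4]
frequencies = [3, 8, 15, 12, 2]
n = sum(frequencies)

# Cumulative frequency: observations with a level less than or equal to this one.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: cumulative frequency divided by n.
relative_cumulative = [F / n for F in cumulative]

for level, f, F, R in zip(levels, frequencies, cumulative, relative_cumulative):
    print(level, f, F, R)
```

Unlike the continuous case, no leading 0 row is needed here, because each row refers to a category and not to a class boundary.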
Graphical representation of discrete data
• Dot plots
• Bar charts
• Pie charts
Dot plots for Discrete data
• Concentration occurs per category
  • (Discrete nature of data)
Frequency table on page 52 (page 40 in older text book)
[Figure: dot plot; horizontal axis Education Level with categories 0 to 4]
What does this dot plot tell us?
Bar charts
• Portrays the same image of the data as the dot plot
  • (Preferred when the data set is large)
Frequency table on page 64 (page 40 in older text book)
[Figure: "Bar chart of education levels"; vertical axis f, horizontal axis Education level with categories 0 to 4. Callout: note the gaps.]

Pie charts
• Calculate each sector's angle by multiplying its r value by 360 degrees.
• Each sector's size represents the percentage frequency of the corresponding category
Frequency table on page 66
[Figure: "Pie chart of the percentage frequencies of 40 education levels"; sectors: level 0 = 7.50%, level 1 = 20.00%, level 2 = 37.50%, level 3 = 30.00%, level 4 = 5.00%]
What percentage of students are post-graduates?
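The sector angles follow from angle = r × 360°. The Python sketch below computes them from the raw frequencies as 360 · f / n, which is the same quantity but avoids rounding r first; the variable names are illustrative.

```python
levels = [0, 1, 2, 3, 4]
frequencies = [3, 8, 15, 12, 2]
n = sum(frequencies)

# Sector angle in degrees: relative frequency (f / n) times 360.
angles = [360 * f / n for f in frequencies]

for level, angle in zip(levels, angles):
    print(f"level {level}: {angle:g} degrees")

# All sectors together make up the full circle.
print(sum(angles))
```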
Homework
Do exercise 4 at the end of Chapter 3. This exercise focusses on discrete data.
