So then what? So we summarise the data.
We make conclusions about this world (population).
How do we summarise data?
Tables & graphs
Table layout (placeholder): Class, [x;x), Class midpoint, xxx
Graphs (placeholder): XXX
Raw data:
10 0 4 7 1 1 2 5 3 4
10 5 9 6 7 2 9 8 2 5
6 7 5 10 7 7 5 6 1 7
7 9 7 6 7 1 9 7 10 3
7 2 6 7 0 1 6 2 2 7
5 10 8 10 1 1 6 2 6 10
3 10 2 7 2 5 9 6 5 7
5 1 7 1 7 1 6 6 3 6
1 6 10 9 5 7 1 1 2 6
5 5 6 6 2 8 5 9 10 6
0 3 9 0 6 9 5 8 3 3
5 4 1 9 7 2 2 10 2 0
1 10 9 3 9 8 3 7 5 5
8 5 1 2 2 9 2 6 6 1
2 1 9 9 3 5 7 1 7 2
Continuous data vs. Discrete data
Continuous data:
65.1 59.7 23 63.6 64
51 58 47.1 49.8 28
71.8 44.6 82.6 31 57.5
83 68.3 42.7 86.9 74.7
35.2 31 50 76.4 48
26.2 46.6 72.1 34.8 81.9
47 23.9 44.3 41.9 71
52 57.1 80.6 61 79.1
Discrete data:
4 1 1 1 3
2 3 0 3 3
0 3 1 4 3
4 4 2 4 1
0 2 2 1 3
3 0 2 3 4
2 3 4 3 0
4 4 4 4 4
Frequency distribution of continuous data
Array: a data set that has been sorted from smallest to largest.
20 12 19 28 21 30 25 12 9 35
9 12 12 19 20 21 25 28 30 35
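A minimal Python sketch of turning the ten observations above into an array. The variable names are illustrative and not part of the slides.

```python
# Raw observations from the slide, in their original order.
observations = [20, 12, 19, 28, 21, 30, 25, 12, 9, 35]

# sorted() returns a new list ordered from smallest to largest
# and leaves the original list unchanged.
array = sorted(observations)

print(array)
```

The original list is kept so that the raw and the sorted data can still be compared side by side.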
Frequency distribution of continuous data
[x; y): all observations greater than or equal to x and less than y.
Example: [10;20) includes 10 but excludes 20.
41 32 65 48 50 53 76 81 36 46
22 66 69 63 54 78 87 79 76 71
62 57 53 28 89 65 75 79 25 70
Exercise: See how many observations you can sort in 1 minute.
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
Step 1b – calculate the range
Range (R): the difference between the largest and the smallest observations.
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
$R = \max - \min = 89 - 22 = 67$
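The range calculation can be sketched in Python as follows, using the 30 marks from the exercise. The names `marks` and `data_range` are illustrative.

```python
# The 30 marks from the exercise, unsorted.
marks = [41, 32, 65, 48, 50, 53, 76, 81, 36, 46,
         22, 66, 69, 63, 54, 78, 87, 79, 76, 71,
         62, 57, 53, 28, 89, 65, 75, 79, 25, 70]

# Range = largest observation minus smallest observation.
data_range = max(marks) - min(marks)

print(max(marks), min(marks), data_range)
```

Sorting is not required for this step, because max() and min() scan the whole list.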
Step 2 – determine the number of classes
Sturges' rule: the number of classes $k$ in a frequency table can be determined by the formula $k = 1 + 1.4\ln(n)$, where $n$ is the number of observations in the data set and $k$ is rounded off to the nearest whole number.
$k = 1 + 1.4\ln(30) = 5.762 \approx 6$ classes
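A minimal sketch of this step, assuming the rule takes the form k = 1 + 1.4 ln(n). That form is an assumption, chosen because it reproduces the value 5.762 for n = 30. Sturges' rule is also commonly written as 1 + log2(n), which gives a slightly different value before rounding.

```python
import math

n = 30  # number of observations in the data set

# Assumed form of the rule: k = 1 + 1.4 * ln(n).
k_exact = 1 + 1.4 * math.log(n)

# Round off to the nearest whole number of classes.
k = round(k_exact)

print(round(k_exact, 3), k)
```

For n = 30 both forms of the rule round to the same whole number of classes.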
Step 3 – calculate the class width
Class width (w): the range of each class. Equal to the range (step 1) divided by the number of classes (step 2).
$w = \frac{R}{k} = \frac{67}{6} = 11.167$
This class width may be difficult to use. Use a nice round number such as 10.
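The class width calculation can be sketched as below. The variable names are illustrative, and the round width of 10 is the choice made on the slide, not something the code derives.

```python
data_range = 67  # R from step 1b
k = 6            # number of classes from step 2

# Exact class width: range divided by number of classes.
w_exact = data_range / k
print(round(w_exact, 3))

# The exact width is awkward to work with, so a round number is used instead.
w = 10
```

Choosing a width smaller than the exact value means more classes may be needed than step 2 suggested in order to cover all observations.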
Step 4 – calculate class boundaries
• The lower boundary of the next class is the same as the upper boundary of the current class.
Step 4 – calculate class boundaries (cont.)
• Upper boundary = lower boundary + class width = 20 + 10 = 30
Class  Class interval
1      [20;30)
2      [30;40)
3      [40;50)
4      [50;60)
5      [60;70)
6      [70;80)
7      [80;90)
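A short sketch of how the class boundaries above can be generated, assuming a first lower boundary of 20 and a class width of 10 as in the table. The loop keeps adding classes until the largest observation (89) is covered.

```python
lower = 20    # lower boundary of the first class
width = 10    # class width chosen in the previous step
largest = 89  # largest observation in the data set

intervals = []
start = lower
# Keep adding classes while the largest observation is not yet covered.
while start <= largest:
    # Upper boundary = lower boundary + class width; it is also the
    # lower boundary of the next class.
    intervals.append((start, start + width))
    start += width

for number, (lo, hi) in enumerate(intervals, start=1):
    print(number, f"[{lo};{hi})")
```

With these values the loop produces seven classes, one more than the rounded result of step 2, because the width was rounded down from 11.167 to 10.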
Step 5 – tabulate the observations(cont.)
Class midpoint: the value that lies in the middle of each class interval
Class  Class interval
1      [20;30)
2      [30;40)
3      [40;50)
4      [50;60)
5      [60;70)
6      [70;80)
7      [80;90)
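The class midpoints can be computed as the average of the two class boundaries. A minimal sketch, with illustrative variable names:

```python
# Class intervals as (lower boundary, upper boundary) pairs.
intervals = [(20, 30), (30, 40), (40, 50), (50, 60),
             (60, 70), (70, 80), (80, 90)]

# Midpoint = value halfway between the lower and upper boundary.
midpoints = [(lo + hi) / 2 for lo, hi in intervals]

print(midpoints)
```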
Step 5 – tabulate the observations(cont.)
Class  Class interval  Class midpoint  Frequency (f)
1      [20;30)         25              3
2      [30;40)         35              2
3      [40;50)         45              3
4      [50;60)         55              5
5      [60;70)         65              6
6      [70;80)         75              8
7      [80;90)         85              3
Total                                  30
Total always equal to n (data set size).
How many students scored between 20% and 30%?
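Tabulating the observations can be sketched as follows. The sketch assumes equal-width classes starting at 20, so each mark's class index can be found by integer division. The variable names are illustrative.

```python
marks = [41, 32, 65, 48, 50, 53, 76, 81, 36, 46,
         22, 66, 69, 63, 54, 78, 87, 79, 76, 71,
         62, 57, 53, 28, 89, 65, 75, 79, 25, 70]

lower, width, n_classes = 20, 10, 7

# frequencies[i] counts the marks that fall in class i,
# i.e. in [lower + i*width ; lower + (i+1)*width).
frequencies = [0] * n_classes
for mark in marks:
    frequencies[(mark - lower) // width] += 1

for i, f in enumerate(frequencies):
    lo = lower + i * width
    print(f"[{lo};{lo + width})", f)

# The frequencies must add up to the data set size.
total = sum(frequencies)
```

The first entry of `frequencies` answers the question on the slide: three students scored between 20% and 30%.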
Ungrouped data vs. Grouped data
Ungrouped data
22 25 28 32 36 41 46 48 50 53
53 54 57 62 63 65 65 66 69 70
71 75 76 76 78 79 79 81 87 89
Grouped data
Class  Class interval  Class midpoint  Frequency (f)
1      [20;30)         25              3
2      [30;40)         35              2
3      [40;50)         45              3
4      [50;60)         55              5
5      [60;70)         65              6
6      [70;80)         75              8
7      [80;90)         85              3
Total                                  30
Cumulative frequency distribution
Cumulative frequency (F): the cumulative frequency of a point with value x is defined as the total number of observations in the data set with values less than or equal to x.
Class boundary  Cumulative frequency (F)
< 20            0
< 30            3
< 40            5
< 50            8
< 60            13
< 70            19
< 80            27
< 90            30
Last value should be n (size of data set).
How many students scored less than 70?
How many students scored 70 or more?
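The cumulative frequencies can be built as a running total of the class frequencies. A minimal sketch, using `itertools.accumulate` from the Python standard library and illustrative variable names:

```python
from itertools import accumulate

boundaries = [20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 2, 3, 5, 6, 8, 3]

# No observation lies below the first boundary, so the list starts at 0;
# each later entry adds the frequency of the class that ends at that boundary.
cumulative = [0] + list(accumulate(frequencies))
F = dict(zip(boundaries, cumulative))

for boundary, value in F.items():
    print("<", boundary, value)

# The two questions on the slide:
less_than_70 = F[70]
seventy_or_more = F[90] - F[70]
```

The count of students scoring 70 or more is the data set size minus the cumulative frequency at 70.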
Relative frequency
Relative frequencies: calculated by dividing each frequency by the total number of observations in the data set from which the table was constructed.
Class  Class interval  Class midpoint  Frequency (f)  Relative frequency
1      [20;30)         25              3              0.1
2      [30;40)         35              2              0.067
3      [40;50)         45              3              0.1
4      [50;60)         55              5              0.167
5      [60;70)         65              6              0.2
6      [70;80)         75              8              0.267
7      [80;90)         85              3              0.1
Total                                  30             1
Total should always be 1.
Percentage frequency
Percentage frequencies: calculated by multiplying the relative frequencies by 100 and writing them as percentages.
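Relative and percentage frequencies can be sketched together, since both are derived from the same class frequencies. The variable names are illustrative.

```python
frequencies = [3, 2, 3, 5, 6, 8, 3]
n = sum(frequencies)  # total number of observations

# Relative frequency = class frequency / total number of observations.
relative = [f / n for f in frequencies]

# Percentage frequency = relative frequency * 100.
percentage = [100 * f / n for f in frequencies]

for f, r, p in zip(frequencies, relative, percentage):
    print(f, round(r, 3), f"{p:.1f}%")
```

Rounded to three decimals, the second relative frequency is 0.067, since 2/30 = 0.0666...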
Relative cumulative frequencies: calculated by dividing the corresponding cumulative frequencies by the total number of observations in the data set.
Class boundary  Cumulative frequency (F)  Relative cumulative frequency (R)
< 20            0                         0
< 30            3                         0.1
< 40            5                         0.167
< 50            8                         0.267
< 60            13                        0.433
< 70            19                        0.633
< 80            27                        0.9
< 90            30                        1
Last value should always be 1.
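A short sketch of the relative cumulative frequencies for the marks data, dividing each cumulative frequency by the data set size. The variable names are illustrative.

```python
boundaries = [20, 30, 40, 50, 60, 70, 80, 90]
cumulative = [0, 3, 5, 8, 13, 19, 27, 30]

# The last cumulative frequency equals the data set size.
n = cumulative[-1]

# Relative cumulative frequency = cumulative frequency / n.
relative_cumulative = [F / n for F in cumulative]

for boundary, value in zip(boundaries, relative_cumulative):
    print("<", boundary, round(value, 3))
```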
Graphical representation of continuous data
• Dot plots
• Histograms
[Figure: dot plot; horizontal axis: Statistics Marks (%)]
Histograms
[Figure: histogram of the frequency table on page 58; vertical axis: frequency]
Frequency polygon: a graph where the frequency of each class interval is plotted against the class midpoint of the corresponding class interval and then the points are joined with straight lines.
[Figure: frequency polygon; vertical axis: f]
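The points of a frequency polygon can be listed without any plotting library. This sketch pairs each class midpoint with its frequency for the marks data and then forms the straight-line segments between consecutive points. The variable names are illustrative, and drawing the graph itself is left out.

```python
midpoints = [25, 35, 45, 55, 65, 75, 85]
frequencies = [3, 2, 3, 5, 6, 8, 3]

# One point per class: (class midpoint, frequency).
points = list(zip(midpoints, frequencies))

# Consecutive points are joined with straight lines.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(start, "->", end)
```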
Cumulative frequency polygon: a line graph that displays the cumulative frequency of each class at its upper class boundary.
[Figure: cumulative frequency polygon; vertical axis: F]
How many students scored less than 50%?
Frequency table on page 63 (page 38 in older text book)
[Figure: Relative cumulative frequency polygon of Statistics marks; vertical axis: R; horizontal axis: 20 to 90]
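Reading a value off a cumulative frequency polygon can be sketched as linear interpolation between the plotted points. The function name `cumulative_at` is hypothetical, and the interpolation between class boundaries is an assumption of the sketch (it treats observations as evenly spread within each class).

```python
boundaries = [20, 30, 40, 50, 60, 70, 80, 90]
cumulative = [0, 3, 5, 8, 13, 19, 27, 30]

# Plotted points: (class boundary, cumulative frequency).
points = list(zip(boundaries, cumulative))


def cumulative_at(x):
    """Estimate the number of observations below x from the polygon."""
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    # Find the line segment that contains x and interpolate along it.
    for (x0, f0), (x1, f1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return f0 + (f1 - f0) * (x - x0) / (x1 - x0)


print(cumulative_at(50))  # exact, because 50 is a class boundary
print(cumulative_at(55))  # an estimate between two boundaries
```

At a class boundary the reading is exact, so the question on the slide has the answer 8. Between boundaries the result is only an estimate.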
Grade 08 to 10 1
Grade 11 to 12 2
Post Graduates 4
Discrete Sample data
1 2 3 2 1 2 2 2
3 2 1 3 2 2 3 3
2 3 3 1 0 4 2 3
3 2 1 2 2 3 3 0
2 1 3 0 2 4 1 1
Discrete Sample data (sorted)
0 0 0 1 1 1 1 1
1 1 1 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 3 3 3 3 3 3
3 3 3 3 3 3 4 4
Frequency table for Discrete data
Education level  Frequency (f)  Relative frequency
0                3              0.075
1                8              0.2
2                15             0.375
3                12             0.3
4                2              0.05
Total            40             1

Education level  Frequency (f)  Cumulative frequency (F)  Relative cumulative frequency
0                3              3                         0.075
1                8              11                        0.275
2                15             26                        0.65
3                12             38                        0.95
4                2              40                        1
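For discrete data no classes are needed: each distinct value gets its own row. A minimal sketch using `collections.Counter` from the Python standard library on the 40 education-level observations, with illustrative variable names:

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 3, 2, 1, 2, 2, 2,
        3, 2, 1, 3, 2, 2, 3, 3,
        2, 3, 3, 1, 0, 4, 2, 3,
        3, 2, 1, 2, 2, 3, 3, 0,
        2, 1, 3, 0, 2, 4, 1, 1]

counts = Counter(data)   # maps each value to the number of times it occurs
levels = sorted(counts)  # the distinct values, smallest first
n = len(data)

frequencies = [counts[x] for x in levels]
relative = [f / n for f in frequencies]
cumulative = list(accumulate(frequencies))
relative_cumulative = [F / n for F in cumulative]

for row in zip(levels, frequencies, relative, cumulative, relative_cumulative):
    print(*row)
```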
Graphical representation of discrete data
• Dot plots
• Bar charts
• Pie charts
Dot plots for Discrete data
Frequency table on page 52 (page 40 in older text book)
[Figure: dot plot; horizontal axis: Education Level]
Bar charts
Frequency table on page 64 (page 40 in older text book)
[Figure: bar chart; vertical axis: f; horizontal axis: Education level, 0 to 4]
Pie charts
[Figure: pie chart of Education level, 0 to 4; slice labels include 20.00%, 30.00% and 37.50%]
What percentage of students are post-graduates?
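The slice sizes of a pie chart follow directly from the frequency table. This sketch computes the percentage and the angle (out of 360 degrees) for each education level. It assumes level 4 is the post-graduate code, as in the legend earlier in the slides, and the variable names are illustrative.

```python
# Frequency of each education level, from the discrete frequency table.
frequencies = {0: 3, 1: 8, 2: 15, 3: 12, 4: 2}
n = sum(frequencies.values())

# Each slice is proportional to its share of the observations.
percentages = {level: 100 * f / n for level, f in frequencies.items()}
angles = {level: 360 * f / n for level, f in frequencies.items()}

for level in frequencies:
    print(level, f"{percentages[level]:.2f}%", angles[level])

# Level 4 is assumed to be the post-graduate code.
post_graduate_share = percentages[4]
```

Under that assumption, post-graduates make up 5% of the sample, the smallest slice of the pie.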
Homework