
Exploring data
• Sampling and types of data
• Graphical summaries
• Numerical summaries
Exploring data Graphical summaries

Outcomes
Know the definitions

State the purpose of graphical summaries

Identify which graphical method is best suited for which types of data

Extract information from graphical summaries


Exploring data Graphical summaries – Tables

• Only one rule to follow.
• Keep the table simple!
• It is meant to ease understanding of the data.
• May be used for any type of data.
Exploring data Graphical summaries – Pie chart

Uses
• Divide total into components
• Compare components to total
• Display only one variable at a time
• Not suitable to compare components to each other
• Not suitable for quantitative data

Sector angles in the example: 760/2400 × 360°, 1040/2400 × 360°, 600/2400 × 360°
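The sector angles on the pie chart slide follow from component / total × 360°. A minimal sketch of that arithmetic, assuming the three component counts 760, 1040 and 600 read off the slide (the function name `sector_angles` is only illustrative):

```python
def sector_angles(counts):
    """Return the pie-chart angle in degrees for each component count."""
    total = sum(counts)
    # Each component gets the same share of 360 degrees as its share of the total.
    return [360 * count / total for count in counts]


# Component counts taken from the slide; they add up to the total of 2400.
angles = sector_angles([760, 1040, 600])
print(angles)
```

The angles should add up to 360°, which is a quick check that no component was left out.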
Exploring data Graphical summaries – Simple bar

Uses
• Limited to one variable
• Divides total into components
• Detect trends over time
• Suitable to compare components to each other
• Not suitable for quantitative data
Exploring data Graphical summaries – Multiple bar

Uses
• Compare multiple variables
• Divides totals into components
• Compare between components within and between groups
• Detect trends and associations
• Not suitable for quantitative data
Exploring data Graphical summaries – Component /stacked bar

Uses
• Compare between totals
• Difficult to compare components within bars
• Multiple variables
• Compare each component to total – like in pie chart
• Not suitable for quantitative data
Exploring data Graphical summaries – % Component /stacked bar

Uses
• Compare each component to total – like in pie chart
• Difficult to compare components within each group
• Difficult to compare components between groups
• Cannot compare totals between groups
• Not suitable for quantitative data
Exploring data Graphical summaries – Pareto chart

Uses
• Depicts which situations are more significant
• Suitable for any type of data

[Figure: Pareto chart with bar labels 51, 21, 16, 6, 3, 3 and cumulative line labels 72, 88, 94, 97]
Line chart ⇒ cumulative
Bar chart ⇒ descending
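The two parts of a Pareto chart (bars in descending order, line showing the cumulative percentage) can be sketched as below. The bar values are the data labels that survive from the slide's figure. The category names A to F and the function name `pareto_rows` are hypothetical, since the original labels are not available.

```python
def pareto_rows(counts):
    """Return (label, percentage, cumulative percentage) sorted for a Pareto chart."""
    total = sum(counts.values())
    # Bar chart: largest component first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for label, count in ordered:
        running += count
        # Line chart: running total expressed as a percentage of the grand total.
        rows.append((label, 100 * count / total, 100 * running / total))
    return rows


# Hypothetical category labels; values deliberately given out of order.
counts = {"C": 16, "A": 51, "E": 3, "B": 21, "F": 3, "D": 6}
rows = pareto_rows(counts)
for label, pct, cum_pct in rows:
    print(f"{label}: {pct:.0f}% (cumulative {cum_pct:.0f}%)")
```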
Exploring data Graphical summaries – Dot plot

E.g. Number of available cars for each of 20 households.

3 3 2 1 2 2 0 3 2 2
2 3 2 1 2 2 1 1 3 4
(min = 0, max = 4)

Stack dots for equal values in the data set.

Uses
• Suitable only for discrete data
• Small samples
• Relatively small ranges
• To detect patterns in data sets
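Stacking one dot per observation above its value can be sketched in plain text. This is only an illustration for the 20 household values; the function name `dot_plot` and the use of `*` as the dot symbol are assumptions.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: one '*' per observation, stacked above its value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build the rows from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts.get(v, 0) >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)


cars = [3, 3, 2, 1, 2, 2, 0, 3, 2, 2,
        2, 3, 2, 1, 2, 2, 1, 1, 3, 4]
plot = dot_plot(cars)
print(plot)
```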
Exploring data Graphical summaries – Stem-and-leaf plot

E.g. Race times in seconds of 12 athletes:

12.2 14.1 14.2 14.4 15.2 15.3 15.7 15.8 16.1 16.2 16.4 17.5

• Smallest stem at top
• Leaf from left to right ⇒ smallest to largest
• No 13 so empty leaf
• Line represents decimal point in data
• Key: 12.2 = 12 | 2

Uses
• Suitable only for quantitative data
• To detect patterns in data sets
• Determine the shape of the data
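The construction rules for the stem-and-leaf plot (smallest stem at top, leaves sorted left to right, empty leaf for a missing stem) can be sketched as follows. The function name `stem_and_leaf` is illustrative, and the sketch assumes data recorded to one decimal place, as in the race times.

```python
def stem_and_leaf(values):
    """Return stem-and-leaf lines for data recorded to one decimal place."""
    # Work in whole tenths so the stem and leaf split is exact.
    tenths = sorted(round(v * 10) for v in values)
    first_stem, last_stem = tenths[0] // 10, tenths[-1] // 10
    lines = []
    for stem in range(first_stem, last_stem + 1):
        # Leaves are the final digits, already in increasing order.
        leaves = [str(t % 10) for t in tenths if t // 10 == stem]
        lines.append(f"{stem} | {' '.join(leaves)}".rstrip())
    return lines


times = [12.2, 14.1, 14.2, 14.4, 15.2, 15.3,
         15.7, 15.8, 16.1, 16.2, 16.4, 17.5]
plot_lines = stem_and_leaf(times)
print("\n".join(plot_lines))
```

Every stem between the smallest and largest is listed, so the stem 13 appears with no leaves.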
Exploring data Graphical summaries – Line graph

E.g. Amount of rainfall

Uses
• Suitable for any data
• To detect patterns in data sets
• Usually over time
Exploring data Graphical summaries – Scatter plot

Uses
• Suitable for quantitative data
• To detect patterns in bivariate data sets
• Describe associations between variables
Exploring data Graphical summaries – A NOTE ON SCALES

E.g. The table below contains the results of a survey in which people were asked how often they shop online. The graph on the left below the table is the default graph generated by MS Excel for the table.

[Figure: survey table with two graphs, one labelled "Adjusted y-axis scale"]

Why is this graph misleading?
Exploring data Graphical summaries

Outcomes
Know the definitions

State the purpose of graphical summaries

Identify which graphical method is best suited for which types of data

Construct histograms, polygons, and ogives

Extract information from histograms, polygons, and ogives regarding the data
Exploring data Graphical summaries – Discrete histogram

E.g. Number of available cars for each of 20 households.

3 3 2 1 2 2 0 3 2 2
2 3 2 1 2 2 1 1 3 4
(min = 0, max = 4)

• Every bar has width = 1 unit.
• No gaps since the x-axis is a number line, unlike a bar chart.

Uses
• Suitable only for discrete data
• Relatively small ranges
• To detect patterns in data sets
Exploring data Graphical summaries – Discrete histogram

Outcome  Frequency  Relative frequency  Percentage      Cum. freq.  Cum. rel. freq.  Cum. %
0        1          0.05 = 1/20         5 = 0.05×100    1           0.05             5
1        4          0.20 = 4/20         20 = 0.20×100   5           0.25             25
2        9          0.45 = 9/20         45 = 0.45×100   14          0.70             70
3        5          0.25 = 5/20         25 = 0.25×100   19          0.95             95
4        1          0.05 = 1/20         5 = 0.05×100    20          1.00             100

• Every bar has width = 1 unit.
• No gaps since the x-axis is a number line, unlike a bar chart.
• Important for later: area of each bar = %, e.g. 45% of households have 2 cars.

Uses
• Suitable only for discrete data
• Relatively small ranges
• To detect patterns in data sets
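The columns of the frequency table (frequency, relative frequency, percentage and their cumulative versions) can be reproduced from the 20 household values. This is a sketch under the assumption that outcomes are whole numbers; the function name `frequency_table` is illustrative.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (outcome, freq, rel freq, %, cum freq, cum rel freq, cum %)."""
    n = len(values)
    counts = Counter(values)
    rows, cumulative = [], 0
    # Walk along the number line so outcomes with frequency 0 are not skipped.
    for outcome in range(min(values), max(values) + 1):
        freq = counts.get(outcome, 0)
        cumulative += freq
        rows.append((outcome, freq, freq / n, 100 * freq / n,
                     cumulative, cumulative / n, 100 * cumulative / n))
    return rows


cars = [3, 3, 2, 1, 2, 2, 0, 3, 2, 2,
        2, 3, 2, 1, 2, 2, 1, 1, 3, 4]
table = frequency_table(cars)
for row in table:
    print(row)
```

With bars of width 1, the relative frequency of each outcome equals the area of its bar, which is why the percentage can be read as an area.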
Exploring data Graphical summaries – Continuous data
E.g. The mass in kilograms of each of n = 55 bags of cement:
73.3 72.0 69.8 69.2 68.7 66.8 68.9 73.0 70.0 67.4 71.0
69.5 70.9 69.8 70.9 74.4 73.4 66.1 71.0 70.3 73.0 68.1
66.9 65.6 66.6 66.9 70.9 64.6 71.0 69.7 70.6 71.9 64.4
69.2 69.8 68.8 71.0 72.8 74.4 68.5 71.3 71.5 68.4 70.1
71.1 72.2 68.0 67.1 75.9 72.5 69.5 71.2 72.1 69.3 70.8

Too cluttered: create classes and group values.

Range = R = 75.9 – 64.4 = 11.5
#classes = k = 6.74 → 7
w = 11.5 / 7 = 1.64 → 1.7
Exploring data Graphical summaries – Continuous data
E.g. The mass in kilograms of each of n = 55 bags of cement (same data as on the previous slide).

What is the Range?
Range = R = maximum value – minimum value
R = 75.9 – 64.4 = 11.5

How many classes?
Sturges’ rule: #classes = k = 1 + 3.3 × log₁₀(n) (rounded to nearest integer)
k = 1 + 3.3 × log₁₀(55) = 6.74 → 7

Size of each class?
Size = w = Range / #classes (rounded up to next unit)
w = 11.5 / 7 = 1.64 → 1.7

Starting point of classes?
Minimum value – ½ unit = 64.4 – 0.05 = 64.35
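The grouping steps (range, Sturges' rule, class width, starting point) can be sketched for the cement data as follows. The sketch assumes the logarithm is base 10 and that one "unit" is 0.1 kg, because the masses are recorded to one decimal place. Variable names are illustrative.

```python
import math

masses = [
    73.3, 72.0, 69.8, 69.2, 68.7, 66.8, 68.9, 73.0, 70.0, 67.4, 71.0,
    69.5, 70.9, 69.8, 70.9, 74.4, 73.4, 66.1, 71.0, 70.3, 73.0, 68.1,
    66.9, 65.6, 66.6, 66.9, 70.9, 64.6, 71.0, 69.7, 70.6, 71.9, 64.4,
    69.2, 69.8, 68.8, 71.0, 72.8, 74.4, 68.5, 71.3, 71.5, 68.4, 70.1,
    71.1, 72.2, 68.0, 67.1, 75.9, 72.5, 69.5, 71.2, 72.1, 69.3, 70.8,
]

n = len(masses)
value_range = max(masses) - min(masses)      # R = maximum value - minimum value
k = round(1 + 3.3 * math.log10(n))           # Sturges' rule, nearest integer
w = math.ceil(value_range / k * 10) / 10     # round up to the next 0.1 (assumed unit)
start = min(masses) - 0.05                   # minimum value minus half a unit

# Class boundaries: k classes of width w starting at `start`.
boundaries = [round(start + i * w, 2) for i in range(k + 1)]

# Count how many masses fall in each class.
frequencies = [0] * k
for mass in masses:
    index = min(int((mass - start) // w), k - 1)
    frequencies[index] += 1

print(k, w, boundaries, frequencies)
```

Starting half a unit below the minimum puts every boundary between two recordable values, so no observation can fall exactly on a boundary.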
Exploring data Graphical summaries – Histogram
Boundaries    Frequency  Rel. freq.  %    Cum. freq.  Cum. rel. freq.  Cum. %
64.35-66.05   3          0.05        5    3           0.05             5
66.05-67.75   7          0.13        13   10          0.18             18
67.75-69.45   10         0.18        18   20          0.36             36
69.45-71.15   19         0.35        35   39          0.71             71
71.15-72.85   9          0.16        16   48          0.87             87
72.85-74.55   6          0.11        11   54          0.98             98
74.55-76.25   1          0.02        2    55          1.00             100

Uses
• Suitable for quantitative data
• Large data sets
• Many different values
• To detect patterns in data sets
Exploring data Graphical summaries – Histogram
Boundaries    Midpoint  Frequency  Rel. freq.  %    Cum. freq.  Cum. rel. freq.  Cum. %
64.35-66.05   65.2      3          0.05        5    3           0.05             5
66.05-67.75   66.9      7          0.13        13   10          0.18             18
67.75-69.45   68.6      10         0.18        18   20          0.36             36
69.45-71.15   70.3      19         0.35        35   39          0.71             71
71.15-72.85   72.0      9          0.16        16   48          0.87             87
72.85-74.55   73.7      6          0.11        11   54          0.98             98
74.55-76.25   75.4      1          0.02        2    55          1.00             100

The points 65.2 – 1.7 = 63.5 and 75.4 + 1.7 = 77.1 are not part of the original data set, but are used to prevent the polygon from floating.
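The midpoints and the two extra anchor points for the frequency polygon can be computed from the class boundaries. This sketch takes each midpoint as the average of its two boundaries, which gives 70.3 for the class 69.45-71.15. The names `polygon_x` and `polygon_y` are illustrative.

```python
boundaries = [64.35, 66.05, 67.75, 69.45, 71.15, 72.85, 74.55, 76.25]
frequencies = [3, 7, 10, 19, 9, 6, 1]
width = 1.7

# Midpoint of a class = average of its lower and upper boundary.
midpoints = [round((lower + upper) / 2, 1)
             for lower, upper in zip(boundaries, boundaries[1:])]

# Add one empty class on each side so the polygon starts and ends on the x-axis.
polygon_x = [round(midpoints[0] - width, 1)] + midpoints + [round(midpoints[-1] + width, 1)]
polygon_y = [0] + frequencies + [0]

for x, y in zip(polygon_x, polygon_y):
    print(x, y)
```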
Exploring data Graphical summaries – Frequency polygon
Uses
• Suitable for quantitative data
• Large data sets
• Many different values
• To detect patterns in data sets
Usually rounded. Later we talk about distribution curves.
Exploring data Graphical summaries – Ogive
How many values have we accumulated up to point 64.35? None
How many values have we accumulated up to point 66.05? 3
How many values have we accumulated up to point 76.25? 55
Exploring data Graphical summaries – Ogive
Uses
• To determine percentiles

A percentile is a value that divides the data set into two parts.

What weight separates the lowest 20% from the rest?
Approximately 67.75
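Reading a percentile off the ogive amounts to interpolating between two plotted points (class boundary, cumulative frequency). The sketch below assumes straight-line segments between the points, as on a hand-drawn ogive. The function name `percentile_from_ogive` is illustrative. At the boundary 67.75 about 18% of the values have accumulated, so the interpolated 20% point lies slightly above the graphical reading, at about 67.9.

```python
def percentile_from_ogive(boundaries, cumulative, p):
    """Interpolate linearly on the ogive to find the value below which p% lie."""
    target = p / 100 * cumulative[-1]
    for i in range(len(cumulative) - 1):
        low, high = cumulative[i], cumulative[i + 1]
        if low <= target <= high and high > low:
            # Fraction of the way along this segment of the ogive.
            fraction = (target - low) / (high - low)
            return boundaries[i] + fraction * (boundaries[i + 1] - boundaries[i])
    raise ValueError("p must be between 0 and 100")


# Ogive points: cumulative frequency reached at each class boundary.
boundaries = [64.35, 66.05, 67.75, 69.45, 71.15, 72.85, 74.55, 76.25]
cumulative = [0, 3, 10, 20, 39, 48, 54, 55]

p20 = percentile_from_ogive(boundaries, cumulative, 20)
print(round(p20, 2))
```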
Exploring data Graphical summaries – Histogram
Recall: Bar area represents % of values.

E.g. 35% of values are between 69.45 and 71.15
Exploring data Graphical summaries – Interpreting Histograms/ Polygons

• Left and right sides are mirror images
• Outliers are “extreme” values of a data set

Uses
Assess
• where a distribution is centered
• the spread of a distribution
• the shape of a distribution