Chapter 2.
Presenting Data in Tables and Charts
Objectives
After completing this lesson, you should be able to:
Present data using tables: simple frequency distribution, grouped
frequency distribution and cross‐table
Draw different types of charts: bar charts, pie charts, dot plot,
histogram, polygon, ogive, side‐by‐side bar chart, scatterplots, time‐
series plot
8/11/2022
Outline
• Organizing data
• Tables and graphs for categorical data
• Tables and graphs for numerical data
• Presenting relationship between two variables
Organizing data
Organizing data
Data collected through surveys are called ‘raw’ data. Data in raw form are
usually not easy to use for decision making
Some type of organization is needed
Table
Graph
The type of table/graph used depends on the variable being summarized
Organizing data
• Convey a message;
Organizing data
Tables
• Simplest way to summarize data
• Consist of rows and columns containing data
• Data is presented as absolute numbers or percentages, or both
• Tables can be good for side‐by‐side comparisons but can lack visual impact
when used on a slide in a presentation.
Organizing data
Charts and graphs
• Visual representation of data
• They should be designed so that they convey at a single look the
general patterns of the data
• Usually data is presented using percentages
• The most informative graphs are simple and self‐explanatory.
• Graphs are easier to read than tables, but they provide less detail.
Effective presentation
Regardless of which communication format you use, the information should be
presented in a clear, concise way, with key findings and actionable
recommendations.
For all communication formats it is important to ensure that there is:
• Consistency: Font, Colors, Punctuation, Terminology, Line/ Paragraph Spacing
• An appropriate amount of information: Less is more
• Appropriate content and format for the audience: scientific community,
journalists, politicians
Tables and graphs for categorical data
Categorical data
• Tabulating data: frequency distribution
• Graphing data: bar charts, pie charts
Frequency distribution
• E.g. Guests staying at Victory hotel were asked to rate the quality of their
accommodations as being excellent, above average, average, below average, or
poor. The ratings provided by a sample of 20 guests are:
Frequency distribution
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative frequency distribution
• The relative frequency of a class is the fraction or proportion of the total number
of data items belonging to the class.
• For a data set with n observations:
Relative frequency of a class = Frequency of the class / n
Percent frequency distribution
• The percent frequency of a class is the relative frequency multiplied by 100.
• A percent frequency distribution is a tabular summary of a set of data showing
the percent frequency for each class.
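The two definitions above can be checked with a short Python sketch. It is only an illustration: the frequencies are the Victory hotel counts from the frequency table, and the variable names are chosen here for the example.

```python
# Frequencies from the Victory hotel rating table (20 guests in total).
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequencies.values())  # total number of observations

# Relative frequency = class frequency / n; percent frequency = relative frequency * 100.
relative = {rating: f / n for rating, f in frequencies.items()}
percent = {rating: 100 * f / n for rating, f in frequencies.items()}

for rating in frequencies:
    print(f"{rating:<14} {relative[rating]:.2f} {percent[rating]:5.1f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.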
Relative frequency and Percent frequency table
Rating           Relative Frequency   Percent Frequency
Poor             0.10                 10
Below Average    0.15                 15
Average          0.25                 25
Above Average    0.45                 45
Excellent        0.05                 5
Total            1.00                 100
E.g. for Poor: 0.10 x 100 = 10; for Excellent: 1/20 = 0.05
Bar graph
A bar graph (or bar chart) is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for
each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the
other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label, we extend the height
appropriately.
The bars are separated to emphasize the fact that each class is a separate
category.
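As a rough illustration of the idea that each class gets its own bar whose length matches its frequency, the sketch below prints a text-only bar chart of the Victory hotel ratings. It draws the bars horizontally rather than vertically, and the function name `text_bar_chart` is made up for this example.

```python
# Frequencies from the Victory hotel rating table.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}


def text_bar_chart(freqs, symbol="#"):
    """Return one text line per class; the bar length equals the class frequency."""
    label_width = max(len(label) for label in freqs)
    return [
        f"{label:<{label_width}} | {symbol * count} ({count})"
        for label, count in freqs.items()
    ]


chart_lines = text_bar_chart(frequencies)
print("\n".join(chart_lines))
```

Each class is printed on its own line, which mirrors the separated bars of a bar graph for categorical data.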
Bar graph
[Figure: bar chart "Victory Quality Ratings"; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency (1 to 10)]
Pie chart
The pie chart is a commonly used graphical device for presenting relative
frequency distributions for qualitative data.
Draw a circle; then use the relative frequencies to subdivide the circle into sectors
that correspond to the relative frequency for each class.
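Subdividing the circle amounts to giving each class a share of the 360 degrees equal to its relative frequency. A minimal Python sketch for the Victory hotel ratings follows; the variable names are illustrative only.

```python
# Frequencies from the Victory hotel rating table.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequencies.values())

# Sector angle = relative frequency * 360 degrees.
sector_angles = {label: 360 * f / n for label, f in frequencies.items()}

for label, angle in sector_angles.items():
    share = 100 * frequencies[label] / n
    print(f"{label:<14} {share:5.1f}%  {angle:6.1f} degrees")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 1.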
Pie chart
[Figure: pie chart "Victory Quality Ratings"; sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Tables and graphs for numerical data
Numerical data
• Tables: simple frequency distribution, grouped frequency distribution
• Graphs: dot plot, histogram, frequency polygon, ogive
Simple frequency distribution
Application: discrete variables with few values.
• E.g. You are given the raw midterm marks of 20 students:
7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9
Simple frequency distribution
Sort raw data in ascending order
Define classes and count the number of students in each class
Midterm mark   Number of students (Frequency)
4              3
5              3
6              2
7              4
8              3
9              2
10             3
Total          20
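The sort-and-count procedure for the midterm marks can be sketched in Python with the standard library's `collections.Counter`. This is one possible way to do the counting, not the only one.

```python
from collections import Counter

# Raw midterm marks of the 20 students.
marks = [7, 7, 10, 8, 5, 4, 5, 6, 4, 9, 8, 7, 6, 4, 8, 5, 7, 10, 10, 9]

# Each distinct mark is its own class; count how often it occurs.
counts = Counter(marks)

# Sort the classes in ascending order of the mark.
table = sorted(counts.items())

for mark, frequency in table:
    print(f"{mark:>4} {frequency:>3}")
print(f"Total {sum(counts.values())}")
```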
Grouped frequency distribution
Application
• Discrete variable with many values
• Continuous variable
Terminology:
• Lower value (lower class limit): the lowest value of one class
• Upper value (upper class limit): the highest value of one class
• Class interval (class width): range from lower to upper value
• Open‐ended class: the first or last class in the range may be open‐ended, meaning
it has no lower or no upper limit
→ An open‐ended class is designed for uncommon values that are very low or very high
Grouped frequency distribution
Steps to construct a grouped frequency table
• Sort raw data in ascending order
• Identify the maximum and minimum value and determine the range of the data
Range = maximum value – minimum value
• Determine the number of nonoverlapping classes
• Determine the width (size) of classes or class interval
• Determine the first lower class limit
• Construct the grouped frequency table
Grouped frequency distribution
Guidelines for Selecting Number of classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of
classes.
• Smaller data sets usually require fewer classes.
• Use round numbers for the lower and/or upper class limits
Grouped frequency distribution: equal class
interval
E.g. The manager of a KFC restaurant would like to get a better picture of the
distribution of meal purchase costs. A sample of 50 customer invoices has
been taken and the costs are listed below (in thousand dong).
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Grouped frequency distribution: equal class
interval
Selecting width of classes
Use classes of equal width
Approximate class width = (Largest data value – Smallest data value) / Number of classes
Grouped frequency distribution: equal class
interval
For KFC, if we choose six classes
Approximate Class Width = (109 ‐ 52)/6 = 9.5 ≈ 10
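The class-width calculation for the KFC sample can be reproduced with the sketch below. Two choices are assumptions made here for illustration: the approximate width is rounded up to a whole number, and the first lower class limit is taken as the largest multiple of the class width that does not exceed the minimum.

```python
import math

# Meal costs (thousand dong) from the 50 KFC invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]
number_of_classes = 6

data_range = max(costs) - min(costs)            # 109 - 52
approx_width = data_range / number_of_classes   # approximate class width

# Assumption: round up to a convenient whole number.
class_width = math.ceil(approx_width)

# Assumption: start the first class at a round number at or below the minimum.
first_lower_limit = (min(costs) // class_width) * class_width

print(data_range, approx_width, class_width, first_lower_limit)
```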
Relative, percent and cumulative frequency
distribution
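Relative frequency of a class = Frequency of the class / n
Cumulative frequency of a class = sum of the frequencies of that class and all preceding classes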
Relative, percent and cumulative frequency
distribution
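Cost (000d)   Frequency   Relative    Percent     Cumulative   Cumulative   Cumulative
                          frequency   frequency   frequency    relative     percent
                                                               frequency    frequency
50-<60        2           0.04        4           2            0.04         4
60-<70        13          0.26        26          15           0.30         30
70-<80        16          0.32        32          31           0.62         62
80-<90        7           0.14        14          38           0.76         76
90-<100       7           0.14        14          45           0.90         90
100-<110      5           0.10        10          50           1.00         100
Total         50          1.00        100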
Relative, percent and cumulative frequency
distribution
• Insights gained from the above frequency distribution:
• Only 4% of the meal costs are in the 50‐<60 thousand dong class
• 30% of the meal costs are under 70 thousand dong
• The largest share (32%) of the meal costs is in the 70‐<80 thousand
dong class
• 10% of the meal costs are 100 thousand dong or more
• 45 invoices (90%) are less than 100 thousand dong
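These figures can be recomputed from the raw KFC invoices with a short Python sketch. It assumes six classes of width 10 starting at 50, with each class containing values from its lower limit up to, but not including, its upper limit (the "50-<60" convention). The tuple layout of `rows` is chosen here for illustration.

```python
# Meal costs (thousand dong) from the 50 KFC invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]
class_width, first_lower_limit, number_of_classes = 10, 50, 6
n = len(costs)

rows = []        # (label, frequency, relative, cumulative, cumulative relative)
cumulative = 0
for k in range(number_of_classes):
    lower = first_lower_limit + k * class_width
    upper = lower + class_width
    # A value belongs to the class if lower <= value < upper.
    frequency = sum(1 for c in costs if lower <= c < upper)
    cumulative += frequency
    rows.append((f"{lower}-<{upper}", frequency, frequency / n,
                 cumulative, cumulative / n))

for label, freq, rel, cum, cum_rel in rows:
    print(f"{label:<9} {freq:>3} {rel:>5.2f} {100 * rel:>4.0f} "
          f"{cum:>3} {cum_rel:>5.2f} {100 * cum_rel:>4.0f}")
```

Reading off the cumulative columns gives the "under 70" and "less than 100" statements directly.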
Grouped frequency distribution: unequal class
intervals
• E.g. The weekly wages of employees of Salt Lake Ltd are presented
in the frequency table below
Wages per employee ($)   Number of employees
40 - ≤ 60                4
> 60 - ≤ 80              6
> 80 - ≤ 90              6
> 90 - ≤ 120             6
> 120 - ≤ 150            3
Total                    25
Why unequal class intervals: each class reflects differences in the nature of the data
Grouped frequency distribution: open‐ended
class
• Application: continuous variables.
• E.g. Draw a frequency table of the wages (in USD) paid to 30 people at
Alvin Ltd, listed below:
202 277 654 145 361
457 67 44 240 144
310 391 362 437 429
176 325 221 374 216
480 120 274 398 282
153 470 303 338 209
Grouped frequency distribution: open‐ended
class
Wages ($) (class interval)   # of people (Frequency)
<100                         2    (open-ended class, no lower value)
100 - <200                   5
200 - <300                   8
300 - <400                   9
400 - <500                   5
≥500                         1    (open-ended class, no upper value)
Total                        30
Grouped frequency distribution: open‐ended
class
• How to calculate class intervals of open‐ended classes: by convention, the
width of an open‐ended class is the same as that of the adjoining class.
• In the Alvin Ltd wage example, the first and last classes are therefore each
treated as $100 wide.
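Counting with open-ended classes can be sketched as follows. Using `None` to mark a missing class limit and the helper name `in_class` are illustrative choices made here; the wage data are the 30 Alvin Ltd values.

```python
# Wages (USD) paid to 30 people at Alvin Ltd.
wages = [
    202, 277, 654, 145, 361,
    457, 67, 44, 240, 144,
    310, 391, 362, 437, 429,
    176, 325, 221, 374, 216,
    480, 120, 274, 398, 282,
    153, 470, 303, 338, 209,
]

# (label, lower limit, upper limit); None marks the open end of a class.
classes = [
    ("<100", None, 100),
    ("100 - <200", 100, 200),
    ("200 - <300", 200, 300),
    ("300 - <400", 300, 400),
    ("400 - <500", 400, 500),
    (">=500", 500, None),
]


def in_class(value, lower, upper):
    """True if lower <= value < upper, ignoring any limit that is None."""
    above_lower = lower is None or value >= lower
    below_upper = upper is None or value < upper
    return above_lower and below_upper


table = [(label, sum(in_class(w, lo, hi) for w in wages))
         for label, lo, hi in classes]

for label, frequency in table:
    print(f"{label:<11} {frequency:>3}")
```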
Frequency distribution: summary
• Simple frequency distribution: an easy task that can be done manually or
with statistical software
• Grouped frequency distribution: more difficult. The hardest task is to
decide the number of classes and the class width or class intervals. Ideally, each
class reflects differences in the nature of the data.
• The upper limit of one class should not coincide with the lower limit of
the next, so that each value falls in exactly one class.
Frequency distribution: Activity
• The average wages ($) of 20 people have been recorded as follows:
Dot plot
• One of the simplest graphical summaries of data is a dot plot.
• A horizontal axis shows the range of data values.
• Then each data value is represented by a dot placed above the axis.
[Figure: dot plot "KFC restaurant"; horizontal axis: Cost (000d), 50 to 110]
Histogram
Another common graphical presentation of quantitative data is a histogram
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the
interval’s frequency, relative frequency, or percent frequency.
A histogram looks like a bar chart, except that the bars are joined together
Two types of histograms:
• Equal‐width histogram
• Unequal‐width histogram
Equal‐width histogram
[Figure: equal‐width histogram "KFC restaurant"; horizontal axis: Cost (000d), classes from 50 to 110 in steps of 10; vertical axis: Frequency (2 to 18)]
All bars have the same width (same class intervals)
Unequal‐width histogram
• The width of each bar must be proportional to the corresponding
class interval
• Frequency = area of the rectangle, or
• Height of bar = frequency/class width
• Note: it is not necessary for the horizontal and vertical axes to have
the same unit of length
Unequal‐width histogram
• From the Salt Lake Ltd example:
[Figure: unequal‐width histogram of the weekly wages; vertical axis: Frequency density (0.1 to 0.6)]
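For the Salt Lake Ltd wage classes, the bar heights follow from height = frequency / class width. The sketch below is an illustration; the tuple layout and names are chosen for this example.

```python
# (lower limit, upper limit, frequency) for the Salt Lake Ltd weekly wages.
wage_classes = [
    (40, 60, 4),
    (60, 80, 6),
    (80, 90, 6),
    (90, 120, 6),
    (120, 150, 3),
]

bars = []  # (lower, upper, class width, frequency density)
for lower, upper, frequency in wage_classes:
    width = upper - lower
    density = frequency / width  # height of the bar
    bars.append((lower, upper, width, density))

# The area of each bar (width * height) recovers the class frequency,
# so the total area equals the number of employees.
total_area = sum(width * density for _, _, width, density in bars)

for lower, upper, width, density in bars:
    print(f"{lower:>3}-{upper:<3} width={width:<2} density={density:.2f}")
print(f"Total area = {total_area:.0f}")
```

Although three classes share the same frequency of 6, their bars differ in height because their widths differ.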
Shape of Histogram
Symmetric
Left tail is the mirror image of the right tail
Examples: heights and weights of people (normal distribution)
[Figure: symmetric histogram; vertical axis: Relative Frequency (0 to .35)]
Shape of Histogram
Moderately Skewed Left
A longer tail to the left
Example: exam scores
[Figure: histogram moderately skewed to the left; vertical axis: Relative Frequency (0 to .35)]
Shape of Histogram
Moderately Skewed Right
A longer tail to the right
Example: housing values
[Figure: histogram moderately skewed to the right; vertical axis: Relative Frequency (0 to .35)]
Shape of Histogram
Highly Skewed Right
A very long tail to the right
Example: executive salaries
[Figure: histogram highly skewed to the right; vertical axis: Relative Frequency (0 to .35)]
Important uses of a Histogram
• Visually displays the shape of the distribution of the data
• Shows the location of the center of the data
• Shows the spread of the data
• Identifies outliers
Frequency polygon
• The frequency polygon is a graph that displays the data by using lines
that connect points plotted for the frequencies at the midpoints of the
classes. The frequencies are represented by the heights of the points.
Frequency polygon
[Figure: frequency polygon "KFC restaurant"; horizontal axis: Cost (000d), 0 to 120; vertical axis: Frequency (0 to 18); plotted points (midpoint, frequency): (45, 0), (55, 2), (65, 13), (75, 16), (85, 7), (95, 7), (105, 5)]
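The points of the KFC frequency polygon can be derived from the class limits and frequencies, as in the sketch below. Closing the polygon with a zero-frequency point one class width beyond each end is a common convention and is assumed here; the source figure shows it explicitly only on the left, at (45, 0).

```python
# Class limits and frequencies of the KFC grouped frequency distribution.
class_limits = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100), (100, 110)]
frequencies = [2, 13, 16, 7, 7, 5]
class_width = 10

# Each point sits at the class midpoint, at a height equal to the frequency.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# Assumption: anchor the polygon on the axis with zero-frequency end points.
points = (
    [(midpoints[0] - class_width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + class_width, 0)]
)

for x, y in points:
    print(f"({x:.0f}, {y})")
```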
Ogive
An ogive is a graph of a cumulative distribution
How to draw an ogive
The data values are shown on the horizontal axis
Shown on the vertical axis are the
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The cumulative frequency (one of the above) of each class is plotted as a point against the upper
class limit of the interval
The plotted points are connected by straight lines
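The ogive points for the KFC meal costs can be sketched as follows, using cumulative percent frequencies. Starting the curve at the lower limit of the first class with a cumulative value of 0 is a convention assumed here.

```python
from itertools import accumulate

# Upper class limits and frequencies of the KFC grouped frequency distribution.
upper_limits = [60, 70, 80, 90, 100, 110]
frequencies = [2, 13, 16, 7, 7, 5]

cumulative = list(accumulate(frequencies))  # running totals of the frequencies
n = cumulative[-1]

# Assumption: begin at the lower limit of the first class with 0 percent.
points = [(50, 0.0)] + [
    (upper, 100 * c / n) for upper, c in zip(upper_limits, cumulative)
]

for x, y in points:
    print(f"({x}, {y:.0f})")
```

For example, the point (90, 76) says that 76% of the meal costs are below 90 thousand dong.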
Ogive
[Figure: ogive of the KFC meal costs; horizontal axis: Cost (000d), 50 to 110; vertical axis: Cumulative Percent Frequency (20 to 100); highlighted point (90, 76)]
Presenting relationship between two variables
The relationship between two variables can be displayed by:
Tables:
Cross‐tabulation
Charts:
Time‐series plot
Cross‐tabulation: the contingency table
A contingency (cross‐classification) table presents the joint results of two
variables: both categorical, both numerical, or one categorical and one numerical.
The joint responses are classified so that the categories of one variable are
located in the rows and the categories of the other variable are located in the
columns.
The cell is the intersection of a row and a column, and the value in the cell
represents the data corresponding to that specific pairing of row and column
categories.
The cell for each row‐column combination contains the frequency, the percentage
of the overall total, the percentage of the row total, or the percentage of the
column total.
Cross‐tabulation: the contingency table
E.g. Crosstabulation of quality rating and meal price for 300 restaurants in Hanoi.
Quality Rating   Meal Price (thousand dong)            Total
                 50‐60    60‐70    70‐80    80‐90
Good             42       40       2        0          84
Very Good        34       64       46       6          150
Excellent        2        14       28       22         66
Total            78       118      76       28         300
A review of the crosstabulation in this table reveals that restaurants with higher
meal prices received higher quality ratings than restaurants with lower meal prices.
Cross‐tabulation: the contingency table
E.g. Summarize in percentages of the overall total.
Quality Rating   Meal Price (thousand dong)            Total
                 50‐60    60‐70    70‐80    80‐90
Good             14.00    13.33    0.67     0.00       28.00
Very Good        11.33    21.33    15.33    2.00       50.00
Excellent        0.67     4.67     9.33     7.33       22.00
Total            26.00    39.33    25.33    9.33       100.00
Cross‐tabulation: the contingency table
E.g. Summarize in percentages of the row total.
Quality Rating   Meal Price (thousand dong)            Total
                 50‐60    60‐70    70‐80    80‐90
Good             50.00    47.62    2.38     0.00       100.00
Very Good        22.67    42.67    30.67    4.00       100.00
Excellent        3.03     21.21    42.42    33.33      100.00
Total            26.00    39.33    25.33    9.33       100.00
Cross‐tabulation: the contingency table
E.g. Summarize in percentages of the column total.
Quality Rating   Meal Price (thousand dong)            Total
                 50‐60    60‐70    70‐80    80‐90
Good             53.85    33.90    2.63     0.00       28.00
Very Good        43.59    54.24    60.53    21.43      50.00
Excellent        2.56     11.86    36.84    78.57      22.00
Total            100.00   100.00   100.00   100.00     100.00
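The three percentage summaries of the restaurant crosstabulation differ only in the denominator: the grand total, the row total, or the column total. A minimal Python sketch, with data structures chosen here for illustration, is:

```python
# Cell counts: quality rating (rows) by meal price class (columns).
price_classes = ["50-60", "60-70", "70-80", "80-90"]
counts = {
    "Good": [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}

grand_total = sum(sum(row) for row in counts.values())
row_totals = {rating: sum(row) for rating, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

# Percentage of the overall total: divide each cell by the grand total.
overall_pct = {r: [100 * c / grand_total for c in row]
               for r, row in counts.items()}
# Percentage of the row total: divide each cell by its row total.
row_pct = {r: [100 * c / row_totals[r] for c in row]
           for r, row in counts.items()}
# Percentage of the column total: divide each cell by its column total.
col_pct = {r: [100 * c / t for c, t in zip(row, col_totals)]
           for r, row in counts.items()}

for rating in counts:
    print(rating, [round(p, 2) for p in row_pct[rating]])
```

Row percentages answer "how are prices distributed within a rating?", while column percentages answer "how are ratings distributed within a price class?".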
Cross‐tabulation: the contingency table
• E.g. Crosstabulation of Investment in Thousands of Dollars
Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32           44           19           95
CD                    15.5         20           13.5         49
Savings               16           28           7            51
Total                 110          147          67           324
Summarize data in the percentage of the overall total, the percentage of the
row total, or the percentage of the column total.
Side‐by‐side bar charts
• A useful way to visually display the results of cross‐classification data is by
constructing a side‐by‐side bar chart.
• A side‐by‐side bar chart groups several bars together for each category, and the
group of bars is repeated across categories.
• Side‐by‐side bar charts are used to compare multiple measures with each other,
so we can observe how those measures change over time or across different
categories.
Side‐by‐side bar charts
[Figure: side‐by‐side bar chart "Quality Rating"; horizontal axis: Meal Price (thousand dong), classes 50‐60, 60‐70, 70‐80, 80‐90; vertical axis: 0 to 70; series: Good, Very Good, Excellent]
Side‐by‐side bar charts
[Figure: horizontal side‐by‐side bar chart "Comparing Investor"; categories: Stocks, Bonds, CD, Savings; horizontal axis: 0 to 60; series: Investor A, Investor B, Investor C]
Scatterplots (Scatter Diagrams)
• Scatterplots may be the most common and most effective display for data.
• Scatterplots are the best way to examine possible relationship between two
quantitative variables (bivariate data).
• X axis – the explanatory (or predictor) variable.
• Y axis – the response variable.
Scatterplots (Scatter Diagrams)
Volume per day   Cost per day
23               120
26               144
29               146
33               160
38               167
42               180
50               190
55               195
60               196
[Figure: scatterplot "Relationship between Volume per day and Cost per day"; X axis: Volume per day (0 to 70); Y axis: Cost per day (0 to 250)]
Time‐series plot
• A time‐series plot is used to study patterns in the values of a numerical variable
over time.
• Each value is plotted as a point in two dimensions with the time period on the
horizontal X axis and the variable of interest on the Y axis.
Time‐series plot
• E.g. Attendance (in millions) at Hanoi zoo parks from 2014‐2019
Year   Attendance
2014   3.17
2015   3.19
2016   3.24
2017   3.22
2018   3.28
2019   3.35
Time‐series plot
• E.g. Attendance (in millions) at Hanoi zoo parks from 2014‐2019
[Figure: time‐series plot; horizontal axis: Year; vertical axis: Attendance (3.05 to 3.4); plotted values in order: 3.17, 3.19, 3.24, 3.22, 3.28, 3.35]
Constructing tables: some guidelines
Principles of Excellent Graphs
The graph should not distort the data.
The graph should not contain unnecessary adornments (sometimes
referred to as chart junk).
The scale on the vertical axis should begin at zero.
All axes should be properly labeled.
The graph should contain a title.
The simplest possible graph should be used for a given set of data.
Using SPSS
After importing your dataset and naming the variables, click on:
Analyze/ Descriptive Statistics/ Frequencies
• Choose any variables to be analyzed and place them in the box on the right
• Options include (For Categorical Variables):
• Frequency Tables
• Pie Charts, Bar Charts
• Options include (For Numeric Variables)
• Frequency Tables (Useful for discrete data)
• Pie Charts, Bar Charts, Histograms
Using SPSS
• Graphs/Legacy Dialogs/…
Using SPSS
• Graphs/Legacy Dialogs/Bar
Using SPSS
• Graphs/Legacy Dialogs/Bar/Simple bar
Put the variable of interest as the CATEGORY AXIS
Bars Represent:
‐ categorical variables: choose N of Cases or % of
Cases
‐ Numerical variables: choose one of the options
Put the 2nd categorical variable as Panel by Rows or
Columns if you want to draw parallel graphs for the
2nd categorical groups
Using SPSS
• Clustered (Side‐by‐side) bar graph
• Stacked bar graph
Using SPSS
• Graphs/Legacy Dialogs/Pie
Slices Represent: choose N of Cases or %
of Cases
Using SPSS
• Histogram for numerical variables, two ways:
Graphs/Legacy Dialogs/Histogram
Analyze/ Descriptive Statistics/ Frequencies/Charts/Histogram
Using SPSS
• Scatterplot for 2 numerical variables
Graphs/ Legacy Dialogs/ Scatter/ Simple
‐ For Y‐AXIS, choose the Dependent (Response)
Variable
‐ For X‐AXIS, choose the Independent (Explanatory)
Variable
Using SPSS
• Contingency Tables
Analyze/ Descriptive Statistics/ Crosstabs
‐ For ROWS, select the variable you are conditioning
on (Independent Variable)
‐ For COLUMNS, select the variable you are finding
the conditional probability of (Dependent Variable)
‐ Click on CELLS, choose Percentages, if necessary
Using SPSS
[SPSS output: "Skin colour * Kinds of job Crosstabulation"; columns (Kinds of job): Clerical, Office Trainee, Security Officer, College Trainee, Exempt Employee, MBA Trainee, Technical, Total]
Summary
In this chapter, we have
Organized categorical data using the summary table, bar chart and pie chart.
Organized numerical data using the frequency distribution, histogram,
polygon, and ogive.
Examined cross tabulated data using the contingency table and side‐by‐side
bar chart.
Developed scatter plots and time series graphs.