
Session 2: Describing data visually

Statistics for Business


Dr. Le Anh Tuan

2
Graphical Presentation of Data
►Data in raw form are usually not easy to use for decision-making.

►Methods of organizing, exploring, and summarizing data include:
►Visual method (charts, graphs, tables) provides insight into the characteristics of a data set without using mathematics.
►Numerical method (statistics) provides insight into the characteristics of a data set using mathematics.

►The type of graph to use depends on the variable being summarized.

3
Graphical Presentation of Data

►Categorical Variables
►Frequency distribution
►Bar chart
►Pie chart
►Pareto diagram

►Numerical variables
►Bar chart
►Line chart
►Frequency distribution
►Histogram and ogive
►Stem-and-leaf display
►Scatter plot
4
Graphical Presentation of Data

Categorical Data
Tabulating Data: Frequency Distribution Table
Graphing Data: Bar Chart, Pie Chart, Pareto Diagram


5
Bar and Pie Charts

► Bar charts and Pie charts are often used for qualitative
(category) data.

► The height of the bar or the size of the pie slice shows the
frequency or percentage for each category.

► A simple bar chart can display the same data as a pie chart and would be
preferred by many statisticians.

6
Graphical Presentation of Data
Summarize data by category

Example: Students by Majors


Major Number of students

Finance 120

International Business 200

Marketing 150

Management 50

Accounting 75

(Variables are categorical)


Bar Charts
[Figure: bar chart of number of students by major. Finance 120, International Business 200, Marketing 150, Management 50, Accounting 75.]

8
Bar Charts

9
Bar Charts

► Clustered bar charts group several values side by side within the same category in a vertical direction.

► Stacked bar charts group several values in a single column within the same category in a vertical direction.

10
Pie Charts
► Pie charts are another excellent tool for comparing proportions for categorical data.

► Each segment of the pie represents the relative frequency of one category.

► Pie charts should be used to portray data that sum to a total (e.g., percent market shares).
► All categories in the data set must be included in the pie.

► Each slice can be labeled with data values or percentages.

11
Pie Charts
Major                     # of students   Percentage
Finance                   120             20
International Business    200             34
Marketing                 150             25
Management                50              8
Accounting                75              13

[Figure: pie chart of number of students by major. Finance 20%, International Business 34%, Marketing 25%, Management 8%, Accounting 13%.]
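The percentages in the pie chart are relative frequencies: each count divided by the total number of students. A minimal Python sketch of that arithmetic, using the counts from the table (the variable names are illustrative and not part of the slides):

```python
# Student counts by major, taken from the table above.
students = {
    "Finance": 120,
    "International Business": 200,
    "Marketing": 150,
    "Management": 50,
    "Accounting": 75,
}

total = sum(students.values())  # 595 students in all

# Share of the pie for each major, rounded to a whole percent.
percentages = {
    major: round(100 * count / total) for major, count in students.items()
}

for major, pct in percentages.items():
    print(f"{major:<24}{students[major]:>5}{pct:>5}%")
```

Rounded shares do not always sum to exactly 100; here they happen to.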

12
Pareto Diagram

13
Pareto Diagram Example
► A Pareto Chart is a combination of a bar graph and a line
graph.
► This chart helps organizations and individuals identify
and prioritize the most significant factors contributing to
a particular issue or problem.
► The most problematic categories are shown first.
► For example, you collect customer complaints.

Customer Complaints Frequency

Product 9

Service 7
Store 5

Price 3
Location 2

14
Pareto Diagram Example

► Step 1: Sort the categories (defect causes) by frequency, in descending order

► Step 2: Determine % and cumulative % in each category

Customer Complaints   Frequency   Percentage   Cumulative Percentage
Product               9           35           35
Service               7           27           62
Store                 5           19           81
Price                 3           11           92
Location              2           8            100
Total                 26          100
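Steps 1 and 2 can be sketched in a few lines of Python. This is an illustration under the assumption that the complaint counts are held in a plain dictionary; the names are hypothetical. The unrounded share for Price is about 11.5%, and the table shows it as 11 so that the Percentage column sums to 100.

```python
complaints = {"Product": 9, "Service": 7, "Store": 5, "Price": 3, "Location": 2}

# Step 1: sort categories by frequency, largest first.
ordered = sorted(complaints.items(), key=lambda item: item[1], reverse=True)

# Step 2: percentage and running (cumulative) percentage per category.
total = sum(complaints.values())
running = 0
pareto_rows = []
for category, freq in ordered:
    running += freq
    pareto_rows.append(
        (category, freq, 100 * freq / total, 100 * running / total)
    )

for category, freq, pct, cum_pct in pareto_rows:
    print(f"{category:<10}{freq:>3}{pct:>7.1f}{cum_pct:>7.1f}")
```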

15
Pareto Diagram Example
► Step 3: Show results graphically
[Figure: Pareto chart of customer complaints. Categories on the X-axis: Product, Service, Store, Price, Location. Left axis: Frequency (0 to 10); right axis: Percentage (0 to 120).]

► Pareto charts make it simple to discover the most significant factors that contribute to a problem. The categories on the chart's left side are the most essential and require the greatest attention.
16
Graphical Presentation of Data

Numerical Data
Frequency Distributions and Cumulative Distributions: Histogram, Ogive
Stem-and-Leaf Display

17
Frequency Distribution

►A frequency distribution is a table formed by classifying n data values into k classes (bins).

►Frequencies are the number of observations within each class (bin).

18
Relative Frequency Distribution

►Relative frequency distributions display the proportion of observations in each class relative to the total number of observations.

►Show the fraction of observations in each class.
►Found by dividing each frequency by the total number of observations.
►The fractions in a relative frequency distribution add up to 1.00.

19
Number of Classes
►Use at least 5 but no more than 15-20 classes.
►Methods to determine the number of classes in a frequency distribution:
►The rule: $2^k \ge n$,
where k = number of classes and n = number of data points (observations).
►Find the lowest value of k that satisfies the rule.
►For example, with n = 50: $2^5 = 32 < 50$ and $2^6 = 64 > 50$, so k = 6 is a good choice.
►Another rule:
Sturges' Rule: $k = 1 + 3.3 \log n$, where n is the sample size.
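Both rules are easy to check numerically. The sketch below is illustrative only: the function names are made up, and the logarithm in Sturges' Rule is assumed to be base 10, which is the usual reading when the coefficient is 3.3.

```python
import math


def classes_by_power_rule(n: int) -> int:
    """Smallest k >= 1 such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def classes_by_sturges(n: int) -> float:
    """Sturges' Rule, k = 1 + 3.3 * log10(n), left unrounded."""
    return 1 + 3.3 * math.log10(n)


print(classes_by_power_rule(50))         # 2**5 = 32 < 50 <= 64 = 2**6
print(round(classes_by_sturges(50), 2))  # a guide value; round to a whole number of classes
```

For n = 50 the two rules give similar guidance: 6 classes from the power rule and between 6 and 7 from Sturges' Rule.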

20
Frequency Distribution

►Once desired classes (k) are known, the width of each class can be found.
►The width is the range of numbers to put into each class.
►Determine the width of each class by
$w = \text{interval width} = \dfrac{\text{largest number} - \text{smallest number}}{\text{number of desired intervals}}$

►Classes never overlap
►Round up the interval width to get desirable class endpoints

21
Frequency Distribution

►There is no one correct answer for the class width.

►The goal is to create a histogram that clearly and usefully shows the pattern in the data.

►Often there is more than one acceptable way to accomplish this.

22
Class Boundaries

►Class boundaries represent the minimum and maximum values for each class.

►Choose class boundaries that are easy to read

Easy to read                   Harder to read
3 to less than 6 minutes       3.21 to less than 6.21 minutes
6 to less than 10 minutes      6.21 to less than 10.21 minutes
10 to less than 15 minutes     10.21 to less than 15.21 minutes

23
Frequency Distribution

►Example: A manufacturer randomly selects 20 winter days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27

24
Frequency Distribution

►Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

►Find range: 58 - 12 = 46

►Select number of classes: 5 (or may use the $2^k \ge n$ rule)

►Compute interval width: 10 (46/5 = 9.2, then round up)

►Determine interval boundaries: 10 but less than 20, 20 but less than 30, . . . , 50 but less than 60

►Count observations & assign to classes
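The first steps above (range, interval width, boundaries) can be sketched in Python. Rounding the width up to the next multiple of 10 and starting the first class at a multiple of the width are assumptions made here to get easy-to-read endpoints; other choices are equally acceptable.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

k = 5                                   # number of classes chosen in the example
data_range = max(temps) - min(temps)    # 58 - 12 = 46
raw_width = data_range / k              # 9.2
width = math.ceil(raw_width / 10) * 10  # assumed: round up to the next multiple of 10
start = (min(temps) // width) * width   # assumed: first class starts at a multiple of width

# Each class is "low but less than high".
boundaries = [(start + i * width, start + (i + 1) * width) for i in range(k)]
for low, high in boundaries:
    print(f"{low} but less than {high}")
```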

25
Frequency Distribution

►Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Interval               Frequency   Relative Frequency   Percentage
10 but less than 20    3           0.15                 15
20 but less than 30    6           0.30                 30
30 but less than 40    5           0.25                 25
40 but less than 50    4           0.20                 20
50 but less than 60    2           0.10                 10
Total                  20          1                    100
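Counting observations into classes is mechanical once the boundaries are fixed. A minimal sketch, assuming half-open classes (lower bound included, upper bound excluded) as the "but less than" wording implies; the variable names are illustrative:

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]

n = len(temps)
table = []
for low, high in boundaries:
    # A value belongs to the class when low <= value < high.
    freq = sum(1 for t in temps if low <= t < high)
    table.append((low, high, freq, freq / n))

for low, high, freq, rel in table:
    print(f"{low} but less than {high}: {freq:>2}  {rel:.2f}  {100 * rel:.0f}%")
```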

26
Histograms
►A histogram is a graphical representation of a frequency
distribution.

►A histogram is a bar chart.

►The Y-axis (vertical) shows frequency within each class.

►The X-axis (horizontal) ticks show the end points of each class.

27
Histograms
Interval               Frequency
10 but less than 20    3
20 but less than 30    6
30 but less than 40    5
40 but less than 50    4
50 but less than 60    2
Total                  20

[Figure: histogram titled “Histogram: Daily High Temperature”. X-axis: Temperature in Degrees (0 to 60); Y-axis: Frequency. The bars for the five classes have heights 3, 6, 5, 4, 2, with no gaps between bars.]

28
The Shapes of Histograms

29
The Consequences of Too Few or Too Many Classes
► Wide classes result in few class intervals.
► Can hide important patterns.
► Gives a “blocky” distribution graph.
► Summarizes the data too much.
► Tells little about the true distribution shape.

[Figure: histogram titled “Weight Distribution” with three wide classes: [8, 51], (51, 94], (94, 137].]

30
The Consequences of Too Few or Too Many Classes
► Too many narrow classes have consequences:
► Result in a “jagged” histogram
► Some classes may be empty
► Does not summarize the data enough

[Figure: histogram of weight with many narrow classes (bin=13, start=8, width=7). X-axis: weight (0 to 100); Y-axis: Frequency (0 to 4).]

31
The Ogive

►The Ogive is a line graph that plots the cumulative relative frequency distribution.

►A cumulative relative frequency distribution totals the proportion of observations that are less than or equal to the class at which you are looking.
►Show the accumulated proportion as values vary from low to high.

32
The Cumulative Frequency Distribution

►Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Interval               Frequency   Relative Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           0.15                 15           3                      15
20 but less than 30    6           0.30                 30           9                      45
30 but less than 40    5           0.25                 25           14                     70
40 but less than 50    4           0.20                 20           18                     90
50 but less than 60    2           0.10                 10           20                     100
Total                  20          1                    100
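The cumulative columns are running totals of the frequency column. A short sketch using the standard library's `itertools.accumulate`; pairing each cumulative percentage with the upper boundary of its class, and starting the ogive at 0% at the lower boundary of the first class, are assumptions about how the points would be plotted.

```python
from itertools import accumulate

frequencies = [3, 6, 5, 4, 2]         # class frequencies from the table
upper_bounds = [20, 30, 40, 50, 60]   # upper boundary of each class

n = sum(frequencies)
cumulative = list(accumulate(frequencies))          # running totals
cumulative_pct = [100 * c / n for c in cumulative]  # as percentages of n

# Points an ogive would connect: (class boundary, cumulative percentage).
ogive_points = [(10, 0.0)] + list(zip(upper_bounds, cumulative_pct))
for boundary, pct in ogive_points:
    print(f"{boundary:>3}  {pct:5.1f}%")
```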

33
The Ogive Graphing Cumulative Frequencies

[Figure: line graph titled “Ogive: Daily High Temperature”. X-axis ticks: 10 to 60; Y-axis: Cumulative Percentage (0 to 100).]

34
Stem-and-Leaf Diagram

35
Stem-and-Leaf Diagram

► A simple way to see distribution details in a data set.

► Method: Separate the sorted data series into leading digits (the stem) on the left and the remaining digits on the right (the leaves)

► All the original data points are visible on the display.

► Easy to construct by hand for small datasets.

36
Stem-and-Leaf Diagram

Data in ordered array:


21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Here, use the 10’s digit for the stem unit.
Completed stem-and-leaf diagram:

Stem Leaves

2 1 4 4 6 7 7
3 0 2 8
4 1

37
Stem-and-Leaf Diagram
► Using the 100’s digit as the stem and the 10’s digit as the leaf
► In this illustration, the leaf digits have been sorted, although this is not necessary.
► Data on scores:
613, 632, 658, 717, 722, 750, 776, 827,
841, 859, 863, 891, 894, 906, 928, 933,
955, 982, 1034, 1047, 1056, 1140, 1169, 1224

Stem   Leaf
6      135
7      1257
8      245699
9      02358
10     345
11     46
12     2
Stem-and-Leaf Diagram
► The Stem-and-Leaf Diagram shows the distribution of data.

► For example, in the dataset above, you can see that most scores fall in the 800s, with a few scores in the 1100s and 1200s.

Frequency   Stem   Leaf
3           6      135
4           7      1257
6           8      245699
5           9      02358
3           10     345
2           11     46
1           12     2
Total: 24
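The stem-and-leaf split for the scores data can be reproduced with integer arithmetic. This is a minimal sketch, assuming the hundreds (and thousands) form the stem, the tens digit is the leaf, and the units digit is dropped, as in the display above; the variable names are illustrative.

```python
from collections import defaultdict

scores = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
          894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]

stems = defaultdict(list)
for score in sorted(scores):
    stem = score // 100         # hundreds and above form the stem
    leaf = (score // 10) % 10   # tens digit is the leaf; units digit is dropped
    stems[stem].append(leaf)

# Print frequency, stem, and leaves for each stem in ascending order.
for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{len(stems[stem]):>2}  {stem:>2} | {leaves}")
```

Because the units digit is dropped, the original scores cannot be fully recovered from this particular display.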
Dot Plots

40
Dot Plots
►A dot plot is the simplest graphical display of n individual
values of numerical data.
►Easy to understand.
►It reveals dispersion, central tendency, and the
shape of the distribution.

►If more than one data value lies at about the same axis
location, the dots are stacked vertically.

41
Dot Plots

►The range is from 0 to 7.

►The highest frequency is at 3.

42
Graphs for Time-Series Data

43
Graphs for Time-Series Data
► A line chart (time-series plot) is used to show the values of a variable
over time

► Time is measured on the horizontal axis (X)

► The variable of interest is measured on the vertical axis (Y)

44
Graphs for Time-Series Data

45
Relationships Between Variables

► Graphs illustrated so far have involved only a single variable

► When two variables are involved, other techniques are used:

► Categorical (Qualitative) Variables ➔ Cross tables

► Numerical (Quantitative) Variables ➔ Scatter plots

46
Cross Tables

► Cross Tables (or contingency tables) list the number of observations for every combination of values for two categorical or ordinal variables.

► If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table

► Tools: PivotTables

47
Cross Tables
► 4 x 3 Cross Table for Investment Portfolios by Investor (values in million VND)

               Investor A   Investor B   Investor C   Total
Savings        25           40           5            70
Stock market   31           10           28           69
Bond market    10           20           40           70
Insurance      0            5            20           25
Total          66           75           93           234
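The row and column totals of a cross table are simple sums over one dimension. A small sketch with the table held as a nested dictionary (an illustrative layout, not something prescribed by the slides; a spreadsheet PivotTable computes the same totals):

```python
# Investment by asset type (rows) and investor (columns), in million VND.
portfolio = {
    "Savings":      {"Investor A": 25, "Investor B": 40, "Investor C": 5},
    "Stock market": {"Investor A": 31, "Investor B": 10, "Investor C": 28},
    "Bond market":  {"Investor A": 10, "Investor B": 20, "Investor C": 40},
    "Insurance":    {"Investor A": 0,  "Investor B": 5,  "Investor C": 20},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: sum across investors for each asset type.
row_totals = {asset: sum(row.values()) for asset, row in portfolio.items()}

# Column totals: sum across asset types for each investor.
column_totals = {
    inv: sum(row[inv] for row in portfolio.values()) for inv in investors
}

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```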

48
Cross Tables

[Figure: bar chart titled “Investment Portfolio”. X-axis: Savings, Stock market, Bond market, Insurance; Y-axis: 0 to 45; one series each for Investor A, Investor B, Investor C.]

49
Scatter Plots

50
Scatter Plots
► Scatter plots can convey patterns in data pairs that would not be apparent from a table.

► A scatter plot is a visual representation of data points in a two-dimensional space, where each data point is represented by a dot or marker.

► It shows how data points are distributed graphically and if there is a visual pattern or trend.

51
Scatter Plots
Happiness Index   GDP Per Capita ($US)
9                 40,000
3                 10,230
4                 12,939
3                 9,383
6                 28,300
2                 4,000
10                65,000
3                 9,999
9                 33,200
4                 9,311
5                 15,494
8                 32,030

[Figure: scatter plot titled “Happiness and GDP Per Capita”. X-axis: Happiness Index (0 to 12); Y-axis: GDP per capita (0 to 70000).]

52
Scatter Plots
► The figure shows a scatter plot with Happiness Index on the X-axis and GDP per Capita on the Y-axis.
► In this illustration, there seems to be an association between X and Y.
► That is, nations with higher happiness levels tend to have higher GDP per capita (and vice versa).
► Scatter plots show only the direction of the relationship between two variables, read from the slope of the trendline or pattern.
► No cause-and-effect relationship is implied: a scatter plot can show associations but cannot determine whether one variable causes changes in the other.

[Figure: scatter plot titled “Happiness and GDP Per Capita”. X-axis: Happiness Index (0 to 12); Y-axis: GDP per capita (0 to 70000).]

53
Scatter Plots
► To qualitatively assess the strength of the relationship between variables, we
inspect visually how closely data points cluster around a trendline.
► However, this visual judgment is neither easy nor quantitative.

54
Exercise

► Review Session 2, Online Quiz 2.

► Read Chapter 4: Descriptive statistics.
