
Review of Basic Statistical Concepts
(Data Presentation using Excel)

Statistical Analysis with Software Applications


Learning Objectives
At the end of this module, students should be able to:
1. Present and interpret data using tables and graphs generated by Microsoft Excel;
2. Apply the principles of properly presenting graphs; and
3. Recognize some errors committed in data presentation.
Benefits of Excel’s attractive features:
▪ No extra cost for specialized statistical programs;
▪ Familiarity with Excel;
▪ Easy to use and easy to learn;
▪ Lets users work with the same worksheet-based data that they have created for other business purposes; and
▪ Some graphical functions produce more vivid visual outputs.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.
Univariate and Bivariate data
Univariate data is a set of n observations on one variable, e.g., height of Grade 3 pupils, GWA of second-year students, etc.

Bivariate data is a set of n observations involving two variables, e.g., height and weight of newborn babies, or monthly income and expenses of selected employees.

Organizing Univariate Categorical
Data: Summary Table
▪ A summary table indicates the frequency, amount, or percentage of items in a set of categories so that you can see differences between categories.

How do you spend the holidays?   Frequency   Percentage
At home with family                  25          50
Travel to visit family                7          14
Vacation                              6          12
Catching up on work                   8          16
Other                                 4           8
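The percentage column can be reproduced from the frequency column. The following is a minimal Python sketch, assuming the counts in the table above; the names `holiday_counts` and `to_percentages` are illustrative and not part of the original module.

```python
# Frequencies copied from the summary table above.
holiday_counts = {
    "At home with family": 25,
    "Travel to visit family": 7,
    "Vacation": 6,
    "Catching up on work": 8,
    "Other": 4,
}


def to_percentages(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


holiday_percentages = to_percentages(holiday_counts)

# Print a small summary table: category, frequency, percentage.
for category, pct in holiday_percentages.items():
    print(f"{category:<25}{holiday_counts[category]:>5}{pct:>8.0f}")
```

In Excel, the same column is typically obtained by dividing each frequency cell by the sum of the frequency column.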

Organizing Univariate Categorical Data:
Bar Chart
▪ In a bar chart, each category is shown as a bar whose length represents the amount, frequency, or percentage of values falling into that category.

Organizing Categorical Data:
Pie Chart
▪ The pie chart is a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

Organizing Bivariate Categorical Data
Cross Tabulations: The Contingency Table
▪ A cross-classification (contingency) table presents
the results of two categorical variables. The joint
responses are classified so that the categories of one
variable are located in the rows and the categories of
the other variable are located in the columns.
▪ The cell is the intersection of the row and column and
the value in the cell represents the data corresponding
to that specific pairing of row and column categories.
▪ A useful way to visually display the results of cross-
classification data is by constructing a side-by-side
bar chart.
Bivariate Data: Cross Tabulations
A survey was conducted to determine whether students stay in the Philippines during holidays. The results, classified by gender, are as follows:

Where do you stay during holidays?   Male   Female   Total
In the Philippines                    16      20       36
Other countries                        2      12       14
Total                                 18      32       50
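The row and column totals of a contingency table follow directly from the cell counts. The sketch below, written under the assumption that the four cell counts above are the raw data, recomputes the margins and adds the percentage of each gender in each row (the kind of figure a side-by-side bar chart usually compares). The names `cells` and `margins` are hypothetical.

```python
# Cell counts copied from the contingency table above: (row, column) -> count.
cells = {
    ("In the Philippines", "Male"): 16,
    ("In the Philippines", "Female"): 20,
    ("Other countries", "Male"): 2,
    ("Other countries", "Female"): 12,
}


def margins(cells):
    """Return (row totals, column totals, grand total) for a two-way table."""
    row_totals, col_totals = {}, {}
    for (row, col), count in cells.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    return row_totals, col_totals, sum(cells.values())


row_totals, col_totals, grand_total = margins(cells)

# Percentage of each gender (column) that falls in each row category.
column_percentages = {
    (row, col): 100 * count / col_totals[col]
    for (row, col), count in cells.items()
}

print(row_totals, col_totals, grand_total)
```

Comparing within-gender percentages rather than raw counts matters here because the two groups differ in size (18 males versus 32 females).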

Cross Tabulations:
Side-By-Side Bar Charts

Organizing Numerical Data:
Ordered Array
▪ An ordered array is a sequence of data, in rank order, from the smallest value to the largest value.

Age of Surveyed College Students

Day Students
16 17 17 18 18 18
19 19 20 20 21 22
22 25 27 32 38 42

Night Students
18 18 19 19 20 21
23 28 32 33 41 45
Organizing Numerical Data:
Stem and Leaf Display
▪ A stem-and-leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Age of College Students

Day Students          Night Students
Stem   Leaf           Stem   Leaf
1      67788899       1      8899
2      0012257        2      0138
3      28             3      23
4      2              4      15
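The display above can be rebuilt from the ordered arrays of ages. This is a small Python sketch, assuming two-digit values so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Ages copied from the ordered arrays of day and night students.
day_ages = [16, 17, 17, 18, 18, 18, 19, 19, 20, 20, 21, 22,
            22, 25, 27, 32, 38, 42]
night_ages = [18, 18, 19, 19, 20, 21, 23, 28, 32, 33, 41, 45]


def stem_and_leaf(values):
    """Group values by tens digit (stem); leaves are the sorted units digits."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(groups.items())}


for label, ages in (("Day Students", day_ages), ("Night Students", night_ages)):
    print(label)
    for stem, leaves in stem_and_leaf(ages).items():
        print(f"{stem} | {leaves}")
```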
Organizing Numerical Data:
Frequency Distribution and Histogram
Bins and Bin Limits
▪ A frequency distribution is a table formed by classifying n data values into k classes (bins).
▪ Bin limits define the values to be included in each bin. Widths must all be the same.
▪ Frequencies are the number of observations within each bin.
▪ Frequencies can also be expressed as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
Frequency Distributions and Histograms
Constructing a Frequency Distribution
1. Find the smallest and largest data values (here, 20 and 60).
Quiz Scores in Math 607
55 23 45 50 58
22 34 38 38 42
36 30 30 50 52
54 54 49 54 60
20 27 39 36 47
53 52 57 50 50
37 45 48 55 59
35 55 46 41 42
39 45 42 38 50

2. Choose the number of bins (k).
- k should be much smaller than n.
- Too many bins results in sparsely populated bins; too few, and dissimilar data values are lumped together.
Frequency Distribution and Histogram
Constructing a Frequency Distribution
- Herbert Sturges proposed the following rule:

Sample Size (n)   Suggested Number of Bins (k)
      16                      5
      32                      6
      64                      7
     128                      8
     256                      9
     512                     10
    1024                     11
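The table is consistent with the usual statement of Sturges' rule, k = 1 + log2(n), for sample sizes that are powers of two. The sketch below assumes that formula and rounds up for other sample sizes; the rounding choice and the function name `sturges_bins` are assumptions, not something stated on the slide.

```python
import math


def sturges_bins(n):
    """Suggested number of bins by Sturges' rule, k = 1 + log2(n), rounded up."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return math.ceil(1 + math.log2(n))


# Reproduce the table for sample sizes that are powers of two.
for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, sturges_bins(n))

# For the 45 quiz scores in this module, 1 + log2(45) is about 6.5.
print(45, sturges_bins(45))
```

With rounding up, the 45 quiz scores give k = 7, which agrees with the number of bins used in the worked example.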
Frequency Distributions and Histograms
Constructing a Frequency Distribution
3. Set the bin limits:
$$\text{Bin width} = \frac{x_{\max} - x_{\min}}{k}$$
For example, for k = 7 bins, the approximate bin width is:
$$\text{Bin width} = \frac{40}{7} = 5.7 \approx 6$$
4. Put the data values in the appropriate bin.
5. Create the table. You can include:
- Frequencies: counts for each bin
- Relative frequencies: absolute frequency divided by the total number of data values
- Cumulative frequencies: frequencies (or relative frequencies) accumulated as the bin limit increases
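Steps 1 to 5 can be sketched in Python for the Math 607 quiz scores. This is a minimal illustration, assuming integer-valued data, bins that start at the smallest value, and a bin width rounded up to a whole number; the function name `frequency_distribution` is hypothetical.

```python
import math

# Quiz scores in Math 607, copied from the data listing (n = 45).
quiz_scores = [
    55, 23, 45, 50, 58,
    22, 34, 38, 38, 42,
    36, 30, 30, 50, 52,
    54, 54, 49, 54, 60,
    20, 27, 39, 36, 47,
    53, 52, 57, 50, 50,
    37, 45, 48, 55, 59,
    35, 55, 46, 41, 42,
    39, 45, 42, 38, 50,
]


def frequency_distribution(values, k):
    """Classify integer values into k equal-width bins starting at the minimum."""
    low, high = min(values), max(values)
    width = math.ceil((high - low) / k)  # here 40 / 7 = 5.7, rounded up to 6
    counts = [0] * k
    for value in values:
        # Guard: keep the maximum in the last bin if it lands past the edge.
        counts[min((value - low) // width, k - 1)] += 1

    rows, running = [], 0
    for i, count in enumerate(counts):
        running += count
        lower = low + i * width
        rows.append({
            "bin": (lower, lower + width - 1),
            "frequency": count,
            "relative": count / len(values),
            "cumulative": running,
        })
    return rows


quiz_table = frequency_distribution(quiz_scores, k=7)
for row in quiz_table:
    lower, upper = row["bin"]
    print(f"{lower:>2} - {upper:<3}{row['frequency']:>4}"
          f"{row['relative']:>7.2f}{row['cumulative']:>5}")
```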
Organizing Numerical Data:
The Histogram (Bar chart)
▪ A graph of the data in a frequency distribution is
called a histogram.
▪ The class boundaries (or class midpoints) are
shown on the horizontal axis.
▪ The vertical axis is either frequency, relative
frequency, or percentage.
▪ Bars of the appropriate heights are used to
represent the number of observations within
each class.

Organizing Numerical Data:
The Histogram (Bar Chart)
Quiz Score   Frequency   Relative Frequency   Percentage
20 - 25          3             0.07               6.67
26 - 31          3             0.07               6.67
32 - 37          5             0.11              11.11
38 - 43          9             0.20              20.00
44 - 49          7             0.16              15.56
50 - 55         14             0.31              31.11
56 - 61          4             0.09               8.89
Total           45             1.00             100.00

[Figure: histogram of Quiz Score; vertical axis: Frequency (0 to 15); horizontal axis: Class Boundary (19.5, 25.5, 31.5, 37.5, 43.5, 49.5, 55.5, 61.5)]
Organizing Numerical Data:
The Histogram in Excel

1. Select Tools/Data Analysis

Organizing Numerical Data:
The Histogram in Excel

2. Choose Histogram

3. Input the data range and bin range (the bin range is a cell range containing the upper class boundary for each class grouping; e.g., for a "<20" class use 19.9)

4. Select Chart Output and click “OK”
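The values typed into the bin range are the upper class boundaries. As a hedged illustration, the sketch below derives them for the quiz-score classes (20 - 25 through 56 - 61), assuming integer data so that each boundary lies half a unit beyond the class limit; `class_boundaries` is an illustrative name, not an Excel function.

```python
# Class limits of the quiz-score frequency distribution.
class_limits = [(20, 25), (26, 31), (32, 37), (38, 43),
                (44, 49), (50, 55), (56, 61)]


def class_boundaries(limits, unit=1):
    """Extend each class limit by half a measurement unit on both sides."""
    half = unit / 2
    return [(lower - half, upper + half) for lower, upper in limits]


boundaries = class_boundaries(class_limits)

# Upper class boundaries: candidate values for the bin range cells.
bin_range = [upper for _, upper in boundaries]
print(bin_range)
```

These values match the class boundaries on the horizontal axis of the quiz-score histogram (25.5 through 61.5, with 19.5 as the lowest boundary).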
Organizing Numerical Data:
The Polygon (Line Graph)

▪ A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
▪ The cumulative percentage polygon, or ogive, displays the variable of interest along the X axis and the cumulative percentages along the Y axis.

Organizing Numerical Data:
The Polygon (Line Graph)
Quiz Score   Class Midpoint   Frequency   <cf   >cf    rf    Percentage
20 - 25          22.5             3         3    45   0.07      6.67
26 - 31          28.5             3         6    42   0.07      6.67
32 - 37          34.5             5        11    39   0.11     11.11
38 - 43          40.5             9        20    34   0.20     20.00
44 - 49          46.5             7        27    25   0.16     15.56
50 - 55          52.5            14        41    18   0.31     31.11
56 - 61          58.5             4        45     4   0.09      8.89
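The derived columns of this table (class midpoint, <cf, >cf, percentage) can be recomputed from the class limits and frequencies. The following Python sketch assumes those two columns as input; the variable names are illustrative. The cumulative percentages at the end are the values an ogive would plot.

```python
from itertools import accumulate

# Class limits and frequencies copied from the table above.
class_limits = [(20, 25), (26, 31), (32, 37), (38, 43),
                (44, 49), (50, 55), (56, 61)]
frequencies = [3, 3, 5, 9, 7, 14, 4]
total = sum(frequencies)

# Midpoint of each class: average of its lower and upper limits.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

# "Less than" cumulative frequency: running total from the first class.
less_than_cf = list(accumulate(frequencies))

# "Greater than" cumulative frequency: running total from the last class.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Percentage per class and cumulative percentage (ogive values).
percentages = [round(100 * f / total, 2) for f in frequencies]
cumulative_percentages = [round(100 * cf / total, 2) for cf in less_than_cf]

for row in zip(class_limits, midpoints, frequencies,
               less_than_cf, greater_than_cf, percentages):
    print(row)
```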

[Figure: Frequency Polygon; vertical axis: Frequency (0 to 15); horizontal axis: Class Midpoint (16.5, 22.5, 28.5, 34.5, 40.5, 46.5, 52.5, 58.5, 64.5)]

(Note: In a percentage polygon, the vertical axis would be defined to show the percentage of observations per class.)
Bivariate Data - Scatter Plots

▪ Scatter plots are used for numerical data consisting of paired observations taken from two numerical variables.
▪ One variable is measured on the vertical axis and the other variable is measured on the horizontal axis.

Scatter Plot Example
Volume per day   Cost per day
      23             125
      26             140
      29             146
      33             160
      38             167
      42             170
      50             188
      55             195
      60             200

[Figure: scatter plot "Cost per Day vs. Production Volume"; vertical axis: Cost per Day (0 to 250); horizontal axis: Volume per Day (20 to 70)]

Time Series - Bivariate Data

▪ A time-series plot is used to study patterns in the values of a numerical variable over time. Each value is plotted as a point in two dimensions with the time period on the horizontal X axis and the variable of interest on the Y axis.

Time Series Example

Attendance (in millions) at USA amusement/theme parks from 2000-2005

Year   Attendance
2000      317
2001      319
2002      324
2003      322
2004      328
2005      335
Time Series Example

[Figure: time-series plot "Attendance (in millions) at US Theme Parks"; vertical axis: Attendance (316 to 336); horizontal axis: Year (2000 to 2005)]



Scatter Plot in Excel (97-2003)

1. Select the chart wizard

2. Select the XY (Scatter) option, then click “Next”

3. When prompted, enter the data range, then click “Next”

4. Enter the Title, Axis Labels, and Legend, then click “Finish”

Principles of Excellent Graphs
▪ The graph should not distort the data.
▪ The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
▪ The scale on the vertical axis should begin at zero.
▪ All axes should be properly labeled.
▪ The graph should contain a title.
▪ The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk

Bad Presentation: [Figure: "Minimum Wage", labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]

✓ Good Presentation: [Figure: "Minimum Wage"; vertical axis: $ (0 to 4); horizontal axis: 1960, 1970, 1980, 1990]

Graphical Errors: No Relative Basis

Bad Presentation: [Figure: "A’s received by students"; vertical axis: Freq. (100 to 300); horizontal axis: FR, SO, JR, SR]

✓ Good Presentation: [Figure: "A’s received by students"; vertical axis: % (0% to 30%); horizontal axis: FR, SO, JR, SR]

FR = Freshman, SO = Sophomore, JR = Junior, SR = Senior

Graphical Errors: No Zero Point on
the Vertical Axis
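Bad Presentation: [Figure: "Monthly Sales"; vertical axis: $ with ticks at 36, 39, 42, 45 (no zero point); horizontal axis: Jan to Jun]

✓ Good Presentation: [Figure: "Monthly Sales"; vertical axis: $ with ticks at 0, 36, 39, 42, 45; horizontal axis: Jan to Jun]

Graphing the first six months of sales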

Chapter Summary

Categorical Data
▪ Tabulating Data: Summary Table
▪ Graphing Data: Bar Charts, Pie Charts, Pareto Diagram

Chapter Summary

Numerical Data
▪ Ordered Array
▪ Stem and Leaf Display
▪ Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive

Chapter Summary
In this chapter, we have
▪ Organized categorical data using the summary
table, bar chart, pie chart, and Pareto diagram.
▪ Organized numerical data using the ordered
array, stem and leaf display, frequency
distribution, histogram, polygon, and ogive.
▪ Examined cross tabulated data using the
contingency table and side-by-side bar chart.
▪ Developed scatter plots and time series graphs.
▪ Examined the do’s and don'ts of graphically
displaying data.
Individual Work
References

David, M. (2017). Statistics for managers using Microsoft Excel. Pearson Education India.

Levine, D. M., Stephan, D. F., Krehbiel, T. C., & Berenson, M. L. (2008). Statistics for managers using Microsoft Excel (5th ed.). Pearson Prentice-Hall.
