
Today’s session is dedicated to

THE PIZZA BASE!


Quick recap…
Remember DCOVA? The decision-making life cycle in statistics

DEFINE COLLECT ORGANIZE VISUALIZE ANALYZE

• Define the variables that you want to study in order to solve a business problem or meet a business
objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts, and by using other statistical methods, to reach
conclusions
Classification of statistics

APPLIED

DESCRIPTIVE INFERENTIAL ANALYTICAL INDUCTIVE


Business Statistics – Session 2

Data Visualization
Learning Objectives

After studying this chapter, you should be able to:

• Describe descriptive and inferential statistics

• Explain collection, editing and classification of primary and secondary data

• Define tabulation and presentation of data

• Understand diagrammatic and graphical presentation

• Understand bar diagrams, histograms, pie diagrams, frequency polygons and ogives
To understand God’s thoughts…

we must study statistics, for these are the measure of his purpose.

Florence Nightingale
The Lady with the Lamp and the Coxcomb
Activity: Video-based Discussion

Guidelines
• Watch the TED Talk video of the late Dr. Hans Rosling, the renowned Swedish physician, academic, statistician, and public speaker
• What is the data story about?
• What do you notice about how the data has been presented?
• Discuss the learnings and implications
Variable Types
Categorical Variables - These are also known as qualitative variables. Their categories can be given numeric codes, e.g. Male coded as 1 and Female as 2 when categorizing gender, but the codes are labels rather than quantities.
Numerical Variables - These are also known as quantitative variables.

Discrete Variables – These are variables that take only specific values, e.g. the number of cars in a parking lot. They need not be whole numbers, however: shoe sizes can be decimals but cannot take just any value.

Continuous Variables - These are variables that can take any value and can therefore be plotted along an almost unbroken continuum, e.g. height, length, weight, etc. These are always numerical variables.

Question                                                  Responses        Data Type

Do you currently have a profile on Facebook?              Yes/No           Categorical
How many text messages have you sent in the past week?    ______           Numerical (discrete)
How long did it take to download a video game?            _____ seconds    Numerical (continuous)
Diagrammatical Presentation of Data

One-dimensional Diagrams
• Also known as bar diagrams, and the magnitude of the characteristics is shown by the length or height of the bar
• The width depends upon the number of bars to be accommodated in the diagram

Two-dimensional Diagrams
• The value of an item is represented by an area. Such diagrams are also known as ‘surface’ or ‘area diagrams’
• Popular forms include rectangular, square or circular (e.g. pie chart) ones

Three-dimensional Diagrams
• With the help of three-dimensional diagrams, the values of various items are represented by the volume of a cube, sphere, cylinder, etc. These diagrams are normally used when the variations in the magnitudes of observations are very large

Pictograms/Cartograms
• These are like frequency plots. The data points are plotted on the graph in the same manner. Then, instead of joining the data points, pictures or objects of the height of the data points are used to depict the data
• Heights of the pictures or objects represent the frequency. These diagrams include histograms and frequency polygons
One-dimensional Diagrams: Bar Chart
One-dimensional Diagrams: Scatter Plot

Scatter Plot
• The scatter diagram is the most fundamental graph for showing the relationship between two variables. It is a simple way to represent a bivariate distribution
• A bivariate distribution is the distribution of two random variables. The two variables are plotted against each other, one on the X axis and one on the Y axis
• The scatter diagram thus indicates the nature and strength of the correlation
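
The strength and direction that a scatter diagram shows visually can also be summarized in a single number, the Pearson correlation coefficient. The sketch below is illustrative only: the advertising and sales figures are hypothetical, and `pearson_r` is a helper name introduced here rather than anything from the session.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate data: advertising spend (X) and sales (Y)
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(ad_spend, sales)
print(round(r, 3))
```

A value near +1 or -1 corresponds to points lying close to a rising or falling straight line, while a value near 0 corresponds to a cloud with no linear pattern.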
Two-dimensional Diagrams: Pie Chart

Banking Preference

Banking Preference?                  %

ATM                                  16%
Automated or live telephone          2%
Drive-through service at branch      17%
In person at branch                  41%
Internet                             24%
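
Each slice of a pie chart takes a share of the full 360 degrees equal to its share of the total. As a rough sketch using the Banking Preference percentages above (the function name `sector_angles` is an assumption made for this example):

```python
# Percentages from the Banking Preference table
banking_preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

def sector_angles(percentages):
    """Convert percentage shares into pie-chart sector angles in degrees."""
    total = sum(percentages.values())
    return {label: 360 * pct / total for label, pct in percentages.items()}

angles = sector_angles(banking_preference)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

Dividing by the total rather than by 100 keeps the angles summing to 360 even if the input shares are raw counts instead of percentages.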
Histogram
• A vertical bar chart of the data in a frequency distribution is called a histogram
• In a histogram there are no gaps between adjacent bars, as it represents continuous data
• The class boundaries (or class midpoints) are shown on the horizontal axis
• The vertical axis shows frequency, relative frequency, or percentage
• The height of each bar represents the frequency, relative frequency, or percentage of its class
Frequency Distribution (1/2)

A frequency distribution summarizes the numerical values by tallying them into a set of numerically ordered classes.

Classes are the groups that represent a range of values, called a class
interval. Each value can be in only one class and every value must be contained
in one of the classes.

To create a useful frequency distribution, you must think about how many classes
are appropriate for your data and also determine a suitable width for each class
interval.

Determining the Class Interval Width

Interval Width = (Highest Value – Lowest Value) / Number of Classes
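
The interval-width formula can be tried out in a few lines. This is a minimal sketch: the meal costs are hypothetical, and rounding the raw width up to a multiple of 5 is an assumption made here for readability, not a rule stated in the session.

```python
import math

def class_interval_width(values, number_of_classes):
    """Interval width = (highest value - lowest value) / number of classes."""
    return (max(values) - min(values)) / number_of_classes

# Hypothetical meal costs in dollars (not data from the session)
meal_costs = [22, 35, 41, 48, 53, 67, 79]

raw_width = class_interval_width(meal_costs, 6)  # (79 - 22) / 6
# Assumption: round up to a convenient width so the classes are easy to read
convenient_width = math.ceil(raw_width / 5) * 5
print(raw_width, convenient_width)
```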
Frequency Distribution (2/2)

Frequency Distributions of the Cost per Meal for 50 City Restaurants and 50 Suburban
Restaurants

Cost Per Meal ($) City Frequency Suburban Frequency

20 but less than 30 6 5

30 but less than 40 7 17

40 but less than 50 19 17

50 but less than 60 9 7

60 but less than 70 6 4

70 but less than 80 3 0

Total 50 50
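
Classes of the form "20 but less than 30" place a value that sits exactly on a boundary in the higher class. The sketch below tallies raw values that way; the sample costs are hypothetical (the raw restaurant data is not given here), and `frequency_distribution` is a name chosen for this illustration.

```python
def frequency_distribution(values, lower, width, number_of_classes):
    """Tally values into classes of the form 'a but less than b'."""
    counts = [0] * number_of_classes
    for value in values:
        # Floor division puts a value equal to a boundary into the higher class
        index = int((value - lower) // width)
        if not 0 <= index < number_of_classes:
            raise ValueError(f"{value} falls outside every class")
        counts[index] += 1
    labels = [
        f"{lower + i * width} but less than {lower + (i + 1) * width}"
        for i in range(number_of_classes)
    ]
    return dict(zip(labels, counts))

# Hypothetical meal costs in dollars
sample_costs = [25, 29.99, 30, 44, 47, 52, 61, 79]

table = frequency_distribution(sample_costs, lower=20, width=10, number_of_classes=6)
for label, count in table.items():
    print(f"{label}: {count}")
```

Raising an error for out-of-range values enforces the requirement that every value must be contained in one of the classes.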
Relative Frequency and Percentage Distribution
When you are comparing two or more groups, as was done in the preceding frequency distribution, knowing the proportion
or the percentage of the total that is in each group is more useful than knowing the
frequency count of each group. For such situations, you create a relative frequency
distribution or a percentage distribution instead of a frequency distribution.

Relative Frequency Distributions and Percentage Distributions of the Cost of Meals at City and Suburban Restaurants

                       CITY                                  SUBURBAN
COST PER MEAL ($)      Relative Frequency   Percentage (%)   Relative Frequency   Percentage (%)
20 but less than 30    0.12                 12.0             0.10                 10.0
30 but less than 40    0.14                 14.0             0.34                 34.0
40 but less than 50    0.38                 38.0             0.34                 34.0
50 but less than 60    0.18                 18.0             0.14                 14.0
60 but less than 70    0.12                 12.0             0.08                 8.0
70 but less than 80    0.06                 6.0              0.00                 0.0
Total                  1.00                 100.0            1.00                 100.0
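
The relative frequencies and percentages follow directly from the frequency counts: divide each count by the group total, then multiply by 100. A short sketch using the city and suburban counts from the frequency distribution (the function names and the abbreviated class labels are chosen for this example):

```python
# Frequencies for the 50 city and 50 suburban restaurants
# "20-30" is shorthand here for the class "20 but less than 30"
classes = ["20-30", "30-40", "40-50", "50-60", "60-70", "70-80"]
city = [6, 7, 19, 9, 6, 3]
suburban = [5, 17, 17, 7, 4, 0]

def relative_frequencies(frequencies):
    """Proportion of the total that falls in each class."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def percentages(frequencies):
    """Relative frequencies expressed as percentages."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]

city_rel = relative_frequencies(city)
city_pct = percentages(city)
suburban_pct = percentages(suburban)

for label, rel, pct in zip(classes, city_rel, city_pct):
    print(f"{label}: {rel:.2f}  {pct:.1f}%")
```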
Percentage Polygon

When you construct polygons or histograms, the vertical Y-axis should show the true
zero, or the “origin,” so as not to distort the character of the given data.
The horizontal X-axis does not need to show the zero point for the variable of interest,
although the range of the variable should take up the major portion of the axis.
Cumulative Frequency Distribution
The Cumulative Percentage Distribution provides a way of presenting information about the percentage
of values that are less than a specific amount.
For example, to know what percentage of the city restaurant meals cost less than $40 or what percentage
cost less than $50, you use the percentage distribution to form the cumulative percentage distribution.

Developing the Cumulative Percentage Distribution for the Cost of Meals at City Restaurants

Cost per Meal ($)      Percentage (%)   Percentage of Meals Less Than Lower Boundary of Class Interval (%)

20 but less than 30    12               0
30 but less than 40    14               12
40 but less than 50    38               26 = 12 + 14
50 but less than 60    18               64 = 12 + 14 + 38
60 but less than 70    12               82 = 12 + 14 + 38 + 18
70 but less than 80    6                94 = 12 + 14 + 38 + 18 + 12
80 but less than 90    0                100 = 12 + 14 + 38 + 18 + 12 + 6
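
The running totals in the last column can be reproduced with a cumulative sum over the city percentages. This is a minimal sketch using the standard library's `itertools.accumulate`; the variable names are chosen for this example.

```python
from itertools import accumulate

# Lower boundary of each class, and the city percentage in each class
lower_boundaries = [20, 30, 40, 50, 60, 70, 80]
city_pct = [12, 14, 38, 18, 12, 6]

# Percentage of meals costing less than each lower boundary:
# start at 0 for the first boundary, then add one class at a time.
cumulative_pct = [0] + list(accumulate(city_pct))

for boundary, cum in zip(lower_boundaries, cumulative_pct):
    print(f"less than ${boundary}: {cum}%")
```

The resulting (boundary, cumulative percentage) pairs are the points that a cumulative percentage polygon joins.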


Cumulative Percentage Polygon (Ogive)
The cumulative percentage polygon, or ogive, uses the cumulative percentage distribution to display the variable of
interest along the X - axis and the cumulative percentages along the Y-axis.
Time Series Plot

Year Combined Gross Y-o-y growth


1996 5,669.20 -
1997 6,393.90 12%
1998 6,523.00 5%
1999 7,317.50 10%
2000 7,659.50 5%
2001 8,077.80 7%
2002 9,146.10 13%
2003 9,043.20 (2%)
2004 9,359.40 3%
2005 8,817.10 (5%)
2006 9,231.80 4%
2007 9,685.70 3%
2008 9,707.40 1%
2009 10,675.60 10%
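
Year-on-year growth is conventionally computed as the change from the previous year divided by the previous year's value. The sketch below applies that usual definition to the Combined Gross column. It is an assumption that this is how the printed Y-o-y column was derived; that column may rest on a different basis or rounding, so the results need not match it row for row.

```python
def yoy_growth(previous, current):
    """Year-on-year growth in percent: (current - previous) / previous * 100."""
    return (current - previous) / previous * 100

# Combined Gross values copied from the table
gross = {
    1996: 5669.20, 1997: 6393.90, 1998: 6523.00, 1999: 7317.50,
    2000: 7659.50, 2001: 8077.80, 2002: 9146.10, 2003: 9043.20,
    2004: 9359.40, 2005: 8817.10, 2006: 9231.80, 2007: 9685.70,
    2008: 9707.40, 2009: 10675.60,
}

years = sorted(gross)
# Pair each year with the one before it; the first year has no growth figure
growth = {
    year: yoy_growth(gross[prev], gross[year])
    for prev, year in zip(years, years[1:])
}
for year, pct in growth.items():
    print(f"{year}: {pct:.1f}%")
```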
Tabulation of Data

• Tabulation is arranging the data in a flat table (two-dimensional array) format by grouping the observations.

• A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.

Types of Tabulation: One-Way Tabulation, Two-Way Tabulation, Multi-Way Tabulation

Advantages of Tabulation

Data Tabulation

• Statistical tables can be classified into various categories depending upon the basis of their
classification. Broadly speaking, the basis of classification can be any of the following:
o Purpose of investigation
o Nature of presented figures
o Construction
Data Tabulation: on the basis of purpose (1/2)

General purpose table


• A general purpose table is also called a
reference table. This table facilitates easy
reference to the collected data. In the words of
Croxton and Cowden, “The primary and usually
the sole purpose of a reference table is to present
the data in such a manner that the individual
items may be readily found by a reader.”
• A general purpose table is formed without any
specific objective, but can be used for a number of
specific purposes. Such a table usually contains a
large mass of data and is generally given in the
appendix of a report
Data Tabulation: on the basis of purpose (2/2)

Special purpose table


• A special purpose table is also called a text table,
a summary table or an analytical table. Such a
table presents data relating to a specific problem.
• According to H. Secrist, “These tables are those in
which are recorded, not the detailed data which
have been analyzed, but rather the results of
analysis.”
• Such tables are usually smaller than
reference tables and are generally used to
highlight relationships between various
characteristics or to facilitate their comparison.
Data Tabulation: on the basis of nature of presented figures

Primary table
• A primary table is also known as an original table, and
it contains data in the form in which they were
originally collected
Derivative table
• A derivative table presents figures like totals,
averages, percentages, ratios, coefficients, etc.,
derived from the original data. A table of time series
data is an original table, but a table of trend
values computed from the time series data is
known as a derivative table.
Data Tabulation: on the basis of construction (1/2)

Simple table
• In this table, the data are presented according to one characteristic only. This is the simplest form of a table and is also known as a table of first order

Complex table
• A complex table is used to present data according to two or more characteristics. Such a table can be two-way, three-way or multi-way, etc.
Data Tabulation: on the basis of construction (2/2)

Cross-classified table
• Tables that classify entries in both
directions, i.e., row-wise and column-wise,
are called cross-classified tables. The two
ways of classification are such that each
category of one classification can occur
with any category of the other. Cross-classified
tables can also be constructed
for more than two characteristics. A
cross-classification can also be used for
analytical purposes, e.g., it is possible to
make certain comparisons while holding
the effect of other factors constant.
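
A two-way cross-classified table can be built by counting how often each pair of categories occurs together. The sketch below is illustrative only: the survey records are hypothetical, and `cross_classify` is a helper name introduced for this example.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred banking channel)
records = [
    ("Female", "ATM"), ("Male", "Internet"), ("Female", "Internet"),
    ("Male", "ATM"), ("Female", "Internet"), ("Male", "Internet"),
]

def cross_classify(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Every row category is paired with every column category;
    # a combination that never occurs gets a count of 0.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = cross_classify(records)
for row, cells in table.items():
    print(row, cells)
```

Reading along a row holds the row category fixed while the column category varies, which is the kind of controlled comparison a cross-classification supports.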
Summary
• There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both branches are important, and they accomplish different objectives.

• Data can be obtained from a primary source or a secondary source according to need, situation, convenience, time, resources and availability. The most important method of primary data collection is the questionnaire. Data must be objective and fact-based so that it helps a decision-maker arrive at a better decision.

• Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, the data may constitute a sample from some population.
Cont….
• The type of research, its purpose, and the conditions under which the data are obtained determine the method of collecting the data. If relatively few items of information are required quickly and funds are limited, telephone interviews are recommended. If the respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect the data.

• The quality of the information collected through a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be drafted carefully and tactfully.

• Before any processing, the data must be edited and coded to ensure its correctness. In any research study, voluminous data can be handled only after classification. Data can be presented through tables and charts.
Cont….
• Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

• A frequency distribution is the principal tabular summary of either discrete or continuous data. It may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the "less than" ogive and the "more than" ogive.

• Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data but also makes it easy to understand. Tabulation is the fastest way to extract information from a mass of data and is hence popular even among those not exposed to statistical methods.

Cont….
• Charts help in grasping the data and analyzing it qualitatively. They also help managers present the data effectively as part of reports. The various types of chart include the bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding bar diagram, histogram and pie chart.

• A graphic presentation is another way of representing statistical data in a simple and intelligible form. Two types of graphs have been discussed: line graphs and ogives.
