
Business Statistics – Session 2

Data Visualization
Learning Objectives

After studying this chapter, you should be able to:

• Describe descriptive and inferential statistics

• Explain collection, editing and classification of primary and secondary data

• Define tabulation and presentation of data

• Understand diagrammatic and graphical presentation

• Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and Ogives
Session Outline

1 Data and Statistics

2 Collecting and Processing Data

3 Presenting Data
Quick recap…
Remember DCOVA? The decision-making life cycle in statistics

DEFINE COLLECT ORGANIZE VISUALIZE ANALYZE

• Define the variables that you want to study in order to solve a business problem or meet a business objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts, and other statistical methods to reach conclusions
Activity 1: Understanding Data and Variables

Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Understand the information presented in the sheet and try to categorize it into Mutually Exclusive, Collectively Exhaustive (MECE) buckets
• Discuss the differences between the various data fields and the purpose each serves
• How will you proceed with analysing this data?
Variable Types (1/2)
Categorical Variables - These are also known as qualitative variables.
Numerical Variables - These are also known as quantitative variables.

Discrete Variables – These are variables that take only specific values and could be numerical or
categorical

Continuous Variables - These are variables that define a continuum and are numerical
Question                                                  Responses        Data Type
Do you currently have a profile on Facebook?              Yes/No           Categorical
How many text messages have you sent in the past week?    ______           Numerical (discrete)
How long did it take to download a video game?            _____ seconds    Numerical (continuous)
Variable Types (2/2)
Discussion question: What is the implication for Descriptive and Inferential statistics?

Nominal
• Used to “name” (and so Nominal) or label a set of values
• Can be qualitative or quantitative (label denoting categories)
• Examples can include gender, income categories, city type

Ordinal
• Is used to “order” data (and so Ordinal) in a certain sequence
• Magnitude of difference between two values not known
• Typically measures abstract constructs like satisfaction, loyalty etc.

Interval
• Possesses all the characteristics of ordinal data
• In addition, the difference between intervals is known and is uniform
• However, there is no true zero
• Celsius and Fahrenheit scales are good examples to consider

Ratio
• Possesses all the characteristics of interval data
• In addition, there is a true zero, whose value remains universal
• Kelvin scale, altitude, height are some examples
Session Outline

1 Descriptive and Inferential Statistics

2 Collecting and Processing Data

3 Presenting Data
Collection of Data

Types of Data – Primary and Secondary

Methods of Collecting Primary Data

Merits and Demerits of Collecting Primary Data

Methods of Collecting Secondary Data

Designing Questionnaire
Editing and Coding of Data
Editing Primary Data
• Completeness
• Consistency
• Accuracy
• Homogeneity

Editing Secondary Data
• Field Editing
• Central Editing

Coding of Data
• Coding is the process of assigning some symbols either alphabetical or numeral or both to the answers so that the responses can be recorded into a limited number of classes or categories.
Classification of Data
Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

1 Bases of Classification
2 Rules of Classification
3 Frequency Distribution


Session Outline

1 Descriptive and Inferential Statistics

2 Collecting and Processing Data

3 Presenting Data
Activity 2: Video-based Discussion

Guidelines
• Watch the TED Talk video of the late Dr. Hans Rosling, the renowned Swedish physician, academic, statistician, and public speaker
• What is the data story about?
• What do you notice about how data has been presented?
• Discuss the learnings and implications
Tabulation of Data

• Tabulation is arranging the data in flat table (two dimensional arrays) format by grouping the observations.
• Table is a spreadsheet with rows and columns with headings and stubs indicating class of the data.

Types of Tabulation
• One-Way Tabulation
• Two-Way Tabulation
• Multi-Way Tabulation

Advantages of Tabulation

Data Tabulation

• Statistical tables can be classified into various categories depending upon the basis of their classification. Broadly
speaking, the basis of classification can be any of the following:
o Purpose of investigation
o Nature of presented figures
o Construction
Data Tabulation: on the basis of purpose (1/2)

General purpose table


• A general purpose table is also called a reference table. This table facilitates easy reference to the collected data. In the words of Croxton and Cowden, “The primary and usually the sole purpose of a reference table is to present the data in such a manner that the individual items may be readily found by a reader.”
• A general purpose table is formed without any specific objective, but can be used for a number of specific purposes. Such a table usually contains a large mass of data and is generally given in the appendix of a report.
Data Tabulation: on the basis of purpose (2/2)

Special purpose table


• A special purpose table is also called a text table, a summary table or an analytical table. Such a table presents data relating to a specific problem.
• According to H. Secrist, “These tables are those in which are recorded, not the detailed data which have been analyzed, but rather the results of analysis.”
• Such tables are usually smaller than reference tables and generally highlight relationships between various characteristics or facilitate their comparison.
Data Tabulation: on the basis of nature of presented figures

Primary table
• A primary table is also known as an original table and contains data in the form in which they were originally collected
Derivative table
• A derivative table presents figures like totals, averages, percentages, ratios, coefficients, etc., derived from the original data. A table of time series data is an original table, but a table of trend values computed from the time series data is known as a derivative table.
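The distinction can be sketched in a few lines of Python. This is only an illustration: the quarterly sales figures and the names `primary_table` and `derivative_table` are hypothetical and do not come from the course data.

```python
# Hypothetical primary table: figures in the form in which they were collected.
primary_table = {"Q1": 120, "Q2": 150, "Q3": 90, "Q4": 140}

# Derived figures: total, average and percentage share of each quarter.
total = sum(primary_table.values())
average = total / len(primary_table)
derivative_table = {
    quarter: round(100 * value / total, 1)
    for quarter, value in primary_table.items()
}

print("Total:", total)
print("Average:", average)
for quarter, share in derivative_table.items():
    print(f"{quarter}: {share}% of total")
```

The primary table holds what was recorded; everything printed by the sketch is computed from it and so would belong in a derivative table.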
Data Tabulation: on the basis of construction (1/2)

Simple table
• In this table, the data are presented according to one characteristic only. This is the simplest form of a table and is also known as table of first order

Complex table
• A complex table is used to present data according to two or more characteristics. Such a table can be two-way, three-way or multi-way, etc.
Data Tabulation: on the basis of construction (2/2)

Cross-classified table
• Tables that classify entries in both directions, i.e., row-wise and column-wise, are called cross-classified tables. The two ways of classification are such that each category of one classification can occur with any category of the other. Cross-classified tables can also be constructed for more than two characteristics. A cross-classification can also be used for analytical purposes, e.g., it is possible to make certain comparisons while holding the effect of other factors constant.
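As a rough sketch of how a two-way cross-classified table might be built from raw records, the Python below counts each (row category, column category) pair. The survey responses and the helper name `cross_tabulate` are hypothetical and used only for illustration.

```python
from collections import Counter

def cross_tabulate(records):
    """Count (row category, column category) pairs into a two-way table."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # Every row category is paired with every column category (zero if unseen).
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    return rows, cols, table

# Hypothetical responses: (location, preferred banking channel)
responses = [
    ("City", "ATM"), ("City", "Internet"), ("Suburban", "ATM"),
    ("City", "ATM"), ("Suburban", "In person"), ("Suburban", "Internet"),
    ("City", "In person"), ("Suburban", "In person"),
]

rows, cols, table = cross_tabulate(responses)
print("Location".ljust(10) + "".join(col.ljust(12) for col in cols))
for row in rows:
    print(row.ljust(10) + "".join(str(table[row][col]).ljust(12) for col in cols))
```

Because every row category is paired with every column category, a combination that never occurs shows up as an explicit zero.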
Diagrammatical Presentation of Data

One-dimensional Diagrams
• Also known as bar diagrams, and the magnitude of the characteristics is shown by the length or height of the bar
• The width depends upon the number of bars to be accommodated in the diagrams

Two-dimensional Diagrams
• The value of an item is represented by an area. Such diagrams are also known as ‘surface’ or ‘area diagrams’
• Popular forms include rectangular, square or circular (e.g. pie chart) ones

Three-dimensional Diagrams
• With the help of three dimensional diagrams, the values of various items are represented by the volume of cube, sphere, cylinder, etc. These diagrams are normally used when the variations in the magnitudes of observations are very large

Pictograms/Cartograms
• These are like frequency plots. The data points are plotted on the graph in the same manner. Then instead of joining the data points, pictures or objects of the height of the data points are used to depict the data
• Heights of the pictures or objects represent the frequency. These include histograms and frequency polygon
One-dimensional Diagrams: Bar Chart
One-dimensional Diagrams: Scatter Plot

Scatter Plot
• A scatter diagram is the most fundamental graph plotted to show the relationship between two variables. It is a simple way to represent a bivariate distribution
• A bivariate distribution is the distribution of two random variables. The two variables are plotted against each other, one on the X axis and the other on the Y axis
• A scatter diagram thus indicates the nature and strength of the correlation.
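The slides do not name a specific measure of correlation. One common choice is Pearson's correlation coefficient, sketched below under that assumption. The data points and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data: the points lie exactly on a rising straight line.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
r = pearson_r(hours, score)
print(round(r, 3))
```

The sign of the coefficient reflects the nature of the relationship (rising or falling) and its magnitude reflects the strength, which is what the scatter diagram shows visually.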
Two-dimensional Diagrams: Pie Chart

Banking Preference

Banking Preference?                  %
ATM                                  16%
Automated or live telephone          2%
Drive-through service at branch      17%
In person at branch                  41%
Internet                             24%
Histogram
• A vertical bar chart of the data in a frequency distribution is called a histogram
• In a histogram there are no gaps between adjacent bars, as it represents continuous data
• The class boundaries (or class midpoints) are shown on the horizontal axis
• The vertical axis shows either frequency, relative frequency, or percentage
• The height of each bar represents the frequency, relative frequency, or percentage of its class
Frequency Distribution (1/2)

A frequency distribution summarizes the numerical values by tallying them into a set of numerically ordered classes.

Classes are the groups that represent a range of values, called a class interval. Each value can
be in only one class and every value must be contained in one of the classes.

To create a useful frequency distribution, you must think about how many classes are
appropriate for your data and also determine a suitable width for each class interval.

• Determining the Class Interval Width

Interval Width = (Highest Value – Lowest Value) / Number of Classes
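The width formula can be tried out with a short Python sketch. The meal costs below are hypothetical values chosen for illustration, and `interval_width` is a helper name introduced here, not part of the course material.

```python
def interval_width(values, num_classes):
    """(highest value - lowest value) / number of classes."""
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return (max(values) - min(values)) / num_classes

# Hypothetical meal costs in dollars
meal_costs = [22, 35, 41, 48, 53, 67, 79]
width = interval_width(meal_costs, 6)
print(width)  # (79 - 22) / 6 = 9.5
```

In practice the computed width is usually rounded to a convenient figure (here, 10) so that the class boundaries are easy to read.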
Frequency Distribution (2/2)

Frequency Distributions of the Cost per Meal for 50 City Restaurants and 50 Suburban Restaurants

Cost Per Meal ($)      City Frequency   Suburban Frequency
20 but less than 30     6                5
30 but less than 40     7               17
40 but less than 50    19               17
50 but less than 60     9                7
60 but less than 70     6                4
70 but less than 80     3                0
Total                  50               50
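The tallying step behind such a table can be sketched in Python. The raw restaurant data are not reproduced in the slides, so `sample_costs` below is a small hypothetical sample. The class scheme (lower boundary 20, width 10, six classes) follows the table.

```python
def frequency_distribution(values, lower, width, num_classes):
    """Tally values into classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the class intervals")
        counts[index] += 1
    return counts

# Hypothetical meal costs in dollars (not the 50-restaurant data set)
sample_costs = [25, 29, 30, 38, 42, 45, 49, 55, 61, 74]
counts = frequency_distribution(sample_costs, lower=20, width=10, num_classes=6)

for i, count in enumerate(counts):
    low = 20 + 10 * i
    print(f"{low} but less than {low + 10}: {count}")
```

Each value lands in exactly one class, and a value outside the scheme raises an error rather than being silently dropped, which mirrors the rule that every value must be contained in one of the classes.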
Relative Frequency and Percentage Distribution
When you are comparing two or more groups, as was done previously, knowing the proportion or the percentage of the total that is in each group is more useful than knowing the frequency count of each group. For such situations, you create a relative frequency distribution or a percentage distribution instead of a frequency distribution.

Relative Frequency Distributions and Percentage Distributions of the Cost of Meals at City and Suburban Restaurants

                       CITY                                  SUBURBAN
COST PER MEAL ($)      Relative Frequency   Percentage (%)   Relative Frequency   Percentage (%)
20 but less than 30    0.12                  12.0            0.10                  10.0
30 but less than 40    0.14                  14.0            0.34                  34.0
40 but less than 50    0.38                  38.0            0.34                  34.0
50 but less than 60    0.18                  18.0            0.14                  14.0
60 but less than 70    0.12                  12.0            0.08                   8.0
70 but less than 80    0.06                   6.0            0.00                   0.0
Total                  1.00                 100.0            1.00                 100.0
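The relative frequencies and percentages follow directly from the frequency counts: divide each class frequency by the total. A minimal Python sketch, using the city and suburban frequencies from the earlier frequency distribution (the function names are illustrative):

```python
city = [6, 7, 19, 9, 6, 3]
suburban = [5, 17, 17, 7, 4, 0]

def relative_frequencies(freqs):
    """Proportion of the total in each class, rounded to two decimals."""
    total = sum(freqs)
    return [round(f / total, 2) for f in freqs]

def percentages(freqs):
    """Percentage of the total in each class, rounded to one decimal."""
    total = sum(freqs)
    return [round(100 * f / total, 1) for f in freqs]

city_rel, city_pct = relative_frequencies(city), percentages(city)
sub_rel, sub_pct = relative_frequencies(suburban), percentages(suburban)

for i, low in enumerate(range(20, 80, 10)):
    print(f"{low} but less than {low + 10}: "
          f"city {city_rel[i]:.2f} ({city_pct[i]}%), "
          f"suburban {sub_rel[i]:.2f} ({sub_pct[i]}%)")
```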
Cumulative Distribution
The Cumulative Percentage Distribution provides a way of presenting information about the percentage
of values that are less than a specific amount.
For example, to know what percentage of the city restaurant meals cost less than $40 or what percentage
cost less than $50, you use the percentage distribution to form the cumulative percentage distribution.

Developing the Cumulative Percentage Distribution for the Cost of Meals at City Restaurants

Cost per Meal ($)      Percentage (%)   Percentage of Meals Less Than Lower Boundary of Class Interval (%)
20 but less than 30    12                 0
30 but less than 40    14                12
40 but less than 50    38                26 = 12 + 14
50 but less than 60    18                64 = 12 + 14 + 38
60 but less than 70    12                82 = 12 + 14 + 38 + 18
70 but less than 80     6                94 = 12 + 14 + 38 + 18 + 12
80 but less than 90     0               100 = 12 + 14 + 38 + 18 + 12 + 6
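The running sums in the table can be reproduced with a short Python sketch. It uses the city percentages from the table and starts from 0 because no meal costs less than the first lower boundary. The variable names are illustrative.

```python
from itertools import accumulate

# City percentages for the classes starting at 20, 30, 40, 50, 60 and 70
percentages = [12, 14, 38, 18, 12, 6]
lower_boundaries = [20, 30, 40, 50, 60, 70, 80]

# Percentage of meals costing less than each lower boundary
cumulative = [0] + list(accumulate(percentages))

for boundary, cum in zip(lower_boundaries, cumulative):
    print(f"Less than ${boundary}: {cum}%")
```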


Percentage Polygon

When you construct polygons or histograms, the vertical Y-axis should show the true zero, or the
“origin,” so as not to distort the character of the given data.
The horizontal X-axis does not need to show the zero point for the variable of interest, although the range of the variable should occupy the major portion of the axis.
Time Series Plot

Year Combined Gross Y-o-y growth


1996 5,669.20 -
1997 6,393.90 12%
1998 6,523.00 5%
1999 7,317.50 10%
2000 7,659.50 5%
2001 8,077.80 7%
2002 9,146.10 13%
2003 9,043.20 (2%)
2004 9,359.40 3%
2005 8,817.10 (5%)
2006 9,231.80 4%
2007 9,685.70 3%
2008 9,707.40 1%
2009 10,675.60 10%
Cumulative Percentage Polygon (Ogive)
The cumulative percentage polygon, or ogive, uses the cumulative percentage distribution to display the variable of interest along the X-axis and the cumulative percentages along the Y-axis.
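Reading a value off an ogive amounts to interpolating along the line segments that join its plotted points. The Python sketch below uses the city restaurant cumulative percentages and assumes that values are spread evenly within each class, which is the assumption implied by joining the points with straight lines. The function name `ogive_value` is hypothetical.

```python
def ogive_value(x, boundaries, cumulative):
    """Cumulative percentage below x, by linear interpolation between ogive points."""
    if x <= boundaries[0]:
        return float(cumulative[0])
    if x >= boundaries[-1]:
        return float(cumulative[-1])
    for i in range(len(boundaries) - 1):
        low, high = boundaries[i], boundaries[i + 1]
        if low <= x < high:
            fraction = (x - low) / (high - low)
            return cumulative[i] + fraction * (cumulative[i + 1] - cumulative[i])

# Ogive points for city restaurants: (lower boundary, cumulative percentage)
boundaries = [20, 30, 40, 50, 60, 70, 80]
cumulative = [0, 12, 26, 64, 82, 94, 100]

print(ogive_value(40, boundaries, cumulative))  # exactly at a plotted point
print(ogive_value(45, boundaries, cumulative))  # halfway between 26% and 64%
```

At a class boundary the result equals the tabulated cumulative percentage. Between boundaries it is only an estimate under the even-spread assumption.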
Summary
• There are two major divisions of the field of statistics, namely descriptive and inferential statistics. Both are important, and they accomplish different objectives.

• Data can be obtained from primary or secondary sources according to need, situation, convenience, time, resources and availability. The most important method for primary data collection is the questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.

• Statistical data is a set of facts expressed in quantitative form. Data is collected through various methods. Sometimes our data set consists of the entire population we are interested in. In other situations, data may constitute a sample from some population.

• The type of research, its purpose, and the conditions under which the data are obtained will determine the method of collecting the data. If relatively few items of information are required quickly and funds are limited, telephonic interviews are recommended. If respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect data.

• The quality of information collected through the filling of a questionnaire depends, to a large extent, upon the drafting of its questions. Hence, it is extremely important that the questions be designed or drafted very carefully and in a tactful manner.

• Before any processing of the data, editing and coding of data are necessary to ensure the correctness of data. In any research study, voluminous data can be handled only after classification. Data can be presented through tables and charts.

• Classification refers to the grouping of data into homogeneous classes and categories. It is the process of arranging things in groups or classes according to their resemblances and affinities.

• A frequency distribution is the principal tabular summary of either discrete data or continuous data. The frequency distribution may show actual, relative or cumulative frequencies. Actual and relative frequencies may be charted as either a histogram (a bar chart) or a frequency polygon. Two commonly used graphs of cumulative frequencies are the less-than ogive and the more-than ogive.

• Once the raw data is collected, it needs to be summarized and presented to the decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data, but also makes it easy to understand. Tabulation is the fastest way to extract information from the mass of data and hence popular even among those not exposed to the statistical method.

• Charts help in grasping the data and analyzing it qualitatively. They also help managers to present the data effectively as a part of reports. Various types of chart are bar diagrams, multiple bar diagrams, component bar diagrams, deviation bar diagrams, sliding bar diagrams, histograms and pie charts.

• A graphic presentation is another way of representing the statistical data in a simple and intelligible form. There are two types of graphs which we have discussed, line graphs and ogives.
