Data Visualization
Learning Objectives
Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and Ogives
Session Outline
Presenting Data
Quick recap…
Remember DCOVA? The decision-making life cycle in statistics
• Define the variables that you want to study in order to solve a business problem or meet a business objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts, and other statistical methods to reach conclusions
Activity 1: Understanding Data and Variables
Guidelines
• Refer to the data sheet, ‘Grad Survey’
• Understand the information presented in the sheet and try to categorize it into Mutually Exclusive, Collectively Exhaustive (MECE) buckets
• Discuss the differences between the various data fields and the purpose each serves
• How will you proceed with analysing this data?
Variable Types (1/2)
Categorical Variables - These are also known as qualitative variables.
Numerical Variables - These are also known as quantitative variables.
Discrete Variables – These are variables that take only specific values and could be numerical or
categorical
Continuous Variables - These are variables that define a continuum and are numerical
Question | Responses | Data Type
Do you currently have a profile on Facebook? | Yes/No | Categorical
How many text messages have you sent in the past week? | ______ | Numerical (discrete)
How long did it take to download a video game? | _____ seconds | Numerical (continuous)
Variable Types (2/2)
Discussion question: What is the implication for Descriptive and Inferential statistics?
Nominal
• Used to “name” (and so Nominal) or label a set of values
• Can be qualitative or quantitative (label denoting categories)
• Examples can include gender, income categories, city type
Ordinal
• Is used to “order” data (and so Ordinal) in a certain sequence
• Magnitude of difference between two values not known
• Typically measures abstract constructs like satisfaction, loyalty etc.
Interval
• Possesses all the characteristics of ordinal data
• In addition, the difference between intervals is known and is uniform
• However, there is no true zero
• The Celsius and Fahrenheit scales are good examples to consider
Ratio
• Possesses all the characteristics of interval data
• In addition, there is a true zero, whose value remains universal
• Kelvin scale, altitude, height are some examples
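The practical difference between a scale without a true zero (Celsius) and one with a true zero (Kelvin) can be illustrated with a small sketch. The two temperature readings are illustrative values, not data from the session, and the helper name `celsius_to_kelvin` is introduced here only for the example.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading (no true zero) to Kelvin (true zero)."""
    return celsius + 273.15

# Illustrative readings, assumed for this example only.
cool, warm = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the conversion.
diff_celsius = warm - cool
diff_kelvin = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are meaningful only when zero is a true zero (Kelvin).
naive_ratio = warm / cool
true_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"Difference: {diff_celsius:.1f} degrees C, {diff_kelvin:.1f} K")
print(f"Celsius ratio: {naive_ratio:.2f}, Kelvin ratio: {true_ratio:.3f}")
```

The Celsius ratio suggests that 20 degrees is "twice as hot" as 10 degrees, while the Kelvin ratio shows the two readings differ by only a few percent. This is why ratios should be computed only on ratio-scale data.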
Collection of Data
Designing Questionnaire
Editing and Coding of Data
Editing Primary Data
• Completeness
• Consistency
• Accuracy
• Central Editing
Coding of Data
Coding is the process of assigning some symbols, either alphabetical or numeral or …
Classification of Data
Classification refers to the grouping of data into homogeneous classes and categories.
Activity 2: Video-based Discussion
Guidelines
• Watch the TED Talk video of the late Dr. Hans Rosling, the renowned Swedish physician, academic, statistician, and public speaker
• What is the data story about?
• What do you notice about how the data has been presented?
• Discuss the learnings and implications
Tabulation of Data
Tabulation is arranging the data in a flat table (two-dimensional array) format by grouping the observations. A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.
Types of Tabulation
One-Way Tabulation
Advantages of Tabulation
Data Tabulation
• Statistical tables can be classified into various categories depending upon the basis of their classification. Broadly
speaking, the basis of classification can be any of the following:
o Purpose of investigation
o Nature of presented figures
o Construction
Data Tabulation: on the basis of purpose (1/2)
Primary table
• A primary table, also known as an original table, contains data in the form in which they were originally collected
Derivative table
• A derivative table presents figures such as totals, averages, percentages, ratios, coefficients, etc., derived from the original data. A table of time series data is an original table, but a table of trend values computed from the time series data is a derivative table.
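A minimal sketch of the primary/derivative distinction follows. The quarterly sales figures are hypothetical and serve only to show how totals, averages and percentages are derived from an original table.

```python
# Primary (original) table: hypothetical quarterly sales, as collected.
primary_table = {"Q1": 120, "Q2": 150, "Q3": 90, "Q4": 140}

# Derived summary figures.
total = sum(primary_table.values())
average = total / len(primary_table)

# Derivative table: each quarter's percentage share of the total.
derivative_table = {
    quarter: round(100 * sales / total, 1)
    for quarter, sales in primary_table.items()
}

print(f"Total: {total}, Average: {average}")
for quarter, share in derivative_table.items():
    print(f"{quarter}: {share}%")
```

Every number in `derivative_table` can be recomputed from `primary_table`, but not the other way round, which is why the original table should always be retained.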
Data Tabulation: on the basis of construction (1/2)
Cross-classified table (2-way, 3-way)
• Tables that classify entries in both directions, i.e., row-wise and column-wise, are called cross-classified tables. The two ways of classification are such that each category of one classification can occur with any category of the other. Cross-classified tables can also be constructed for more than two characteristics. A cross-classification can also be used for analytical purposes, e.g., it is possible to make certain comparisons while holding the effect of other factors constant.
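A two-way cross-classified table can be sketched with nothing more than a counter keyed on pairs of categories. The records below are hypothetical; the two characteristics (gender and preferred banking channel) are chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey records: (gender, preferred banking channel).
records = [
    ("Female", "Internet"), ("Male", "ATM"), ("Female", "In person"),
    ("Male", "Internet"), ("Female", "Internet"), ("Male", "In person"),
    ("Female", "ATM"), ("Male", "Internet"),
]

# Each (row category, column category) pair becomes one cell of the table.
counts = Counter(records)
rows = sorted({gender for gender, _ in records})
cols = sorted({channel for _, channel in records})

# Print the two-way table with row totals.
print(f"{'':10}" + "".join(f"{c:>12}" for c in cols) + f"{'Total':>12}")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(f"{r:10}" + "".join(f"{n:>12}" for n in cells) + f"{sum(cells):>12}")
```

Extending the key to a triple, for example (gender, channel, city type), would give a three-way classification in the same manner.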
Diagrammatical Presentation of Data
One-dimensional diagrams
• Also known as bar diagrams; the magnitude of the characteristic is shown by the length or height of the bar
• The width depends upon the number of bars to be accommodated in the diagram
Two-dimensional diagrams
• The value of an item is represented by an area. Such diagrams are also known as ‘surface’ or ‘area diagrams’
• Popular forms include rectangular, square or circular (e.g. pie chart) ones
Three-dimensional diagrams
• With the help of three-dimensional diagrams, the values of various items are represented by the volume of a cube, sphere, cylinder, etc. These diagrams are normally used when the variations in the magnitudes of observations are very large
Pictorial diagrams
• These are like frequency plots. The data points are plotted on the graph in the same manner. Then, instead of joining the data points, pictures or objects of the height of the data points are used to depict the data
• Heights of the pictures or objects represent the frequency. These diagrams include histograms and frequency polygon
One-dimensional Diagrams: Bar Chart
One-dimensional Diagrams: Scatter Plot
• A scatter diagram is the most fundamental graph for showing the relationship between two variables. It is a simple way to represent a bivariate distribution
• A bivariate distribution is the distribution of two random variables. The two variables are plotted against each other, one on the X axis and the other on the Y axis
• A scatter diagram thus indicates the nature and strength of the correlation.
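The strength and direction that a scatter diagram shows visually can also be summarised numerically. The sketch below computes the Pearson correlation coefficient, one common measure of linear association (the slides do not name a specific measure, so this choice is an assumption), on hypothetical paired data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired data: advertising spend (X) and sales (Y).
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(spend, sales)
print(f"r = {r:.3f}")
```

A value near +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to a cloud with no linear pattern.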
Two-dimensional Diagrams: Pie Chart
Banking Preference
Banking Preference? | %
ATM | 16%
Automated or live telephone | 2%
Drive-through service at branch | 17%
In person at branch | 41%
Internet | 24%
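In a pie chart each category's slice is proportional to its share of the whole, so a percentage converts to a central angle by multiplying by 360/100. A short sketch using the banking preference percentages above:

```python
# Percentages from the banking preference table.
banking_preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

# The categories must account for the whole pie.
total_percentage = sum(banking_preference.values())

# Central angle of each slice, in degrees.
angles = {
    channel: pct * 360 / 100
    for channel, pct in banking_preference.items()
}

for channel, angle in angles.items():
    print(f"{channel}: {angle:.1f} degrees")
```

For example, the 41% share for "In person at branch" corresponds to a slice of 147.6 degrees.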
Histogram
• A histogram is a vertical bar chart of the data in a frequency distribution
• In a histogram there are no gaps between adjacent bars, as it represents continuous data
• The class boundaries (or class midpoints) are shown on the horizontal axis
• The vertical axis shows frequency, relative frequency, or percentage
• The height of each bar represents the frequency, relative frequency, or percentage of its class
Frequency Distribution (1/2)
Classes are the groups that represent a range of values, called a class interval. Each value can
be in only one class and every value must be contained in one of the classes.
To create a useful frequency distribution, you must think about how many classes are
appropriate for your data and also determine a suitable width for each class interval.
Frequency Distributions of the Cost per Meal for 50 City Restaurants and 50 Suburban Restaurants
Total 50 50
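Assigning values to classes so that each value falls in exactly one class can be sketched as follows. The function name `frequency_distribution` and the list of meal costs are hypothetical; the class limits (20 up to 80, width 10) follow the "20 but less than 30" style used for the restaurant data.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count values in classes [lower, lower + width), [lower + width, ...), etc."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        # Every value must be contained in one of the classes.
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the defined classes")
        counts[index] += 1
    return counts

# Hypothetical meal costs in dollars (not the 50-restaurant data set).
meal_costs = [22, 35, 38, 41, 44, 47, 49, 53, 58, 66, 71, 29]

counts = frequency_distribution(meal_costs, lower=20, width=10, n_classes=6)

for i, count in enumerate(counts):
    start = 20 + 10 * i
    print(f"{start} but less than {start + 10}: {count}")
```

Because each class includes its lower limit and excludes its upper limit, a value such as 30 lands in "30 but less than 40" and nowhere else.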
Relative Frequency and Percentage Distribution
When you are comparing two or more groups, as was done above, knowing the proportion or the percentage of the total that is in each group is more useful than knowing the frequency count of each group. For such situations, you create a relative frequency distribution or a percentage distribution instead of a frequency distribution.
COST PER MEAL ($) | CITY Relative Frequency | CITY Percentage (%) | SUBURBAN Relative Frequency | SUBURBAN Percentage (%)
20 but less than 30 | 0.12 | 12.0 | 0.10 | 10.0
30 but less than 40 | 0.14 | 14.0 | 0.34 | 34.0
40 but less than 50 | 0.38 | 38.0 | 0.34 | 34.0
50 but less than 60 | 0.18 | 18.0 | 0.14 | 14.0
60 but less than 70 | 0.12 | 12.0 | 0.08 | 8.0
70 but less than 80 | 0.06 | 6.0 | 0.00 | 0.0
Total | 1.00 | 100.0 | 1.00 | 100.0
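Relative frequency is the class frequency divided by the total, and the percentage is the relative frequency times 100. The sketch below shows the calculation. The class counts are an assumption: they are back-calculated from the relative frequencies in the table and the stated totals of 50 (for example, 0.12 × 50 = 6), since the original frequency counts are not reproduced here.

```python
# Class counts back-calculated from the table (assumed, see lead-in).
city_counts = [6, 7, 19, 9, 6, 3]
suburban_counts = [5, 17, 17, 7, 4, 0]

def relative_frequencies(counts):
    """Proportion of the total that falls in each class."""
    total = sum(counts)
    return [round(c / total, 2) for c in counts]

city_relative = relative_frequencies(city_counts)
suburban_relative = relative_frequencies(suburban_counts)

# Percentage = relative frequency x 100.
city_percentage = [round(100 * p, 1) for p in city_relative]

print(city_relative)
print(suburban_relative)
print(city_percentage)
```

With equal totals of 50 the two groups could be compared on counts alone, but relative frequencies remain comparable when the group sizes differ.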
Cumulative Distribution
The Cumulative Percentage Distribution provides a way of presenting information about the percentage
of values that are less than a specific amount.
For example, to know what percentage of the city restaurant meals cost less than $40 or what percentage
cost less than $50, you use the percentage distribution to form the cumulative percentage distribution.
Developing the Cumulative Percentage Distribution for the Cost of Meals at City Restaurants
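The cumulative percentage distribution can be built as a running total of the city percentages (12.0, 14.0, 38.0, 18.0, 12.0, 6.0). A minimal sketch, with variable names chosen for this example:

```python
from itertools import accumulate

# Class lower limits and city percentages from the percentage distribution.
class_lower_bounds = [20, 30, 40, 50, 60, 70]
class_width = 10
city_percentages = [12.0, 14.0, 38.0, 18.0, 12.0, 6.0]

# No meal costs less than the first lower limit, so the running total starts at 0.
running_totals = [0.0] + list(accumulate(city_percentages))
thresholds = class_lower_bounds + [class_lower_bounds[-1] + class_width]

# Map each threshold to the percentage of meals costing less than it.
cumulative = dict(zip(thresholds, running_totals))

for limit, pct in cumulative.items():
    print(f"Less than ${limit}: {pct:.1f}%")
```

Reading off the result, 26% of city restaurant meals cost less than $40 and 64% cost less than $50.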
When you construct polygons or histograms, the vertical Y-axis should show the true zero, or the “origin,” so as not to distort the character of the given data.
The horizontal X-axis does not need to show the zero point for the variable of interest, although the range of the variable should occupy the major portion of the axis.
Time Series Plot
Data can be obtained from primary or secondary sources according to need, situation, convenience, time, resources and availability. The most important method of primary data collection is the questionnaire. Data must be objective and fact-based so that it helps a decision-maker to arrive at a better decision.
The type of research, its purpose, and the conditions under which the data are obtained will determine the method of collecting the data. If relatively few items of information are required quickly and funds are limited, telephone interviews are recommended. If the respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect the data.
Before any processing of the data, editing and coding are necessary to ensure the correctness of the data. In any research study, voluminous data can be handled only after classification. Data can be presented through tables and charts.
Classification refers to the grouping of data into homogeneous classes and categories. It
is the process of arranging things in groups or classes according to their resemblances and
affinities.
Once the raw data is collected, it needs to be summarized and presented to the
decision-maker in a form that is easy to comprehend. Tabulation not only condenses the data,
but also makes it easy to understand. Tabulation is the fastest way to extract information from
the mass of data and hence popular even among those not exposed to the statistical method.
Charts help in grasping the data and analyzing it qualitatively. They also help managers to present the data effectively as part of reports. Various types of chart are the bar diagram, multiple bar diagram, component bar diagram, deviation bar diagram, sliding bar diagram, histogram and pie chart.
A graphic presentation is another way of representing the statistical data in a simple and
intelligible form. There are two types of graphs which we have discussed, line graphs and
ogives.