• Define the variables that you want to study in order to solve a business problem or meet a business
objective
• Collect the data from appropriate sources
• Organize the data collected by developing tables
• Visualize the data by developing charts
• Analyze the data by examining the appropriate tables and charts, and other statistical methods to reach
conclusions
Classification of statistics
APPLIED
Data Visualization
Learning Objectives
Understand Bar diagram, Histogram, Pie Diagram, Frequency polygons and Ogives
To understand God’s thoughts…
Florence Nightingale
The Lady with the Lamp and the Coxcomb
Activity: Video-based Discussion
Guidelines
• Watch the TED Talk video by the late
Dr. Hans Rosling, the renowned
Swedish physician, academic,
statistician, and public speaker
• What is the data story about?
• What do you notice about how data
has been presented?
• Discuss the learnings and
implications
Variable Types
Categorical Variables - These are also known as qualitative variables. Their values are categories, although a
category can be given a numeric code, e.g. Male coded as 1 and Female as 2 when categorizing gender
Numerical Variables - These are also known as quantitative variables
Discrete Variables - These are variables that take only specific, separated values, e.g. the number of cars in a
parking lot. The values need not be whole numbers: shoe sizes can be decimals such as 7.5, but cannot take just any value.
Continuous Variables - These are variables that can take any value within a range and can therefore be plotted along an
almost unbroken continuum, e.g. height, length, weight, etc. These are always numerical variables
One-dimensional diagrams
• Also known as bar diagrams; the magnitude of the characteristic is shown by the length or height of the bar
• The width depends upon the number of bars to be accommodated in the diagram

Two-dimensional diagrams
• The value of an item is represented by an area. Such diagrams are also known as ‘surface’ or ‘area’ diagrams
• Popular forms include rectangular, square or circular ones (e.g. pie chart)

Three-dimensional diagrams
• The values of various items are represented by the volume of a cube, sphere, cylinder, etc. These diagrams are
normally used when the variations in the magnitudes of observations are very large

Diagrams using pictures or objects
• These are like frequency plots. The data points are plotted on the graph in the same manner. Then, instead of
joining the data points, pictures or objects of the height of the data points are used to depict the data
• Heights of the pictures or objects represent the frequency. These include histograms and frequency polygons
One-dimensional Diagrams: Bar Chart
One-dimensional Diagrams: Scatter Plot
Scatter Plot
• A scatter diagram is the most
fundamental graph for showing the
relationship between two variables.
It is a simple way to represent a
bivariate distribution
• A bivariate distribution is the
distribution of two random
variables. One variable is plotted
on the X axis and the other on the Y axis
• A scatter diagram thus indicates
the nature and strength of the
correlation.
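
To put a number on the "nature and strength" that a scatter plot shows, one common choice is the Pearson correlation coefficient. The slides do not name a specific measure, so treat this as an illustrative sketch. The function name `pearson_r` and the advertising/sales figures are hypothetical and not taken from the source.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (X, Y) pairs, invented purely for illustration
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(advertising, sales)
print(f"r = {r:.3f}")
```

The sign of `r` gives the nature of the relationship (positive or negative) and its magnitude, between 0 and 1, gives the strength of the linear association.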
Two-dimensional Diagrams: Pie Chart
Banking Preference
Banking Preference?                     %
ATM                                     16
Automated or live telephone              2
Drive-through service at branch         17
In person at branch                     41
Internet                                24
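
Each slice of a pie chart takes a share of the 360-degree circle equal to its category's share of the total. The sketch below works out those angles for the banking-preference percentages read from this slide. The helper name `slice_angles` is an assumption made for illustration.

```python
# Percentages as read from the Banking Preference slide
banking_preference = {
    "ATM": 16,
    "Automated or live telephone": 2,
    "Drive-through service at branch": 17,
    "In person at branch": 41,
    "Internet": 24,
}

def slice_angles(values):
    """Map each category to the central angle (in degrees) of its pie slice."""
    total = sum(values.values())
    return {label: 360 * value / total for label, value in values.items()}

angles = slice_angles(banking_preference)
for label, angle in angles.items():
    print(f"{label:<34}{angle:6.1f} degrees")
```

Dividing by the total rather than by 100 means the same helper also works on raw counts.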
Histogram
• A vertical bar chart of the data in a
frequency distribution is called a
histogram
• In a histogram there are no gaps
between adjacent bars, as it
represents continuous data
• The class boundaries (or class
midpoints) are shown on the
horizontal axis
• The vertical axis is either frequency,
relative frequency, or percentage.
• The height of each bar represents the
frequency, relative frequency, or
percentage
Frequency Distribution (1/2)
Classes are the groups that represent a range of values, called a class
interval. Each value can be in only one class and every value must be contained
in one of the classes.
To create a useful frequency distribution, you must think about how many classes
are appropriate for your data and also determine a suitable width for each class
interval.
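
As a rough sketch of how values are tallied into classes, the function below assigns each value to exactly one class of equal width, using the "lower bound but less than upper bound" convention. The function name, the starting point, the class width and the sample meal costs are all hypothetical choices made for illustration.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values into n_classes equal-width classes [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            # Every value must be contained in one of the classes
            raise ValueError(f"{v} falls outside the chosen classes")
        counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]

# Hypothetical meal costs in dollars, invented for illustration
meal_costs = [22, 35, 38, 41, 44, 47, 49, 52, 58, 63, 71, 29]

distribution = frequency_distribution(meal_costs, start=20, width=10, n_classes=6)
for lower, upper, count in distribution:
    print(f"{lower} but less than {upper}: {count}")
```

Raising an error for values outside the classes enforces the rule that every value must fall in one class. The half-open intervals ensure that no value falls in two.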
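Frequency Distributions of the Cost per Meal for 50 City Restaurants and 50 Suburban
Restaurants
COST PER MEAL ($)       CITY FREQUENCY    SUBURBAN FREQUENCY
20 but less than 30     6                 5
30 but less than 40     7                 17
40 but less than 50     19                17
50 but less than 60     9                 7
60 but less than 70     6                 4
70 but less than 80     3                 0
Total                   50                50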
Relative Frequency and Percentage Distribution
When you are comparing two or more groups, as was done above, knowing the proportion
or the percentage of the total that is in each group, is more useful than knowing the
frequency count of each group. For such situations, you create a relative frequency
distribution or a percentage distribution instead of a frequency distribution.
COST PER MEAL ($)       CITY                            SUBURBAN
                        Relative     Percentage (%)     Relative     Percentage (%)
                        Frequency                       Frequency
20 but less than 30     0.12         12.0               0.10         10.0
30 but less than 40     0.14         14.0               0.34         34.0
40 but less than 50     0.38         38.0               0.34         34.0
50 but less than 60     0.18         18.0               0.14         14.0
60 but less than 70     0.12         12.0               0.08         8.0
70 but less than 80     0.06         6.0                0.00         0.0
Total                   1.00         100.0              1.00         100.0
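
The relative frequency of a class is its count divided by the total, and the percentage is the relative frequency multiplied by 100. The sketch below reproduces the table this way. The counts used follow from the relative frequencies above and the total of 50 restaurants per group (for example, 0.12 × 50 = 6). The function name is an assumption made for illustration.

```python
classes = ["20-30", "30-40", "40-50", "50-60", "60-70", "70-80"]

# Counts implied by the table: relative frequency x 50 restaurants per group
city_counts = [6, 7, 19, 9, 6, 3]
suburban_counts = [5, 17, 17, 7, 4, 0]

def relative_and_percentage(counts):
    """Return a (relative frequency, percentage) pair for each class count."""
    total = sum(counts)
    return [(count / total, 100 * count / total) for count in counts]

city = relative_and_percentage(city_counts)
suburban = relative_and_percentage(suburban_counts)

for label, (c_rel, c_pct), (s_rel, s_pct) in zip(classes, city, suburban):
    print(f"{label}: city {c_rel:.2f} ({c_pct:.1f}%), suburban {s_rel:.2f} ({s_pct:.1f}%)")
```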
Percentage Polygon
When you construct polygons or histograms, the vertical Y-axis should show the true
zero, or the “origin,” so as not to distort the character of the given data.
The horizontal X-axis does not need to show the zero point for the variable of interest,
although the major portion of the axis should be devoted to the range of the variable.
Cumulative Frequency Distribution
The Cumulative Percentage Distribution provides a way of presenting information about the percentage
of values that are less than a specific amount.
For example, to know what percentage of the city restaurant meals cost less than $40 or what percentage
cost less than $50, you use the percentage distribution to form the cumulative percentage distribution.
Developing the Cumulative Percentage Distribution for the Cost of Meals at City Restaurants
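
The cumulative percentage at each class boundary is the running total of the percentages of all classes below that boundary. The sketch below applies this to the city restaurant percentages given in the percentage distribution. The function name `cumulative_percentage` is an assumption made for illustration.

```python
from itertools import accumulate

# Class boundaries and city percentages from the percentage distribution
boundaries = [20, 30, 40, 50, 60, 70, 80]
city_percentages = [12.0, 14.0, 38.0, 18.0, 12.0, 6.0]

def cumulative_percentage(boundaries, percentages):
    """Pair each boundary with the percentage of values less than it."""
    if len(boundaries) != len(percentages) + 1:
        raise ValueError("need exactly one more boundary than classes")
    # Nothing lies below the first boundary, so the running total starts at 0
    running = [0.0] + list(accumulate(percentages))
    return list(zip(boundaries, running))

city_cumulative = cumulative_percentage(boundaries, city_percentages)
for boundary, pct in city_cumulative:
    print(f"Less than ${boundary}: {pct:.1f}%")
```

From this output, 26.0% of city restaurant meals cost less than $40 and 64.0% cost less than $50. Plotting these points against the boundaries gives the ogive.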
A table is a spreadsheet with rows and columns, with headings and stubs indicating the class of the data.
Types shown: Two-Way Tabulation and Multi-Way Tabulation
Data Tabulation
• Statistical tables can be classified into various categories depending upon the basis of their
classification. Broadly speaking, the basis of classification can be any of the following:
o Purpose of investigation
o Nature of presented figures
o Construction
Data Tabulation: on the basis of purpose (1/2)
Primary table
• A primary table is also known as an original table;
it contains data in the form in which they were
originally collected
Derivative table
• A derivative table presents figures such as totals,
averages, percentages, ratios, coefficients, etc.,
derived from the original data. For example, a table of
time series data is an original table, but a table of trend
values computed from the time series data is
a derivative table.
Data Tabulation: on the basis of construction (1/2)
[Figure: examples of 2-way and 3-way tables]
Cross-classified table
• Tables that classify entries in both
directions, i.e., row-wise and column-wise,
are called cross-classified tables. The two
ways of classification are such that each
category of one classification can occur
with any category of the other. Cross-
classified tables can also be constructed
for more than two characteristics. A
cross-classification can also be used for
analytical purposes, e.g., it is possible to
make certain comparisons while holding
the effect of other factors constant.
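
A two-way cross-classified table can be sketched by counting how often each pair of categories occurs together. The records below (location and banking channel) and the function name `cross_tabulate` are hypothetical, invented only to illustrate the row-wise and column-wise classification described above.

```python
from collections import Counter

# Hypothetical records: (row category, column category)
records = [
    ("City", "ATM"),
    ("City", "Internet"),
    ("Suburban", "ATM"),
    ("City", "ATM"),
    ("Suburban", "In person"),
    ("Suburban", "Internet"),
]

def cross_tabulate(records):
    """Return row labels, column labels, and a matrix of cell counts."""
    cells = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    # A Counter returns 0 for category pairs that never occur
    table = [[cells[(row, col)] for col in cols] for row in rows]
    return rows, cols, table

rows, cols, table = cross_tabulate(records)
print("".ljust(10) + "".join(col.ljust(12) for col in cols))
for row, counts in zip(rows, table):
    print(row.ljust(10) + "".join(str(count).ljust(12) for count in counts))
```

Every row category is paired with every column category, including combinations with a count of zero, which matches the idea that each category of one classification can occur with any category of the other.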
Summary
There are two major divisions of the field of statistics, namely descriptive and
inferential statistics. Both are important, and they
accomplish different objectives.
Before any processing of the data, editing and coding are necessary to
ensure the correctness of the data. In any research study, voluminous data
can be handled only after classification. Data can be presented through tables and
charts.
Classification refers to the grouping of data into homogeneous classes and
categories. It is the process of arranging things in groups or classes according
to their resemblances and affinities.
Charts help in grasping the data and analyzing it qualitatively. They also
help managers present the data effectively as part of reports. Types of
chart include the bar diagram, multiple bar diagram, component bar diagram, deviation
bar diagram, sliding bar diagram, histogram and pie chart.