
UNIT-3

BASIC CHARTS
MULTIDIMENSIONAL VISUALIZATION

PRESENTED BY:

G.IYSWARYA
20PMB057
II-MBA- B SECTION
DATA VISUALIZATION

Data visualization is the visual presentation of data or information.


The goal of data visualization is to communicate data or
information clearly and effectively to readers. Typically, data is
visualized in the form of a chart, infographic, diagram or map.

FEATURES
● Identify areas that need attention or improvement.
● Clarify which factors influence customer behavior.
● Decision-making Ability.
● Integration Capability.
● Predict sales volumes.
BASIC CHARTS
A chart is a graphical representation for data visualization, in which "the data is represented by symbols,
such as bars in a bar chart, lines in a line chart, or slices in a pie chart".

The three most effective basic plots are:

❏ Bar charts
❏ Line graphs
❏ Scatter plots

These plots are easy to create in R and are the plots most commonly used in the current business world, in
both data exploration and presentation.

Basic charts support data exploration by displaying one or two columns of data (variables) at a time.

This is useful in the early stages of getting familiar with the data structure, the amount and types of
variables, the volume and type of missing values, etc.
Bar chart
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with
heights or lengths proportional to the values that they represent. The bars can be plotted vertically or
horizontally. A vertical bar chart is sometimes called a column chart.
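The idea that bar length is proportional to the value can be sketched in a few lines. This is only an illustration: the category totals and the name `bar_lengths` are invented here, and the bars are drawn as text rather than with a charting library.

```python
# Hypothetical category totals, invented for illustration only.
sales_by_region = {"North": 120, "South": 80, "East": 200, "West": 40}

def bar_lengths(values, max_width=40):
    """Scale each value to a bar length proportional to the largest value."""
    largest = max(values.values())
    return {name: round(value / largest * max_width) for name, value in values.items()}

lengths = bar_lengths(sales_by_region)
for name, length in lengths.items():
    # One horizontal bar per category; its length encodes the value.
    print(f"{name:<6} {'#' * length} {sales_by_region[name]}")
```

Swapping rows for columns would give the vertical (column chart) form of the same picture.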
WHEN TO USE:
To show a distribution of data points or perform a comparison of metric values across different subgroups of
your data.
APPLICATIONS :
● Analysts use bar charts in technical analysis to help monitor price trends, volatility, security movement
indicators, and more.
● Bar charts in technical analysis are also referred to as open, high, low, and close (OHLC) charts.
They are helpful in spotting trends, monitoring stock prices, and helping trading analysts make decisions.
EXAMPLE:

The bar chart visually shows the opening, high, low, and closing prices of a security over a given period.
EXAMPLE: (continued)
● This image shows how to read a bar chart. The vertical
line indicates the high and low prices of the period. The
horizontal line to the left shows the market opening price,
and the horizontal line to the right shows the market
closing price.
● It also shows the volatility of the security or asset over a
given period which is the change between the period
high and the period low.
● If the period high and the period low are close together,
the security would be considered relatively nonvolatile.
Conversely, if the price difference between the period
high and low is large, the security would be regarded as
volatile.
● When comparing tens or hundreds of bar charts from
period to period, analysts often find it helpful to use
color-coding in order to understand the graph they are
looking at more quickly.
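The reading rules above can be sketched as a small calculation. The prices are invented, and the green/red color choice is an assumed convention, not something the slides specify.

```python
from collections import namedtuple

# One OHLC bar: open, high, low and close price for a period.
Bar = namedtuple("Bar", ["open", "high", "low", "close"])

# Hypothetical prices, invented for illustration only.
bars = {
    "Mon": Bar(open=100.0, high=104.0, low=99.0, close=103.0),
    "Tue": Bar(open=103.0, high=112.0, low=98.0, close=101.0),
}

def period_range(bar):
    """Change between the period high and the period low (length of the vertical line)."""
    return bar.high - bar.low

def color(bar):
    """Assumed color-coding: green when the close is above the open, red otherwise."""
    return "green" if bar.close > bar.open else "red"

for day, bar in bars.items():
    print(day, "range:", period_range(bar), "color:", color(bar))
```

A larger range means the high and low are further apart, so the period would be read as more volatile.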
Line graph
A line graph — also known as a line plot or a line chart — is a graph that uses lines to connect
individual data points. A line graph displays quantitative values over a specified time interval.
WHEN TO USE?
● To track variations over time, which may be long-term or short-term.
● To compare changes over the same period for more than one group.
APPLICATIONS:
Line graphs are used to track changes over short and long periods of time.
When smaller changes exist, line graphs are better to use than bar graphs. Line graphs can also be
used to compare changes over the same period of time for more than one group.
EXAMPLE:
When you have multiple metrics,
compare their lines to determine
whether they have the same trend
and patterns. Comparing the
metrics in this manner helps you
understand their differences and
similarities.
SCATTER PLOTS
Scatter plots are graphs that present the relationship between two variables in a data set. They represent data
points on a two-dimensional plane, or Cartesian system. The independent variable or attribute is plotted on the
X-axis, while the dependent variable is plotted on the Y-axis. These plots are often called scatter graphs or scatter
diagrams.
WHEN TO USE:
● When we have paired numerical data
● When there are multiple values of the dependent variable for a unique value of an independent variable
● In determining the relationship between variables in some scenarios, such as identifying potential root causes
of problems, checking whether two effects that appear to be related both occur with the same cause, and so
on.

APPLICATIONS:

● Demonstration of the relationship between two variables


● Identification of correlational relationships
● Identification of data patterns
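A correlational relationship seen in a scatter plot is often summarized with a single number. The sketch below uses the Pearson correlation coefficient as one possible measure; the paired data and the variable names are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical paired observations (independent variable, dependent variable).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson(ad_spend, sales)
print(f"correlation: {r:.3f}")
```

Values near 1 or -1 suggest a strong linear pattern, while values near 0 suggest little linear relationship.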
EXAMPLE:

Days of the week and the sales
MULTIDIMENSIONAL VISUALIZATION
Multidimensional data visualization represents one dimension as a point, two dimensions as a two-
dimensional object or graph, three dimensions as a three-dimensional object or graph, and four or
more dimensions as a movie, or a series of three-dimensional objects or graphs.

● Adding Variables: Color, Size, Shape, Multiple Panels, and Animation


● Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
● Reference: Trend Lines and Labels
● Scaling up to Large Datasets
● Multivariate Plot: Parallel Coordinates Plot
● Interactive Visualization
Types of data visualization
● Bubble chart
● Parallel coordinates
● TreeMap
● Diagram
● Chart
● Bar chart
● Pie chart
● Scatter plot
● Line Chart
● Area chart
● Box plot
One of the types: TREE MAP
A treemap is a way to graphically display the data from a tree diagram.
The treemap takes the hierarchical data of the tree and breaks it into a map of rectangles.
Each rectangle in the map represents a block in the tree.
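How a tree level is broken into rectangles can be sketched with a simple "slice" layout, where one rectangle is cut into strips whose areas are proportional to the values. This is a simplified, single-level illustration and not the layout algorithm of any particular tool; the data and the name `slice_layout` are hypothetical.

```python
def slice_layout(values, x, y, width, height):
    """Cut one rectangle into side-by-side strips with area proportional to each value.

    Returns {name: (x, y, width, height)}.
    """
    total = sum(values.values())
    rects = {}
    offset = x
    for name, value in values.items():
        strip_width = width * value / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects

# Hypothetical children of one tree node, invented for illustration only.
budget = {"Marketing": 50, "Sales": 30, "IT": 20}
rects = slice_layout(budget, x=0, y=0, width=100, height=60)
for name, rect in rects.items():
    print(name, rect)
```

Applying the same function again inside each strip, with the direction alternated, would nest the next level of the hierarchy.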
Adding Variables: Color, Size, Shape,
Multiple Panels, and Animation
COLOR AND SIZE:
In the context of prediction, color-coding supports the exploration of the conditional relationship between the
numerical outcome (on the y-axis) and a numerical predictor.

Color-coding helps to show what type of node it is.

The Treemap visualization is used to display hierarchical data as a set of nested rectangles. Rectangles of each level
are of different sizes and colors.

Each characteristic of the building tiles (rectangles) has a role in the data analysis:

● Color - shows the categories by which the treemap visualization is divided. The field data used to define this
characteristic can be numerical (123), string (ABC), or date.
● Size - shows the value for each category. The field data used to determine size can only be numerical (123).
SHAPES AND MULTIPLE PANELS
When the number of categories is large, a better alternative is to use multiple panels.

Creating multiple panels (also called “trellising”) is done by splitting the observations according to a
categorical variable, and creating a separate plot (of the same type) for each category.

Figure 21.1: Breakdown of passengers on the Titanic by gender, survival, and class in which they traveled (1st, 2nd, or 3rd).
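The splitting step behind multiple panels can be sketched as grouping observations by a categorical variable, so that each group can then be drawn as its own plot of the same type. The rows below are invented and are not the Titanic data; `split_into_panels` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical observations: (travel class, age, fare). Invented for illustration.
rows = [
    ("1st", 38, 71.3),
    ("3rd", 22, 7.3),
    ("1st", 35, 53.1),
    ("2nd", 27, 13.0),
    ("3rd", 26, 7.9),
]

def split_into_panels(observations, key_index):
    """Group observations by the categorical value found at key_index."""
    panels = defaultdict(list)
    for row in observations:
        panels[row[key_index]].append(row)
    return dict(panels)

panels = split_into_panels(rows, key_index=0)
for category, members in sorted(panels.items()):
    # Each entry would become one panel, e.g. a scatter plot of age against fare.
    print(category, len(members), "observations")
```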
ANIMATION
The primary goal of animating data is to improve communication and drive change.

Animation can bring data to life during both the visual exploration and storytelling phases. By animating data visualizations,
one can engage viewers in many ways. Static data visualizations, especially unique ones, can be persuasive and aid
decision-making. Animating them will boost these advantages even further.

Adding a temporal dimension to a plot to show how the information changes over time can be achieved via animation.
Manipulations: Rescaling, Aggregation
and Hierarchies, Zooming, Filtering
RESCALING
Rescaling is the process of changing the scale or proportions of something. This operation applies
to quantitative graphs in particular.

All graphs have at least one quantitative scale along an axis. Ordinarily, the quantitative
scale places equal space between equal intervals of value. This common type of scale is
sometimes called a linear scale.
The graph below displays two sets of sales values: one for hardware sales and one for
software sales. Which is increasing at a faster rate: hardware or software sales?

When we wish to compare rates of change and avoid this optical suggestion that rate
of change for higher-priced items is greater than for lower-priced items, log scales are a
convenient solution. Look at what happens when I do nothing to the graph above but
change it from a linear to a log scale.
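Why a log scale helps here can be checked numerically. In this sketch the sales figures are invented: hardware grows by 10% per period and software by 20%, yet hardware has the larger absolute increases.

```python
import math

# Hypothetical sales per period, invented for illustration only.
hardware = [100_000, 110_000, 121_000]  # +10% per period
software = [10_000, 12_000, 14_400]     # +20% per period

def linear_steps(series):
    """Vertical distance between consecutive points on a linear scale."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def log_steps(series):
    """Vertical distance between consecutive points on a log10 scale."""
    return [math.log10(later) - math.log10(earlier)
            for earlier, later in zip(series, series[1:])]

print("linear steps:", linear_steps(hardware), linear_steps(software))
print("log steps:   ", log_steps(hardware), log_steps(software))
```

On the linear scale the hardware line rises more steeply, while on the log scale the software line does, because equal distances on a log scale correspond to equal percentage changes.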
AGGREGATIONS AND HIERARCHIES
Another useful manipulation of scaling is changing the level of aggregation.
Examining different aggregations or hierarchies supports both supervised and unsupervised tasks in that
it can reveal patterns and relationships at various levels, and can suggest new sets of variables with
which to work.
For a temporal scale, we can aggregate by different granularity (e.g., monthly, daily, hourly) or even by a
“seasonal” factor of interest such as month-of-year or day-of-week. A popular aggregation for time series
is a moving average, where the average of neighboring values within a given window size is plotted.
Moving average plots enhance visualizing a global trend.
Non-temporal variables can be aggregated if some meaningful hierarchy exists: geographical (tracts
within a zip code in the Boston Housing example), organizational (people within departments within
units), etc.
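Two of the temporal aggregations described above can be sketched as follows. The helper names are hypothetical, the moving average uses a window of consecutive values, and the month-of-year function assumes the monthly series starts in January.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def mean_by_month_of_year(monthly_values):
    """Seasonal aggregation: average all values that fall in the same calendar month.

    Assumes the first value belongs to January.
    """
    buckets = {}
    for index, value in enumerate(monthly_values):
        buckets.setdefault(index % 12 + 1, []).append(value)
    return {month: sum(items) / len(items) for month, items in buckets.items()}

# Hypothetical two-year monthly series, invented for illustration only.
series = list(range(1, 25))
smoothed = moving_average(series, window=3)
seasonal = mean_by_month_of_year(series)
print(smoothed[:3])
print(seasonal[1], seasonal[12])
```

A wider window smooths more aggressively, which makes the global trend easier to see at the cost of local detail.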
AGGREGATION EXAMPLE

The original monthly series is shown in the top-left panel. Seasonal
aggregation (by month-of-year) is shown in the top-right panel, where it
is easy to see the peak in ridership in July–August and the dip in
January–February. The bottom-right panel shows temporal aggregation, where
the series is now displayed in yearly aggregates. This plot reveals the
global long-term trend in ridership and the generally increasing trend
from 1996 on.
ZOOMING AND FILTERING
The ability to zoom in and out of certain areas of the data on a plot
is important for revealing patterns and outliers.

Zooming supports supervised and unsupervised methods by
detecting areas of different behavior, which may lead to creating
new interaction terms, new variables, or even separate models for
data subsets.

Filtering means removing some of the observations from the plot.


The purpose of filtering is to focus the attention on certain data
while eliminating “noise” created by other data. Filtering supports
supervised and unsupervised learning: it assists in identifying
different or unusual local behavior.
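Zooming and filtering can both be pictured as keeping a subset of the observations before plotting. The sketch below is a minimal illustration with invented points and hypothetical function names, not the behavior of any specific tool.

```python
# Hypothetical (x, y) observations, invented for illustration only.
points = [(1, 10), (2, 12), (3, 95), (4, 11), (50, 13)]

def zoom(observations, x_range, y_range):
    """Keep only the points that fall inside the chosen plot window."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [(x, y) for x, y in observations
            if x_min <= x <= x_max and y_min <= y <= y_max]

def filter_out(observations, predicate):
    """Remove the observations for which the predicate is true."""
    return [point for point in observations if not predicate(point)]

window = zoom(points, x_range=(0, 5), y_range=(0, 20))
without_spike = filter_out(points, lambda point: point[1] > 90)
print(window)
print(without_spike)
```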
Reference: Trend Lines and Labels
TREND LINES AND LABELS
Trendlines, also known as lines of best fit or regression lines, graphically illustrate trends in data series and are
commonly used when charting predictions. A trendline is typically a line or curve that connects or passes through
two or more points in the series, showing a trend.

Trend lines and in-plot labels also help to detect patterns and outliers. Trend lines serve as a reference, and
allow us to more easily assess the shape of a pattern. Although linearity is easy to visually perceive, more elaborate
relationships such as exponential and polynomial trends are harder to assess by eye. Trend lines are useful in line
graphs as well as in scatter plots.
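A straight trend line is commonly fitted by ordinary least squares. The sketch below shows that calculation for a linear trend only; the data are invented, and curved (exponential or polynomial) trends would need a different fit.

```python
def linear_trend(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical series, invented for illustration only.
periods = [1, 2, 3, 4]
values = [3, 5, 7, 9]

slope, intercept = linear_trend(periods, values)
fitted = [slope * x + intercept for x in periods]
print(slope, intercept, fitted)
```

Points that sit far from the fitted values are candidates for outliers, which is one way the trend line serves as a reference.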

The use of in-plot labels can be useful for better exploration of outliers and clusters. Labels make it easier for
users to understand data visualizations by using text to reinforce visual concepts.

Labels are traditionally used to label axes and legends; however, they can also be used inside of data visualizations
to communicate categories, values, or annotations.
EXAMPLE: DATA LABELS
Scaling up to Large Datasets
SCALING UP TO LARGE DATASETS
When the number of observations (rows) is large, plots that display each individual
observation (e.g., scatter plots) can become ineffective.

Aside from using aggregated charts such as boxplots, some alternatives are:

1. Sampling—drawing a random sample and using it for plotting


2. Reducing marker size
3. Using more transparent marker colors and removing fill
4. Breaking down the data into subsets (e.g., by creating multiple panels)
5. Using aggregation (e.g., bubble plots where size corresponds to number
of observations in a certain range)
6. Using jittering (slightly moving each marker by adding a small amount
of noise)
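Two of the alternatives listed above, sampling and jittering, can be sketched with Python's standard `random` module. The function names, the fixed seed and the data are assumptions made for this illustration.

```python
import random

def sample_rows(rows, k, seed=0):
    """Draw a random sample of k rows without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def jitter(values, amount, seed=0):
    """Move each value by a small random offset in [-amount, amount]."""
    rng = random.Random(seed)
    return [value + rng.uniform(-amount, amount) for value in values]

# Hypothetical large dataset with many overlapping values.
rows = [(i, i % 5) for i in range(10_000)]

subset = sample_rows(rows, k=200)
jittered_y = jitter([y for _, y in subset], amount=0.2)
print(len(subset), jittered_y[:3])
```

The jitter amount is a judgment call: it should be large enough to separate overlapping markers but small enough not to distort the pattern.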
Multivariate Plot: Parallel Coordinates Plot
PARALLEL COORDINATES PLOT
This type of visualisation is used for plotting multivariate,
numerical data. Parallel Coordinates Plots are ideal for comparing
many variables together and seeing the relationships between
them.

In a Parallel Coordinates Plot, each variable is given its own axis
and all the axes are placed in parallel to each other. Each axis can
have a different scale, as each variable works off a different unit
of measurement, or all the axes can be normalised to keep all the
scales uniform.
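Normalising the axes can be done in several ways. The sketch below uses min-max scaling to the range 0 to 1 as one possible choice; the variables and values are invented for illustration.

```python
def normalise(column):
    """Min-max scale a numerical column to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        # A constant column carries no spread; place every value at 0.
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]

# Hypothetical variables measured in different units.
data = {
    "price": [10, 20, 30],
    "weight_kg": [0.5, 2.0, 1.0],
}

normalised = {name: normalise(column) for name, column in data.items()}
# Each observation becomes one line crossing every axis at its normalised value.
lines = list(zip(*normalised.values()))
print(lines)
```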

Parallel coordinates plots are also useful in unsupervised tasks.
They can reveal clusters, outliers, and information overlap across
variables. A useful manipulation is to reorder the columns to
better reveal observation clusterings.
Interactive Visualizations
INTERACTIVE VISUALIZATIONS
Similar to the interactive nature of the data mining process, interactivity is key to enhancing our
ability to gain information from graphical visualization.

By interactive visualization, we mean an interface that supports the following principles:

1. Making changes to a chart is easy, rapid, and reversible.

2. Multiple concurrent charts and tables can be easily combined and displayed on a single screen.

3. A set of visualizations can be linked, so that operations in one display are reflected in the other
displays.
