Data Visualization 1

The document provides an overview of data visualization, explaining its purpose, advantages, and various techniques such as exploratory data analysis (EDA) and visual encoding. It details different types of visualizations including histograms, scatter plots, bar charts, box plots, and pie charts, along with their respective functions in R programming. Additionally, it highlights how visualization aids in understanding complex data, enhances communication, and improves analytical efficiency.

Uploaded by ishwariborkar18
Copyright © All Rights Reserved

• DATA VISUALIZATION

• Visualization:
• The process that transforms raw data into meaningful information and insights through a visual representation.
• Data Visualization:
• - The graphical representation of information and data.
• - A mapping between the original (numeric) data and graphic elements (lines, points).
• - Uses visual elements such as charts, graphs, and maps.
Advantages of Visualization:
1. Simplifies Complex Data: Visualizations make complex datasets easier to interpret and understand by presenting information clearly and concisely.
2. Enhances Communication: They let information be shared quickly and effectively across diverse audiences.
3. Engages Audience: Visual elements are more engaging and capture the audience's attention better than text or tables alone.
4. Analytical Efficiency: Visual tools help analysts identify key insights quickly, reducing the time needed to analyze data.
5. Quick Insights: Allow faster identification of patterns, trends, and outliers.
6. Error Detection: Make it easier to spot errors in the data that could affect the analysis.
7. Increases Productivity: Reduces the time spent on data analysis and interpretation by making insights more immediately apparent.
Introduction to Exploratory Data Analysis:
• - The process of examining and understanding the data and extracting insights from it.
• - The process of investigating a dataset to discover patterns and anomalies, and to form hypotheses based on that understanding.
• - EDA involves generating summary statistics for the numerical data in the dataset and creating various graphical representations that make the data easier to understand.
• - EDA refers to the critical process of performing initial investigations on data to discover patterns and check assumptions with the help of summary statistics and graphical representations.
EDA involves a combination of the following methods:
• Univariate visualization of, and summary statistics for, each field in the raw dataset.
• Bivariate visualization and summary statistics for assessing the relationship between each variable in the dataset and the target variable of interest.
• Multivariate visualizations to understand interactions between different fields in the data.
• Dimensionality reduction to identify the fields that account for the most variance between observations, and to allow processing of a reduced volume of data.
• Clustering of similar observations into differentiated groupings. Collapsing the data into a few small groups makes patterns of behavior easier to identify.
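The five methods above can be tried in a few lines of base R. This sketch is illustrative only. It assumes R's built-in mtcars dataset, treats mpg as the target variable, and picks k = 3 clusters arbitrarily. None of these choices come from the slides.

```r
# Illustrative EDA on the built-in mtcars data (dataset and target are assumptions)

# 1. Univariate: summary statistics and a histogram of one field
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# 2. Bivariate: relationship between one field (wt) and the target (mpg)
r_wt_mpg <- cor(mtcars$wt, mtcars$mpg)
print(round(r_wt_mpg, 2))
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# 3. Multivariate: pairwise scatter plots of several fields at once
pairs(mtcars[, c("mpg", "wt", "hp", "disp")])

# 4. Dimensionality reduction: PCA on standardized fields
pca <- prcomp(mtcars, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component
print(round(var_explained[1:3], 3))

# 5. Clustering: k-means on standardized fields (k = 3 is an arbitrary choice)
set.seed(42)                                   # k-means starts from random centres
km <- kmeans(scale(mtcars), centers = 3, nstart = 10)
print(table(km$cluster))                       # number of cars in each cluster
```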
• DATA VISUALIZATION & VISUAL ENCODING:
• - Data visualization can illustrate complex data relationships and patterns with simple designs made of lines, shapes, and colors.
• - Visual encoding maps data into visual structures, thereby building an image on the screen.
• Data visualization can help in:
1. Identifying Outliers in Data: Data visualization makes it easy to spot outliers, i.e. data points that look different from the rest.
Example: in a chart, an outlier might be a dot far away from the other dots, so it is seen quickly.
2. Enhanced Collaboration: Advanced visualization tools make it easier for teams to go through reports together and make decisions quickly.
3. Business Analysis Made Easy: The right visualization techniques support tasks such as sales prediction, product promotion, and customer-behavior analysis.
4. Improved Response Time
5. Greater Simplicity
6. Easier Visualization of Patterns
• VISUAL ENCODING:
• - Translating data into visual elements on a chart or map through position, shape, size, symbol, and color.
• - It is the way data is mapped into visual structures, from which we build the images on a screen.
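Here is a minimal sketch of visual encoding. The plot maps four columns of R's built-in mtcars data (an assumed example dataset) onto four channels: x position, y position, point color and point size. The color palette and the 1 to 3 size range are arbitrary choices.

```r
# Position: weight (x) and mpg (y); color: cylinder count; size: horsepower
cyl_levels  <- sort(unique(mtcars$cyl))            # 4, 6, 8
palette_cyl <- c("darkgreen", "orange", "red")     # arbitrary color per cylinder count
point_colors <- palette_cyl[match(mtcars$cyl, cyl_levels)]

# Rescale horsepower linearly to a point size between 1 and 3
point_sizes <- 1 + 2 * (mtcars$hp - min(mtcars$hp)) / diff(range(mtcars$hp))

plot(mtcars$wt, mtcars$mpg,
     col = point_colors, pch = 19, cex = point_sizes,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Position, color and size as visual encodings")
legend("topright", legend = paste(cyl_levels, "cyl"),
       col = palette_cyl, pch = 19)
```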
• What is the visualization graph supposed to display?

• Distribution
• Relationship
• Comparison
• Connection
• Composition
• Location
Distribution Visualizations
• These show how data is spread (range, shape, center, and variability).
| Type | Description | R Function/Package |
| --- | --- | --- |
| Histogram | Shows the frequency distribution of a variable | hist(), geom_histogram() |
| Density Plot | Smoothed version of a histogram | density(), geom_density() |
| Boxplot | Shows median, quartiles, outliers | boxplot(), geom_boxplot() |
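The table lists density(), but the slides give no example of it. In this sketch, the vector comes from the histogram example that follows. The density curve is overlaid on a histogram drawn on the density scale (freq = FALSE), which shows that the density plot is a smoothed version of the histogram. Kernel and bandwidth are left at R's defaults.

```r
# Same vector as in the histogram example
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Kernel density estimate (default Gaussian kernel and bandwidth rule)
d <- density(v)

# Histogram on the density scale so the smoothed curve can be overlaid
hist(v, freq = FALSE, breaks = seq(0, 50, by = 10),
     col = "lightgray", main = "Histogram with density curve", xlab = "Value")
lines(d, col = "blue", lwd = 2)
```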


• Histogram:
• - Graphical display of data using bars of different heights.
• - Gives an accurate representation of the distribution of numeric data.
• - A histogram groups values into 'bins', each covering a range of values.
• - In R, a histogram is created with the hist() function.
• - The first argument is the numeric data. The second argument, breaks, controls the bins: either a suggested number of bins or a vector of break points.
• (By default breaks = "Sturges", so R chooses the number of bins from the data size rather than using a fixed 10.)
• Syntax:
hist(v, main, xlab, xlim, ylim, breaks, col, border)
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green",
     border = "black", xlim = c(0, 50),
     ylim = c(0, 5), breaks = seq(0, 50, by = 10))
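To see which bins the breaks argument produces, hist() can return its computed bins without drawing anything (plot = FALSE). This short sketch uses the same vector v. By default, bins are right-closed intervals such as (10, 20].

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Compute the histogram but do not draw it
h <- hist(v, breaks = seq(0, 50, by = 10), plot = FALSE)

print(h$breaks)   # 0 10 20 30 40 50
print(h$counts)   # 1 5 3 2 0  (values per bin)
```

The counts also explain why ylim = c(0, 5) is enough for the plot: the tallest bin holds 5 values.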
Scatter Plot
• A scatter plot is a set of points, each representing an individual data item, plotted against a horizontal and a vertical axis.
• When the values of two variables are plotted along the X-axis and Y-axis, the pattern of the resulting points can reveal a correlation between them.
• In R, a scatter plot can be created with the plot() function.
• Syntax:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
plot(x, y,
     main = "Custom Scatter Plot",
     xlab = "X Axis Label",
     ylab = "Y Axis Label",
     xlim = c(0, 6),
     ylim = c(0, 12),
     axes = TRUE)
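Scatter plots are mainly used to judge correlation, so it can help to print the correlation coefficient and overlay a fitted straight line. This sketch uses the same x and y. cor() and lm() are standard R functions; the line-fitting step is an addition, not part of the slides.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

r   <- cor(x, y)          # Pearson correlation coefficient
fit <- lm(y ~ x)          # least-squares straight line

plot(x, y, pch = 19, col = "blue", main = paste("r =", round(r, 2)))
abline(fit, col = "red")  # overlay the fitted line
print(coef(fit))          # intercept about 0, slope about 2
```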
pch = 19: uses solid circles (bullet points) for each data point
col = "blue": colors the points blue
pch = plotting character
| pch Value | Symbol | Description |
| --- | --- | --- |
| 0 | □ | Open square |
| 1 | ○ | Open circle |
| 2 | △ | Open triangle |
| 3 | + | Plus sign |
| 4 | × | Cross |
| 5 | ◇ | Open diamond |
| 6 | ▽ | Open triangle (point down) |
| 15 | ■ | Filled square |
| 16 | ● | Filled circle |
| 17 | ▲ | Filled triangle |
| 18 | ◆ | Filled diamond |
| 19 | ● | Solid circle (most common for points) |
| 20 | • | Smaller filled circle |
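The full set of symbols can also be drawn directly. This sketch plots pch values 0 to 25 with their numbers underneath. pch 21 to 25 also use the bg (fill) color, set here to an arbitrary yellow.

```r
pch_values <- 0:25

# One symbol per x position, all on the same horizontal line
plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values, cex = 2, col = "blue", bg = "yellow",
     ylim = c(0.5, 1.5), axes = FALSE, xlab = "", ylab = "",
     main = "R plotting characters (pch)")

# Write each pch number below its symbol
text(pch_values, rep(0.8, length(pch_values)), labels = pch_values)
```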
# Step 1: Create sample student data
student_data <- data.frame(
  math = c(78, 85, 92, 70, 88, 76, 95, 67, 80, 90),
  science = c(75, 82, 89, 72, 90, 78, 94, 65, 83, 91),
  gender = c("Male", "Female", "Female", "Male", "Female",
             "Male", "Male", "Female", "Female", "Male"))
# Step 2: Assign colors and point shapes based on gender
colors <- ifelse(student_data$gender == "Male", "blue", "red")
shapes <- ifelse(student_data$gender == "Male", 19, 17)
# Step 3: Create scatter plot
plot(student_data$math, student_data$science,
     main = "Math vs Science Scores by Gender",
     xlab = "Math Score",
     ylab = "Science Score",
     col = colors,
     pch = shapes, axes = TRUE)
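The gender-coded scatter plot above has no legend, so readers cannot tell which color and shape stand for which gender. This sketch draws the same plot with legend() added. The legend position and title are arbitrary choices.

```r
student_data <- data.frame(
  math = c(78, 85, 92, 70, 88, 76, 95, 67, 80, 90),
  science = c(75, 82, 89, 72, 90, 78, 94, 65, 83, 91),
  gender = c("Male", "Female", "Female", "Male", "Female",
             "Male", "Male", "Female", "Female", "Male"))
colors <- ifelse(student_data$gender == "Male", "blue", "red")
shapes <- ifelse(student_data$gender == "Male", 19, 17)

plot(student_data$math, student_data$science,
     main = "Math vs Science Scores by Gender",
     xlab = "Math Score", ylab = "Science Score",
     col = colors, pch = shapes)

# Legend entries must use the same color/shape pairs as the points
legend("topleft", legend = c("Male", "Female"),
       col = c("blue", "red"), pch = c(19, 17), title = "Gender")
```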
Bar Chart
• A bar chart is a graphical display of data using bars of different heights (or lengths).
• It is mainly used to show counts or summaries of categorical data (such as fruits, gender, or brands).
• Each bar represents a category, and its height shows the value (count, sum, etc.) for that category.
• Syntax:
barplot(height, names.arg, col, main, xlab, ylab)
• ggplot(data, aes(x, y)) + geom_bar(stat = "identity")


library("ggplot2")
df <- data.frame(Category = c("A", "B", "C"), Value = c(10, 20, 15))
# stat = "identity" uses the Value column as the bar heights instead of counting rows
ggplot(df, aes(x = Category, y = Value, fill = Category)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("A" = "red", "B" = "green", "C" = "blue"))
# Step 1: Sample data
student_counts <- c(12, 18, 7)
class_names <- c("Class A", "Class B", "Class C")
# Step 2: Plot bar chart
barplot(student_counts,
        names.arg = class_names,   # class names under the bars
        col = "orange",            # bar color
        main = "Number of Students in Each Class",
        xlab = "Class",
        ylab = "Number of Students",
        ylim = c(0, max(student_counts) + 5),  # extra space at top
        border = "black")          # optional: border around bars
# names.arg already labels the x-axis, so no separate axis() call is needed.
# Bar centres are not at 1, 2, 3, so axis(1, at = 1:3) would misplace labels;
# if adding labels manually, use the midpoints returned by barplot().
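The bar chart examples pass in counts that were already computed. When the raw data are category labels (such as the fruits or gender mentioned above), table() can count them first and barplot() can draw the result directly. The fruit data below are hypothetical.

```r
# Hypothetical raw categorical data: favourite fruit of 10 people
fruits <- c("Apple", "Banana", "Apple", "Mango", "Banana",
            "Apple", "Mango", "Apple", "Banana", "Apple")

fruit_counts <- table(fruits)   # count per category (sorted alphabetically)
print(fruit_counts)

barplot(fruit_counts, col = "steelblue",
        main = "Favourite Fruit", xlab = "Fruit", ylab = "Count")
```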
months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")
product_A <- c(150, 200, 180, 220, 210, 230)
product_B <- c(120, 160, 190, 210, 200, 220)
# Combine into a matrix (products as rows, months as columns)
sales_data <- rbind(Product_A = product_A, Product_B = product_B)
barplot(sales_data,
beside = TRUE, # grouped bars
col = c("skyblue", "orange"),
names.arg = months,
main = "Monthly Sales Comparison",
xlab = "Month",
ylab = "Sales Units",
ylim = c(0, max(sales_data) + 50))
# Add legend (fill colors match the bar colors)
legend("topright",
       legend = c("Product A", "Product B"),
       fill = c("skyblue", "orange"))
• To represent data that involves three or more variables, retinal variables (visual properties of the marks) play a major role. For example:
• 1. Shape: circles, ovals, diamonds, and rectangles can signify different types of data and are easily recognized by the eye because they look distinct.
• 2. Size: used for quantitative data; a smaller size indicates a smaller value, while a bigger size indicates a larger value.
• 3. Color: saturation determines the intensity of a color and can be used to differentiate visual elements from their surroundings by displaying different scales of value.
• 4. Orientation: (vertical, horizontal, slanted) helps signify data trends such as an upward or downward trend.
• 5. Texture: shows differentiation among data and is mainly used for data comparison.
• 6. Angles: provide a sense of proportion; this characteristic can help an analyst or data scientist make better data comparisons.
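Two of these retinal variables, orientation and texture, map directly onto barplot() arguments. This sketch uses made-up regional sales values; the region names are hypothetical. horiz = TRUE turns the bars horizontal, and density/angle fill each bar with shading lines as a texture. This angle argument sets the slope of the shading lines. It is not the slice-angle encoding used by pie charts.

```r
# Hypothetical values for three categories
sales <- c(North = 40, South = 25, East = 35)

# Orientation: horiz = TRUE draws horizontal bars
# Texture: density (lines per inch) and angle (degrees) shade each bar differently
mids <- barplot(sales, horiz = TRUE,
                density = c(10, 20, 30), angle = c(0, 45, 135),
                col = "black", xlab = "Sales",
                main = "Orientation and texture")
```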
Box Plot
• A box plot is a good way to show many important features of quantitative (numerical) data.
• It shows the median of the data, which is the middle value and one type of average.
• It also shows the range and the quartiles of the data, which tell us how spread out the data is.
Syntax
• boxplot(x,
        main = "Title",
        xlab = "X-axis label",
        ylab = "Y-axis label",
        col = "color")
• In an example box plot of winners' ages, the median is the red line through the middle of the 'box'. It lies just above 60 on the number line below, so the middle value of age is 60 years.
• The left side of the box is the 1st quartile: the value that separates the first quarter (25%) of the data from the rest. Here, this is 51 years.
• The right side of the box is the 3rd quartile: the value that separates the first three quarters (75%) of the data from the rest. Here, this is 69 years.
• The distance between the sides of the box is called the inter-quartile range (IQR). It tells us where the 'middle half' of the values lies. Here, half of the winners were between 51 and 69 years old.
• The ends of the lines (whiskers) to the left and right of the box mark the minimum and maximum values in the data; the distance between them is called the range. In R's boxplot(), each whisker stops at the most extreme point within 1.5 × IQR of the box, and points beyond that are drawn individually as outliers.
# Step 1: Sample data
data <- c(5, 7, 8, 6, 9, 12, 15, 10, 7, 8)
# Step 2: Create boxplot
boxplot(data,
        main = "Simple Boxplot",
        ylab = "Values",
        col = "lightblue")
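The simple boxplot above shows the whisker rule in action. boxplot.stats() returns the numbers R draws. Here the upper whisker stops at 12, and the value 15 is flagged as an outlier: the IQR is 10 − 7 = 3, so the upper limit is 10 + 1.5 × 3 = 14.5. R takes the box edges (hinges) from fivenum(), and they can differ slightly from quantile().

```r
data <- c(5, 7, 8, 6, 9, 12, 15, 10, 7, 8)

bs <- boxplot.stats(data)   # default coef = 1.5
print(bs$stats)  # 5 7 8 10 12: lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$out)    # 15: point beyond 1.5 * IQR, drawn separately as an outlier
```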
# Sample data: 15 test scores and the class each score belongs to
scores <- c(78, 85, 67, 90, 82, 74, 88, 79, 69, 91, 86, 71, 80, 83, 77)
classes <- c("Class A", "Class A", "Class A",
             "Class B", "Class B", "Class B",
             "Class C", "Class C", "Class C",
             "Class A", "Class B", "Class C",
             "Class A", "Class B", "Class C")
# Create a data frame
data <- data.frame(score = scores, class = classes)
# Create boxplots of scores by class with colors and axis titles
boxplot(score ~ class, data = data,   # y-axis: score, grouped by class on the x-axis
        col = c("lightblue", "lightgreen", "lightpink"),
        main = "Distribution of Test Scores by Class",
        xlab = "Class",
        ylab = "Test Scores")
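To read the numbers behind each box, the scores can be split by class and summarised with fivenum(), Tukey's five-number summary. This is a sketch added for illustration. The min and max rows are the raw data extremes; boxplot() may draw some of them as separate outlier points when they lie beyond 1.5 × IQR.

```r
scores <- c(78, 85, 67, 90, 82, 74, 88, 79, 69, 91, 86, 71, 80, 83, 77)
classes <- c("Class A", "Class A", "Class A",
             "Class B", "Class B", "Class B",
             "Class C", "Class C", "Class C",
             "Class A", "Class B", "Class C",
             "Class A", "Class B", "Class C")
data <- data.frame(score = scores, class = classes)

# Split the scores by class, then compute the five-number summary per group
by_class  <- split(data$score, data$class)
five_nums <- sapply(by_class, fivenum)   # 5 x 3 matrix, one column per class
rownames(five_nums) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five_nums)
```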
Pie Chart
• A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions.
• It uses "pie slices", where each sector shows the relative size of one part of the data.
• Also known as a circle graph, it is cut along its radii into segments that describe relative frequencies or magnitudes.
• R provides the function pie() to create pie charts. It takes a vector of positive numbers as input.
• Syntax:
pie(x, labels, radius, main, col, clockwise)
data <- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
pie(data, labels)
values <- c(25, 30, 20, 25)
labels <- c("Q1", "Q2", "Q3", "Q4")
colors <- c("red", "blue", "green", "yellow")
# Pie Chart
pie(
x = values,
labels = labels, radius = 1,
main = "Quarterly Sales Distribution",
col = colors,
clockwise = TRUE
)
data<- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
pie(data, labels, main = "City pie chart",
col = rainbow(length(data)))
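A pie chart encodes proportions as slice angles, so it can help to show the share each slice represents. This sketch reuses the city data above. The percentage labels and the angle calculation (360 × value / total) are additions for illustration.

```r
data <- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")

pct    <- round(100 * data / sum(data), 1)   # share of each slice, in percent
angles <- 360 * data / sum(data)             # slice angle in degrees
pie_labels <- paste0(labels, " (", pct, "%)")

pie(data, labels = pie_labels, main = "City pie chart with percentages",
    col = rainbow(length(data)))
print(pct)   # 14.2 34.6 12.3 38.9
```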
# Sample data: browser market share (%)
data <- c(45, 25, 15, 10, 5)
browsers <- c("Chrome", "Safari", "Firefox", "Edge", "Other")
# Build labels such as "Chrome (45%)"
labels <- paste0(browsers, " (", data, "%)")
# Pie chart using all arguments
pie(x = data,
    labels = labels,
    radius = 1,
    main = "Browser Market Share (2025)",
    col = rainbow(length(data)),
    clockwise = TRUE)
install.packages("plotrix")   # run once to install the package
library(plotrix)
brands <- c("Brand A", "Brand B", "Brand C", "Brand D", "Brand E")
market_share <- c(30, 25, 20, 15, 10)
# Create labels with brand and percentage
labels <- paste0(brands, " (", market_share, "%)")
# Create 3D pie chart
pie3D(market_share,
labels = labels,
explode = 0.1, # separates slices slightly
main = "Smartphone Market Share",
col = rainbow(length(market_share)),
labelcex = 0.8)
| Library | Purpose | Description | Common Functions | Explanation |
| --- | --- | --- | --- | --- |
| ggplot2 | Custom static plots | Grammar-of-graphics-based plotting system, part of the tidyverse. | ggplot(), geom_bar(), geom_point() | Best for building complex, layered visualizations (e.g., scatter + regression line) using a consistent syntax. Ideal for publication-quality plots. |
| plotly | Interactive visualizations | Builds interactive versions of static plots; integrates with ggplot2. | plot_ly(), ggplotly() | Hover, zoom, and dynamic charts in just a few lines. Great for dashboards and data exploration. |
| lattice | Multivariate data plots | Designed for plotting data conditioned on one or more variables (panel plots). | xyplot(), bwplot() | Built-in support for grouped data and multiple panels (e.g., plot by gender, region). Less flexible than ggplot2 but more concise for some tasks. |
| base R plotting | Quick and simple plots | Comes built into R; no need for additional libraries. | plot(), hist(), boxplot() | Great for quick visual checks and learning. Not as polished or customizable, but very fast. |
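lattice ships with standard R installations as a recommended package, so its panel plots can be tried without extra installs. This sketch uses the built-in mtcars data as an assumed example. xyplot() draws one scatter panel per cylinder count, and bwplot() draws box plots per gear count.

```r
library(lattice)   # recommended package, normally installed with R

# Panel (conditioned) plot: mpg vs weight, one panel per cylinder count
p1 <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
             layout = c(3, 1), xlab = "Weight (1000 lbs)", ylab = "mpg")

# Box-and-whisker plots of mpg for each number of gears
p2 <- bwplot(mpg ~ factor(gear), data = mtcars, xlab = "Gears", ylab = "mpg")

print(p1)   # lattice plots are drawn when printed
print(p2)
```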
| Library | Purpose | Key Function for Bar Chart | Chart Type Support | Best For |
| --- | --- | --- | --- | --- |
| ggplot2 | Static, layered visualizations | geom_bar(), geom_col() | Bar, Line, Pie, Scatter, etc. | Most widely used for custom, quality plots |
| plotly | Interactive charts | plot_ly(type = "bar") | Bar, Line, Scatter, etc. | Web, dashboard, hover/zoom support |
| base R | Basic, built-in plotting | barplot() | Bar, Histogram, Line, Boxplot | Quick, simple visuals |
| highcharter | Interactive charts (business) | hchart(type = "column") | Column/Bar, Line, Pie, Area | Business dashboards, finance apps |
| echarts4r | Animated & web-friendly plots | e_bar() | Bar, Pie, Timeline, Map | Interactive dashboards, storytelling |
