
Graphical representation of data

Visualizing single variable


plot(data)
• The most commonly used plotting function in R is plot(). It is a generic function, meaning it has many methods, and the method that runs is chosen according to the class of the object passed to plot().
• plot(x, y, ...) arguments:
• x: the x coordinates of points in the plot. Alternatively, a single plotting structure, function, or any R object with a plot method can be provided.
• y: the y coordinates of points in the plot; optional if x is an appropriate structure.
• ...: arguments to be passed to methods, such as graphical parameters (see par).
Many methods will accept the following arguments:
• type: what type of plot should be drawn. Possible types include
• "p" for points,
• "l" for lines,
• "b" for both,
• "c" for the lines part alone of "b",
• "o" for both ‘overplotted’,
• "h" for ‘histogram’ like (or ‘high-density’) vertical lines,
• "s" for stair steps.
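A minimal sketch of both ideas above, using made-up data (the sine curve and the small factor `f` are illustrative assumptions, not from the slides). The loop draws the same points with several values of `type`, and the last lines show generic dispatch choosing a method from the class of the object.

```r
# Illustrative data (assumption): 20 points on a sine curve
x <- seq(0, 2 * pi, length.out = 20)
y <- sin(x)

# Same x and y, drawn with different values of the type argument
op <- par(mfrow = c(2, 3))
for (tp in c("p", "l", "b", "o", "h", "s")) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)

# Generic dispatch: the class of the argument decides which method runs
d <- density(y)
class(d)   # "density", so plot(d) is handled by the density method
plot(d)

f <- factor(c("a", "b", "b", "c"))
plot(f)    # a factor is drawn as a bar chart of its level counts
```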
Dot and bar plot
• Dot charts and bar plots portray continuous values with labels from a discrete variable.
• A dot chart can be created in R with the function dotchart(x, labels = …), where x is a numeric vector and labels is a vector of categorical labels for x.
• A bar plot can be created with the barplot(height) function, where height is a vector or matrix.
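To illustrate the vector-or-matrix distinction for `height`, here is a sketch using the built-in mtcars data; cross-tabulating gears against cylinders is our own illustrative choice. A one-way table gives one bar per category, while a two-way table is drawn as stacked bars, or as grouped bars with `beside = TRUE`.

```r
data(mtcars)

# Vector height: one bar per cylinder count
counts <- table(mtcars$cyl)
barplot(counts, main = "Cars by cylinder count", xlab = "Number of cylinders")

# Matrix height: rows are gear counts, columns are cylinder counts
tab <- table(mtcars$gear, mtcars$cyl)

# Default: each column becomes one stacked bar
barplot(tab, legend.text = TRUE, xlab = "Number of cylinders")

# beside = TRUE: the rows of each column are drawn as adjacent bars
barplot(tab, beside = TRUE, legend.text = TRUE, xlab = "Number of cylinders")
```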
Dot...
data(mtcars)

dotchart(mtcars$mpg, labels = row.names(mtcars), cex = .7,
         main = "Miles Per Gallon (MPG) of Car Models",
         xlab = "MPG")
dotchart(iris$Petal.Length, main = "IRIS data")
Dot chart
Bar plot
barplot(table(mtcars$cyl), main = "Distribution of Car Cylinder Counts",
        xlab = "Number of Cylinders")

barplot(table(iris$Petal.Length), main = "IRIS data")
barplot
Histogram
income <- rlnorm(4000, meanlog = 4, sdlog = 0.7)
hist(income, breaks = 500, xlab = "income", ylab = "freq", main = "histogram")
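hist() treats a single number passed as `breaks` as a suggestion only, because it places break points at "pretty" values near that count. The following sketch compares a coarse and a fine histogram of the same simulated incomes. The seed and the value 20 are arbitrary choices for illustration, and `plot = FALSE` returns the computed bins without drawing them.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible
income <- rlnorm(4000, meanlog = 4, sdlog = 0.7)

coarse <- hist(income, breaks = 20,  plot = FALSE)
fine   <- hist(income, breaks = 500, plot = FALSE)

# The number requested is only a suggestion: compare with the actual bin counts
length(coarse$counts)
length(fine$counts)

op <- par(mfrow = c(1, 2))
plot(coarse, main = "breaks = 20",  xlab = "income")
plot(fine,   main = "breaks = 500", xlab = "income")
par(op)
```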
Density plot
• Kernel density plots are usually a much more effective way to view the distribution of a variable. Create the plot using plot(density(x)), where x is a numeric vector.
d <- density(mtcars$mpg)
plot(d, main = "Kernel Density of Miles Per Gallon")
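density() returns an object whose `x` and `y` components hold the evaluation grid and the estimated density. Its `bw` component holds the bandwidth that was chosen, which comes from `bw.nrd0` by default. This short sketch inspects those components and checks numerically that the curve encloses an area of about 1, as a density should. Shading with polygon() is an optional extra.

```r
data(mtcars)
d <- density(mtcars$mpg)

d$bw        # bandwidth chosen by the default rule (bw.nrd0)
length(d$x) # number of grid points (512 by default)

# Riemann-sum approximation of the area under the curve (should be close to 1)
area <- sum(d$y) * diff(d$x[1:2])
area

plot(d, main = "Kernel Density of Miles Per Gallon")
polygon(d, col = "lightgrey", border = "black")  # shade the area under the curve
```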

• a<-c(10,12,15,18,20,21,33)
• > stem(a)

• The decimal point is 1 digit(s) to the right of the |

• 1 | 02
• 1 | 58
• 2 | 01
• 2|
• 3|3
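In this output, each value is split into a stem (the tens digit, left of `|`) and a leaf (the units digit, right of `|`). stem() has also split each stem into two rows, one for leaves 0 to 4 and one for leaves 5 to 9. The sketch below reproduces the split by hand with integer division, assuming whole-number data as here.

```r
a <- c(10, 12, 15, 18, 20, 21, 33)

stems  <- a %/% 10  # tens digit: printed to the left of |
leaves <- a %% 10   # units digit: printed to the right of |

# Leaves grouped by stem, before stem() splits each stem into two rows
split(leaves, stems)

# Every value is recovered as stem * 10 + leaf
stems * 10 + leaves
```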
Cont..
income <- rlnorm(4000, meanlog = 4, sdlog = 0.7)
plot(density(log10(income), adjust = 0.5), main = "distribution")
rug(log10(income))
• rug() adds a one-dimensional plot of the observations, one tick mark per value, along the bottom of the graph to emphasize the distribution of the observations.
After applying the rug function
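The `adjust` argument multiplies the automatically chosen bandwidth, so `adjust = 0.5` gives a curve that follows the data more closely. The sketch below overlays the default and the halved bandwidth on the same simulated, log-transformed incomes, and rug() marks each observation. The seed and line styles are illustrative choices.

```r
set.seed(1)  # arbitrary seed for reproducibility
income <- rlnorm(4000, meanlog = 4, sdlog = 0.7)
li <- log10(income)

d_default <- density(li)                 # bandwidth from the default rule
d_half    <- density(li, adjust = 0.5)   # same rule, bandwidth multiplied by 0.5

d_half$bw / d_default$bw  # ratio of the two bandwidths

plot(d_half, main = "distribution")
lines(d_default, lty = 2)  # dashed: default bandwidth, for comparison
rug(li)                    # one tick per observation along the x axis
```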
Multiple variables: examining two variables with regression
Cont..
• The regression line does not fit the data well.
• A linear regression model is not suitable for the relationship between the two variables in the graph above.
• A loess() curve can be used to fit a nonlinear line to the data.
• It fits these data better than the linear regression line.
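The data behind the graph are not included here, so the following sketch uses simulated data with a curved (quadratic) trend as a stand-in assumption. It fits a straight line with lm() and a loess() curve, draws both, and compares their residual sums of squares. On curved data like this, the loess fit should leave much smaller residuals.

```r
set.seed(42)  # arbitrary seed; the data below are simulated, not the slide's data
x <- seq(0, 10, length.out = 200)
y <- x^2 + rnorm(length(x), sd = 2)  # curved trend plus noise

fit_lm    <- lm(y ~ x)     # straight-line fit
fit_loess <- loess(y ~ x)  # local regression, follows the curvature

plot(x, y, pch = 16, col = "grey")
abline(fit_lm, col = "red")                              # linear regression line
lines(x, predict(fit_loess), col = "blue", lwd = 2)      # loess curve (x is sorted)

# Residual sum of squares for each fit
rss_lm    <- sum(residuals(fit_lm)^2)
rss_loess <- sum(residuals(fit_loess)^2)
c(lm = rss_lm, loess = rss_loess)
```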
Dot chart – Multiple variables
Dot chart with 3 groups
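The chart itself is not reproduced here. One way to draw a dot chart split into three groups is dotchart()'s `groups` argument. The sketch below assumes the three groups are the 4-, 6- and 8-cylinder cars in mtcars, which may differ from the data used on the slide.

```r
data(mtcars)

# Sort cars by mpg so the dots within each group run in order
x <- mtcars[order(mtcars$mpg), ]

# Assumed grouping variable: number of cylinders (three levels)
x$cyl <- factor(x$cyl)

dotchart(x$mpg, labels = row.names(x), groups = x$cyl, cex = 0.7,
         main = "Miles Per Gallon (MPG) by number of cylinders",
         xlab = "MPG")
```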
Bar plot- multiple variables
Bar plot
Box-and-whisker plot
• Shows the distribution of a continuous variable for each value of a discrete variable.
install.packages("ggplot2")
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + geom_jitter()
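If ggplot2 is not installed, base R's boxplot() with a formula draws a comparable chart and also returns the numbers it plotted. In this sketch, stripchart() with `method = "jitter"` is our substitute for geom_jitter().

```r
data(mtcars)

# One box per cylinder count; the formula reads "mpg grouped by cyl"
b <- boxplot(mpg ~ cyl, data = mtcars,
             xlab = "Number of cylinders", ylab = "mpg")

# Overlay the individual cars with some horizontal jitter
stripchart(mpg ~ cyl, data = mtcars, vertical = TRUE,
           method = "jitter", add = TRUE, pch = 16)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
```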
Box Whisker –cars data set
Box whisker plot
Code..
Cont..
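• Box: spans the first quartile (Q1) to the third quartile (Q3); its height is the interquartile range, IQR = Q3 - Q1.
• Whiskers: the upper whisker extends to the largest observation no greater than Q3 + 1.5 × IQR, and the lower whisker to the smallest observation no less than Q1 - 1.5 × IQR. Points beyond the whiskers are drawn individually as outliers.
• Line inside the box: the median.
• Based on the graph above, zip 0 and zip 9 have higher household income than the other groups, judging by their median values.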
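The whisker rule can also be computed directly. The sketch below does this for mtcars$mpg using quantile()'s default method for the quartiles. As far as we know, this matches what ggplot2's geom_boxplot() uses. Base R's boxplot() uses Tukey's hinges from fivenum() instead, so its box edges and outliers can differ slightly.

```r
data(mtcars)
mpg <- mtcars$mpg

q   <- quantile(mpg, c(0.25, 0.50, 0.75))  # Q1, median, Q3
iqr <- q[[3]] - q[[1]]                     # interquartile range

upper_fence <- q[[3]] + 1.5 * iqr
lower_fence <- q[[1]] - 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
upper_whisker <- max(mpg[mpg <= upper_fence])
lower_whisker <- min(mpg[mpg >= lower_fence])

# Anything outside the fences is drawn as an individual outlier point
outliers <- mpg[mpg > upper_fence | mpg < lower_fence]

c(lower_whisker = lower_whisker, Q1 = q[[1]], median = q[[2]],
  Q3 = q[[3]], upper_whisker = upper_whisker)
outliers
```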
Hex bin plot
• Scatterplots are not well suited to large data sets: with many points, the structure of the data becomes difficult to see because the points overlap.
• A hexbin plot combines the features of a scatterplot and a histogram.
• Shading represents the concentration of data points in each hexagonal bin.
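To show the counting idea behind binning without extra packages, the sketch below swaps hexagons for square bins. It cuts each axis into 20 intervals, counts the points in each cell, and shades the cells by count. This is a simplified stand-in, not what the hexbin package does internally.

```r
set.seed(1)  # arbitrary seed for reproducibility
x <- rnorm(2000)
y <- rnorm(2000)

# Square bins (a simplification of hexagonal bins): 20 intervals per axis
xb <- cut(x, breaks = 20)
yb <- cut(y, breaks = 20)
counts <- table(xb, yb)  # number of points falling in each cell

# Darker cells hold more points
image(unclass(counts), col = grey(seq(1, 0, length.out = 20)),
      axes = FALSE, main = "Counts per square bin")
```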
Code..
install.packages("hexbin")
library(hexbin)
x <- rnorm(2000)
y <- rnorm(2000)
hbin <- hexbin(x, y, xbins = 40)
plot(hbin)
Hexbin plot
Hex bin plot
Scatter plot matrix
• A scatterplot matrix shows many scatterplots in a compact, side-by-side fashion.
• It visually represents multiple attributes of a data set to explore relationships and magnify differences.
colors <- c("orange", "black", "yellow")
pairs(iris[1:4], main = "Fisher’s Iris Dataset", pch = 21,
      bg = colors[unclass(iris$Species)])
legend(0.2, 0.02, horiz = TRUE, as.vector(unique(iris$Species)),
       fill = colors, bty = "n")
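The `bg` argument relies on a small indexing trick. unclass() turns the Species factor into its integer codes 1, 2 and 3, and those codes then pick one of the three colors for each flower. This sketch isolates just that step.

```r
colors <- c("orange", "black", "yellow")

codes <- unclass(iris$Species)  # integer codes: 1 = setosa, 2 = versicolor, 3 = virginica
bg <- colors[codes]             # one fill color per row of iris

table(iris$Species, bg)         # each species maps to exactly one color
```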
Scatterplot matrix
Data exploration vs. presentation
• Density plots: for data scientists (exploration)
• Histograms: for stakeholders (presentation)
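Both views can also be combined on one chart. Draw the histogram on the density scale (`freq = FALSE`) and overlay the kernel density curve. The sketch below uses mtcars$mpg, which is our own choice of data.

```r
data(mtcars)
h <- hist(mtcars$mpg, freq = FALSE, main = "MPG: histogram and density",
          xlab = "MPG")
lines(density(mtcars$mpg), lwd = 2)

# On the density scale the bar areas sum to 1, matching the curve's scale
sum(h$density * diff(h$breaks))
```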
