
Lecture 8: graphing in R and intro to ggplot
Ben Fanson
Simeon Lisovski
Lecture Outline
1) introduction to R graphics
2) introduction to ggplot



Helpful references
- http://www.cookbook-r.com/Graphs/
- ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
R graphics
Pros
1) You can make almost any graph that you can think of
2) Graphics are publishable quality
3) Combined with the programming covered in previous lectures, you can 'easily' make
very complex graphs to visualize your data and statistical models
4) You can make lots of graphs easily [e.g. one plot for each individual]

Cons
1) It takes some effort to learn the language and quirks of the graphing
approach
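Point 4 of the pros (lots of graphs, e.g. one per individual) can be sketched with a simple loop. This is a minimal example, assuming long-format data with a hypothetical `id` column and made-up values; every call to `plot()` starts a new page of the same PDF file.

```r
# Hypothetical data: 5 measurements on each of 3 made-up individuals
set.seed(1)
ds <- data.frame(
  id   = rep(c("bird_A", "bird_B", "bird_C"), each = 5),
  day  = rep(1:5, times = 3),
  mass = round(rnorm(15, mean = 20, sd = 2), 1)
)

out_file <- tempfile(fileext = ".pdf")  # temporary file, for illustration only
pdf(out_file, width = 6, height = 4)    # every new plot becomes a new page
for (ind in unique(ds$id)) {
  sub <- ds[ds$id == ind, ]             # rows for this individual only
  plot(mass ~ day, data = sub, type = "b", main = ind)
}
dev.off()                               # close the device so the file is written
```

The same loop works unchanged for 3 or 300 individuals, which is where programming and graphics combine well.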

Overview of the main R graphics systems
R graphics

base plot
[original R graphics]

- plot()
- hist()
- barplot()
- pairs()
[Figure: example base plots made with plot(...), image(...), barplot(...), persp(...), pairs(...) and lots more]
Some advantages of base plot
1) I find it the easiest way to build a very customized plot, since you
build the plot one element at a time


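#--- example code to build a plot by each element ---#
plot.new()                                    # start a new, empty plot
plot.window(xlim=c(0,1), ylim=c(0,1))         # set the coordinate system
points(seq(0,1,0.1), seq(0,1,0.1), pch=1:10)  # 11 points; symbols 1-10 (recycled)
axis(1, at=c(0.2,0.7))                        # x-axis with ticks at 0.2 and 0.7
axis(2, at=c(0.1,0.8))                        # y-axis with ticks at 0.1 and 0.8
mtext('xlab', 1, line=2)                      # x-axis label
mtext('ylab', 2, line=2)                      # y-axis label
box()                                         # frame around the plot region
abline(0, 1, col='red')                       # red 1:1 line (intercept 0, slope 1)
title(main='Title')                           # base R title (mtitle() is from Hmisc)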

2) Being the original graphics system, it is the most tightly integrated with other
packages (many packages supply their own plot() methods for their objects)

base plot and methods
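how can plot() give you such different results?
ds <- data.frame(x=1:10, y=rnorm(10, 1:10, 3))  # simulated data: y rises with x, plus noise
plot(y ~ x, data=ds)                             # a scatterplot
lm_model <- lm(y ~ x, data=ds)                   # fit a linear model
par(mfrow=c(2,2))                                # 2 x 2 grid of panels
plot(lm_model)                                   # four diagnostic plots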
plot() is not a single function
How does plot() work?
plot() looks at the class of its first argument and then hands the work to a
more specific function (a 'method')

1) e.g. plot( y ~ x )
the first argument is a formula, so plot.formula() is used; it pulls out y and x,
and since both are numeric vectors, the result is a scatterplot

2) e.g. plot( lm_mod )
plot asks what class(lm_mod) is, and since it is an 'lm' object, it runs
plot.lm(), which makes four diagnostic graphs by default

methods(plot) lists all the plot methods currently available
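The dispatch mechanism above (S3 method dispatch) can be sketched with a toy class. This is a minimal illustration, assuming a made-up class called "growth" and a hypothetical constructor `make_growth()`; neither exists in any package. Defining a function named `plot.growth()` is enough for `plot()` to pick it up.

```r
# A made-up S3 class called "growth", used only to illustrate dispatch
make_growth <- function(day, mass) {
  structure(list(day = day, mass = mass), class = "growth")
}

# plot() sees class(x) == "growth" and hands the work to plot.growth()
plot.growth <- function(x, ...) {
  plot(x$day, x$mass, type = "b", xlab = "Day", ylab = "Mass", ...)
  invisible(x)
}

g <- make_growth(day = 1:5, mass = c(10.1, 11.4, 12.0, 13.2, 13.9))
class(g)   # "growth"
plot(g)    # dispatches to plot.growth(), which in turn calls plot() on numeric vectors
```

This is the same mechanism that sends an 'lm' object to plot.lm().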
Overview of the main R graphics systems
R graphics

base plot
[original R graphics]

- plot()
- hist()
- barplot()
- pairs()

grid graphics
[alternative framework]

lattice
- xyplot()
- barchart()
- wireframe()
[Figure: example lattice plots: xyplot(...) showing faceting (aka trellising), barchart(...), wireframe(...)]
Lattice can also do most things that base plot can
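Faceting (trellising) in lattice is written with `|` in the formula. A minimal sketch, assuming made-up data with a two-level `sex` variable; `type = c("p", "r")` draws the points plus a regression line in each panel.

```r
library(lattice)   # ships with R as a recommended package

# Made-up example data: two groups of 15 points
set.seed(100)
ds <- data.frame(x = 1:30, y = rnorm(30), sex = rep(c("m", "f"), each = 15))

# 'y ~ x | sex' means: plot y against x, with one panel per level of sex
p <- xyplot(y ~ x | sex, data = ds, type = c("p", "r"))
print(p)   # lattice plots are objects; print() draws them
```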
Overview of the main R graphics systems
R graphics

base plot
[original R graphics]

- plot()
- hist()
- barplot()
- pairs()

grid graphics
[alternative framework]

lattice
- xyplot()
- barchart()
- wireframe()

ggplot2
- ggplot() + geom_line()
- ggplot() + geom_point()

http://mandymejia.wordpress.com/2013/11/13/10-reasons-to-switch-to-ggplot-7/
[Figure: example ggplot2 plots made with ggplot(...) + geom_point(...) + facet_wrap(...) and ggmap(...) + geom_tile()]
why do I use ggplot?
1) I like the faceting and grouping: they make it easy to produce quick, yet
complex, graphs for data exploration

2) I found it easier to add new layers to a plot

3) I liked the grouping options and colour schemes in ggplot

4) You can create your own 'theme' and reuse it over and over
again
5) Lots of active development in the area
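Point 4 (a reusable theme) can be sketched as a function that returns a theme. This is an illustration only: `theme_mine()` is a hypothetical name, and the particular tweaks are arbitrary choices built on ggplot2's `theme_bw()` and `theme()`.

```r
library(ggplot2)

# A hypothetical personal theme: start from theme_bw() and tweak a few elements
theme_mine <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(
      legend.position  = "bottom",                        # legend below the plot
      panel.grid.minor = element_blank(),                 # drop minor grid lines
      strip.background = element_rect(fill = "grey90")    # facet label boxes
    )
}

set.seed(100)
ds <- data.frame(x = 1:10, y = rnorm(10))

# The same theme is reused on any number of plots
p1 <- ggplot(ds, aes(x = x, y = y)) + geom_point() + theme_mine()
p2 <- ggplot(ds, aes(x = x, y = y)) + geom_line() + theme_mine(base_size = 14)
```

To make it the default for every plot in a session, `theme_set(theme_mine())` can be called once at the top of a script.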
cons of ggplot...
1) I find working with grid graphics more difficult than base plot. This
makes it harder to add some of those final touches to the graph.
[Note: the ggplot2 community is active, so you can often find the answer or
get help easily enough]

2) no 3D plotting

3) Customising axis labels for faceted graphs can be annoying

4) limited double axes
a) Hadley Wickham has philosophical objections to plots with two independent y-axes, so ggplot2 does not support them
b) a secondary axis is possible only as a one-to-one transformation of the primary axis (the sec.axis argument with sec_axis(), added in ggplot2 2.2.0)
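A secondary axis that is a transformation of the primary one can be sketched with the `sec.axis` argument of `scale_y_continuous()`. This minimal example assumes made-up daily temperatures and shows Celsius on the left axis and Fahrenheit on the right; it cannot show two unrelated variables on independent axes.

```r
library(ggplot2)

# Made-up daily temperatures in degrees Celsius
ds <- data.frame(day = 1:7, temp_c = c(12, 14, 13, 17, 19, 18, 16))

p <- ggplot(ds, aes(x = day, y = temp_c)) +
  geom_line() +
  scale_y_continuous(
    name     = "Temperature (C)",
    # the second axis must be a transformation of the first: here C -> F
    sec.axis = sec_axis(~ . * 9 / 5 + 32, name = "Temperature (F)")
  )
```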


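Saving a graph
jpeg(filename, height=, width=, units=, res=)
jpeg('figures/test1.jpg', height=6, width=6, units='cm', res=1000)  # 6 x 6 cm at 1000 ppi; figures/ must already exist
plot(1:10)     # your plotting code goes here
dev.off()      # close the device so the file is written

pdf(filename, height=, width=)    # height and width are in inches
pdf('figures/test1.pdf', height=6, width=6)
plot(1:10)
dev.off()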
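For ggplot objects, ggplot2's `ggsave()` does the open/plot/close steps in one call and chooses the file type from the extension. A minimal sketch, writing to a temporary file rather than a real `figures/` folder; for a raster format such as `.png` you would also set `dpi`.

```r
library(ggplot2)

set.seed(100)
ds <- data.frame(x = 1:10, y = rnorm(10))
p <- ggplot(ds, aes(x = x, y = y)) + geom_point()

# ggsave() picks the device from the extension; a temporary file is used here
out_pdf <- tempfile(fileext = ".pdf")
ggsave(out_pdf, plot = p, width = 6, height = 6, units = "cm")
```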

Raster vs. vector graphics
Raster images
- method: based on a grid of dots (pixels). Each pixel is assigned a
colour.

- file formats: jpg, tiff, bitmap, psd

- use: best for photographs
Vector images
- method: based on mathematical equations to redraw the image

- file formats: eps, ps, pdf, ai

- use: best for drawings, logos, graphics. Post-processing revisions are much easier

Adobe Illustrator for post-processing
Illustrator is great for small touches to graphs or for collating
multiple graphs onto a single page.

<< illustrator quick demo >>
Short introduction to ggplot

ggplot jargon
geoms – geometric objects [think of them as the plot type]
e.g. scatterplot, line graph, histogram
[Figure: example geoms: geom_point(), geom_line(), geom_bar()]

aes – aesthetics are the attributes associated with each geometric
object



[Figure: aesthetics of a single point (x-value = 2.4, y-value = 0.4, shape = dot, colour = black, transparency = opaque) and of a line (x-values = c(1.7, 2.4, 2.7, ...), y-values = c(-0.5, 0.4, 0.6, ...), line type = solid, colour = black, transparency = opaque)]
scales – attributes of the x-axis and y-axis [and of any other mapped aesthetic, such as colour or size, shown in the legend]
[Figure: two continuous scales, one ranging from -1.5 to 2.1 and one from -1.0 to 1.0, each with tick marks every 0.5]

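library(ggplot2)
set.seed(100)   # fix the random numbers so the plot is reproducible
ds <- data.frame(x=1:10, y=rnorm(10))
ggplot(ds, aes(x=x, y=y)) + geom_point(aes(size=y))   # y mapped to both y position and point size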
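Each scale can also be set explicitly with a `scale_*()` function. A minimal sketch using the same data; the limits, breaks and legend title below are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(100)
ds <- data.frame(x = 1:10, y = rnorm(10))

p <- ggplot(ds, aes(x = x, y = y)) +
  geom_point(aes(size = y)) +
  scale_x_continuous(breaks = seq(2, 10, by = 2)) +                     # x tick marks
  scale_y_continuous(limits = c(-4, 4), breaks = seq(-4, 4, by = 2)) +  # y range and ticks
  scale_size_continuous(name = "size = y")                              # legend title for the size scale
```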
facets – making separate plots broken up by one or two variables
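library(ggplot2)
set.seed(100)   # fix the random numbers so the plot is reproducible
ds <- data.frame(x=1:30, y=rnorm(30), sex=rep(c('m','f'), each=15))
ggplot(ds, aes(x=x, y=y)) + geom_point(aes(size=y)) + facet_grid(. ~ sex)   # one column of panels per sex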
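The facet formula controls where the panels go. A minimal sketch comparing three layouts of the same made-up data: `facet_grid()` places levels in columns or rows, while `facet_wrap()` wraps panels into a given number of columns.

```r
library(ggplot2)

set.seed(100)
ds <- data.frame(x = 1:30, y = rnorm(30), sex = rep(c("m", "f"), each = 15))
base <- ggplot(ds, aes(x = x, y = y)) + geom_point()

p_cols <- base + facet_grid(. ~ sex)          # one column per sex
p_rows <- base + facet_grid(sex ~ .)          # one row per sex
p_wrap <- base + facet_wrap(~ sex, ncol = 1)  # panels wrapped into a single column
```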
ggplot grammar
similar to dplyr grammar, think of it as a sentence that you are building
'specify dataset' +                                              # ggplot(ds, ...)
'specify x, y, grouping variables' +                             # aes(x=, y=, col=, shape=)
'specify plot layers (e.g. point, line, stat function)' +        # geom_name()
'specify if you want faceting' +                                 # facet_grid()
'specify minor details/options [labels, position of legend, ...]' # scale_name(), theme(), labs()
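Putting every part of the sentence together, one piece per line. This is a sketch on made-up data; the column names `day`, `y` and `sex` and the label text are assumptions for illustration only.

```r
library(ggplot2)

# Made-up data: 10 days of measurements for each sex
set.seed(1)
ds <- data.frame(day = rep(1:10, times = 2),
                 y   = rnorm(20),
                 sex = rep(c("m", "f"), each = 10))

p <- ggplot(ds, aes(x = day, y = y, col = sex)) +   # dataset + x, y, grouping
  geom_point() + geom_line() +                      # plot layers
  facet_grid(. ~ sex) +                             # faceting
  scale_x_continuous(breaks = seq(2, 10, by = 2)) + # minor details: scales,
  labs(x = "Day", y = "Response", col = "Sex") +    #   labels,
  theme(legend.position = "bottom")                 #   and legend position
print(p)   # draws the plot
```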
example dataset (= ds)
bird_id  sex     trt  growth_rate
1        male    t1   12.3
2        male    t2   10.3
3        male    t3   14.5
4        female  t1   14.3
5        female  t2   9.3
6        female  t3   15.6
[trt = treatment]
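To run the examples on the following slides, the table can be typed in as a data frame. This sketch assumes lower-case column names (bird_id, sex, trt, growth_rate), since those are the names the ggplot code uses and R is case sensitive.

```r
# Column names in lower case, matching the ggplot code on the following slides
ds <- data.frame(
  bird_id     = 1:6,
  sex         = rep(c("male", "female"), each = 3),
  trt         = rep(c("t1", "t2", "t3"), times = 2),
  growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)
)
str(ds)
```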
scatterplot of data
ggplot( ds ) + geom_point( aes(x=sex, y=growth_rate) )
scatterplot of data
ggplot( ds ) + geom_point( aes(x=trt, y=growth_rate) )
scatterplot of data – colour by sex
ggplot( ds ) + geom_point( aes(x=trt, y=growth_rate, col=sex) )
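what if we want to add a line?
ggplot( ds ) + geom_point( aes(x=trt, y=growth_rate, col=sex) ) +
  geom_line( aes(x=trt, y=growth_rate, col=sex, group=sex) )

group = sex is needed only because trt is categorical: by default ggplot
groups by every categorical variable (here trt and sex), which leaves one
point per group and nothing to connect. If trt were numeric, group = sex
would not be needed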
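The grouping rule can be checked directly with ggplot2's `layer_data()`, which returns the data after ggplot has computed the groups. A minimal sketch on the example dataset; the expected counts follow from ggplot2's default of grouping by the interaction of all discrete variables.

```r
library(ggplot2)

ds <- data.frame(
  bird_id     = 1:6,
  sex         = rep(c("male", "female"), each = 3),
  trt         = rep(c("t1", "t2", "t3"), times = 2),
  growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)
)

# Without group=, ggplot groups by every discrete variable (trt and sex):
# 6 groups of one point each, so geom_line() has nothing to connect
p_nogroup <- ggplot(ds, aes(x = trt, y = growth_rate, col = sex)) + geom_line()

# With group = sex there are 2 groups, so one line per sex is drawn
p_group <- ggplot(ds, aes(x = trt, y = growth_rate, col = sex, group = sex)) + geom_line()

length(unique(layer_data(p_nogroup)$group))  # 6
length(unique(layer_data(p_group)$group))    # 2
```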
you can move aes() into ggplot(), so that every geom inherits it
ggplot( ds, aes(x=trt, y=growth_rate, col=sex, group=sex) ) +
  geom_point() +
  geom_line()
adding a facet
ggplot( ds, aes(x=trt, y=growth_rate, col=sex, group=sex) ) +
  geom_point() +
  geom_line() +
  facet_grid(. ~ sex)

facet_grid(row ~ column)
['.' just means no grouping variable]
key points so far
1) the dataset (and the aes() mappings) only need to be specified once, in ggplot(); every geom inherits them

2) all the geoms share the same scales (no need to specify x-range or y-range)

3) grouping by colour, shape, fill, etc. is easy

4) faceting is quick

5) a common language for everything (i.e. not a bunch of separate
packages for different plot types)
What's next
Learning about base plot
- introducing the basics of plot()
- overlaying plots and customizing your plots
- discussing some more advanced plotting functions
Lecture 8: Hands-on Section
1) get Lecture8.R from GitHub

2) make sure that you have data/lecture7/ [same files as last week]

3) open up Lecture8.R in Rcourse_proj.Rproj

4) start working through the examples and then try the exercise
[Figure: Lecture 8 files]