Ramon Saccilotto, Universitätsspital Basel
Hebelstrasse 10 T 061 265 34 07 F 061 265 31 09 saccilottor@uhbs.ch www.ceb-institute.org
Basel Institute for Clinical Epidemiology and Biostatistics
Introduction
"ggplot2 is an R package for producing statistical, or data, graphics, but it is unlike most other graphics packages because it
has a deep underlying grammar. This grammar, based on the Grammar of Graphics (Wilkinson, 2005), is composed of a set
of independent components that can be composed in many different ways. [..] Plots can be built up iteratively and edited
later. A carefully chosen set of defaults means that most of the time you can produce a publication-quality graphic in
seconds, but if you do have special formatting requirements, a comprehensive theming system makes it easy to do what
you want. [..]
ggplot2 is designed to work in a layered fashion, starting with a layer showing the raw data then adding layers of annotation
and statistical summaries. [..]"
H.Wickham, ggplot2, Use R, DOI 10.1007/978-0-387-98141_1, Springer Science+Business Media, LLC 2009
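The layered approach described in the quote can be sketched as follows. This is a minimal illustration, assuming the built-in mtcars data set and a current ggplot2 release; the object names (p.base, p.points, p.summary, p.final) are made up for this example.

```r
library(ggplot2)

# start with data and aesthetic mappings only (no layers yet)
p.base <- ggplot(mtcars, aes(wt, mpg))

# layer 1: the raw data
p.points <- p.base + geom_point()

# layer 2: a statistical summary (linear fit with confidence band)
p.summary <- p.points + geom_smooth(method = "lm", formula = y ~ x)

# layer 3: an annotation
p.final <- p.summary + annotate("text", x = 4.5, y = 30, label = "linear fit")

# number of layers at each step
n.layers <- c(length(p.base$layers), length(p.points$layers),
              length(p.summary$layers), length(p.final$layers))
```

Each intermediate object is a complete plot, so any of them can be printed or extended later.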
"ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice
graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing
legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics."
Author
ggplot2 was developed by Hadley Wickham, assistant
professor of statistics at Rice University, Houston. In July
2010 the latest stable release (Version 0.8.8) was published.
2008 Ph.D. (Statistics), Iowa State University, Ames, IA. Practical tools for exploring data and models.
2004 M.Sc. (Statistics), First Class Honours, The University of Auckland, Auckland, New Zealand.
2002 B.Sc. (Statistics, Computer Science), First Class Honours, The University of Auckland, Auckland, New Zealand.
1999 Bachelor of Human Biology, First Class Honours, The University of Auckland, Auckland, New Zealand.
Tutorial
#### Sample code for the illustration of ggplot2
### comparison qplot vs ggplot
ggplot(diamonds, aes(clarity, fill=cut)) + geom_bar()
[Figure: stacked bar chart of count by clarity (I1 to IF), filled by cut]
qplot(wt, mpg, data=mtcars, colour=factor(cyl))
[Figure: scatter plot of mpg against wt, coloured by factor(cyl)]
qplot(wt, mpg, data=mtcars, size=qsec, color=factor(carb), shape=I(1))
[Figure: scatter plot of mpg against wt, point size by qsec]
# bar-plot
qplot(factor(cyl), data=mtcars, geom="bar")
# flip plot by 90 degrees (coord_flip() flips the plot after calculation of any summary statistics)
qplot(factor(cyl), data=mtcars, geom="bar") + coord_flip()
[Figure: bar charts of count by factor(cyl), upright and flipped]
# fill by variable
qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(gear))
head(diamonds)
[Figure: count plots with axis carat and legend cut (Fair, Good, Very Good, Premium, Ideal)]
qplot(cut, nrow, data=t.table, geom="bar", stat="identity")
# change binwidth
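The code for this step is missing from the handout. A minimal sketch of changing the bin width, assuming a histogram of carat from the diamonds data set and geom_histogram() from a current ggplot2 release (the object names are hypothetical):

```r
library(ggplot2)

p.hist <- ggplot(diamonds, aes(carat))

# wide bins: coarse picture of the distribution
p.wide <- p.hist + geom_histogram(binwidth = 0.5)

# narrow bins: more detail, more bars
p.narrow <- p.hist + geom_histogram(binwidth = 0.05)

# number of bins computed for each plot
n.wide <- nrow(layer_data(p.wide))
n.narrow <- nrow(layer_data(p.narrow))
```

The binwidth is given in the units of the x variable, so 0.05 here means bins of 0.05 carat.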
# tweaking the smooth plot ("loess"-method: polynomial surface using local fitting)
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"))
[Figure: scatter plot of mpg against wt with loess smooth]
# making line more or less wiggly (span: 0-1)
[Figure: two plots of mpg against wt with smooth lines]
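The code for the span step is missing from the handout. A minimal sketch, assuming a current ggplot2 release where span is passed to geom_smooth() (the values 0.3 and 1 are arbitrary choices for illustration):

```r
library(ggplot2)

p.points <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# small span: each local fit uses few neighbours, the line gets wiggly
p.wiggly <- p.points +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3)

# large span: each local fit uses (almost) all points, the line gets smooth
p.smooth <- p.points +
  geom_smooth(method = "loess", formula = y ~ x, span = 1)
```

With only 32 observations in mtcars, very small spans can trigger numerical warnings from loess when the plot is drawn.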
# save plot in variable (hint: data is saved in plot, changes in data do not change plot-data)
p.tmp
[Figure: plot of mpg against wt]
# change mtcars
p.tmp
[Figure: plot of mpg against wt, unchanged after modifying mtcars]
save(p.tmp, file="temp.rData")
p.tmp
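The hint that the data is stored in the plot object can be checked directly. A minimal sketch, assuming a current ggplot2 release; cars and p.copy are hypothetical names, and a working copy is used so that mtcars itself stays untouched:

```r
library(ggplot2)

cars <- mtcars                      # hypothetical working copy
p.copy <- ggplot(cars, aes(wt, mpg)) + geom_point()

# change the data after the plot was created
cars$mpg <- cars$mpg * 2

# the plot object still holds the original values
unchanged <- identical(p.copy$data$mpg, mtcars$mpg)
```

To plot the changed data, the plot has to be created again from the modified data frame.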
p.tmp + layer(geom="point")
p.tmp + layer(geom="point") + layer(geom="line")
[Figure: plots with mpg on one axis, points and lines]
p.tmp + geom_point(aes(color="darkblue"))
p.norm <- ggplot(t.df, aes(x,y))
p.norm + geom_point()
p.norm + geom_point(shape=1)
p.norm + geom_point(shape=".")
p.norm + geom_point(colour=alpha("black", 1/2))
[Figure: scatter plots of y against x (both from -3 to 3) with the different point styles]
# facet_wrap / facet_grid
p.tmp + facet_wrap(~cyl)
p.tmp + facet_grid(gear~cyl)
p.tmp + facet_wrap(~cyl+gear)
[Figure: wt against mpg in panels by cyl and gear (panel titles such as "4, 3", "4, 4", "4, 5", "8, 3", "8, 5")]
p.tmp + facet_wrap(~cyl, scales="free_x")
[Figure: wt against mpg in panels by cyl]
# using scales (color palettes, manual colors, matching of colors to values)
p.tmp <- qplot(cut, data=diamonds, geom="bar", fill=cut)
p.tmp
p.tmp + scale_fill_brewer()
p.tmp + scale_fill_brewer(palette="Paired")
RColorBrewer::display.brewer.all()
[Figure: RColorBrewer palettes: YlOrRd, YlOrBr, YlGnBu, YlGn, Reds, RdPu, Purples, PuRd, PuBuGn, PuBu, OrRd, Oranges, Greys, Greens, GnBu, BuPu, BuGn, Blues, Set3, Set2, Set1, Pastel2, Pastel1, Paired, Dark2, Accent]
p.tmp + scale_fill_manual(values=c("#7fc6bc","#083642","#b1df01","#cdef9c","#466b5d"))
# changing name of legend (bug: in labs you must use "colour", "color" doesn't work)
qplot(mpg, wt, data=mtcars, colour=factor(cyl), geom="point") + labs(colour="Legend-Name")
# removing legend
qplot(mpg, wt, data=mtcars, colour=factor(cyl), geom="point") + scale_color_discrete(legend=FALSE)
qplot(mpg, wt, data=mtcars, colour=factor(cyl), geom="point") + opts(legend.position="none")
# dropping factors
mtcars2 <- transform(mtcars, cyl=factor(cyl))
levels(mtcars2$cyl)
qplot(mpg, wt, data=mtcars2, colour=cyl, geom="point") + scale_colour_discrete(limits=c("4", "8"))
"smooth"), method="lm")
p.tmp
p.tmp + scale_x_continuous(limits=c(15,30))
p.tmp + coord_cartesian(xlim=c(15,30))
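The two ways of limiting the x axis above are not equivalent when the plot contains a statistical layer: scale limits discard observations outside the range before statistics are computed, while coord_cartesian() only zooms the view. A minimal sketch, assuming a current ggplot2 release and a linear smooth on mtcars (the object names are hypothetical, and the lm() calls only mimic what the smoother sees in each case):

```r
library(ggplot2)

p.lm <- ggplot(mtcars, aes(mpg, wt)) + geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# scale limits: points outside 15..30 are dropped before the fit
p.scale <- p.lm + scale_x_continuous(limits = c(15, 30))

# coordinate limits: the fit uses all points, only the view is zoomed
p.zoom <- p.lm + coord_cartesian(xlim = c(15, 30))

# the same difference expressed with plain lm() fits
fit.subset <- coef(lm(wt ~ mpg, data = subset(mtcars, mpg >= 15 & mpg <= 30)))
fit.all    <- coef(lm(wt ~ mpg, data = mtcars))
```

Comparing fit.subset and fit.all shows that the regression line changes when observations are removed, which is why the two plots can look different.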
p.tmp
[Figure: plots of wt against mpg with different axis limits]
# using transformation
[Figure: plots of wt against mpg]
# change specific options (hint: "color" does not work in theme_text() -> use colour)
qplot(mpg, wt, data=mtcars, geom="point", main="THIS IS A TEST-PLOT")
qplot(mpg, wt, data=mtcars, geom="point", main="THIS IS A TEST-PLOT") + opts(axis.line=theme_segment(),
plot.title=theme_text(size=20, face="bold", colour="steelblue"), panel.grid.minor=theme_blank(),
panel.background=theme_blank(), panel.grid.major=theme_line(linetype="dotted", colour="lightgrey", size=0.5),
panel.grid.major=theme_blank())
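opts() and the theme_*() helpers belong to the ggplot2 version described here (0.8.8). From ggplot2 0.9.2 onwards they were replaced by theme() and element_*(). A rough equivalent of the call above, assuming a current ggplot2 release; the panel.grid.major setting is given only once here, and the line size is left at its default:

```r
library(ggplot2)

p.theme <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  ggtitle("THIS IS A TEST-PLOT") +
  theme(
    axis.line        = element_line(),     # was theme_segment()
    plot.title       = element_text(size = 20, face = "bold",
                                    colour = "steelblue"),  # was theme_text()
    panel.grid.minor = element_blank(),    # was theme_blank()
    panel.background = element_blank(),
    panel.grid.major = element_line(linetype = "dotted",
                                    colour = "lightgrey")   # was theme_line()
  )
```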
color=I("blue")) + coord_flip() + theme_bw()
[Figure: bar charts of count by factor(cyl) and factor(gear)]
t.survframe <- createSurvivalFrame(t.survfit)
plot(t.survfit)
# two strata
t.survframe <- createSurvivalFrame(t.survfit)
qplot_survival(t.survframe)
# with CI
qplot_survival(t.survframe, TRUE)
[Figure: survival curves (surv against time, 0 to 1000) for strata sex=1 and sex=2]
qplot_survival(t.survframe)
[Figure: survival curves for strata ph.karno=50 to ph.karno=100]
# plot without confidence intervals and with different shape
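The definitions of createSurvivalFrame() and qplot_survival() are not part of this handout. The following is not the author's implementation, only a minimal sketch of the same idea: convert a survfit object into a data frame and draw it with ggplot2. It assumes the survival package with its lung data set (the strata names sex=1 and sex=2 suggest this data) and a current ggplot2 release; survfit_to_frame is a hypothetical helper name.

```r
library(survival)
library(ggplot2)

# Kaplan-Meier fit with two strata
t.fit <- survfit(Surv(time, status) ~ sex, data = lung)

# hypothetical helper: turn a stratified survfit object into a data frame
survfit_to_frame <- function(fit) {
  data.frame(time   = fit$time,
             surv   = fit$surv,
             lower  = fit$lower,
             upper  = fit$upper,
             strata = rep(names(fit$strata), times = fit$strata))
}

t.frame <- survfit_to_frame(t.fit)

# step plot of the survival curves, one colour per stratum
p.surv <- ggplot(t.frame, aes(time, surv, colour = strata)) + geom_step()

# optional confidence limits as dashed step lines
p.surv.ci <- p.surv +
  geom_step(aes(y = lower), linetype = "dashed") +
  geom_step(aes(y = upper), linetype = "dashed")
```

This sketch does not add the starting point (time 0, survival 1) to each curve, which a complete helper would probably do.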
grid.newpage()
# define viewports and assign them to grid layout
pushViewport(viewport(layout = grid.layout(x,y)))
}
# define graphics
theme_bw()
[Figure: plots of wt against mpg arranged side by side in a grid layout]
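Only fragments of the layout code survive above. A minimal self-contained sketch of arranging two ggplot2 plots side by side with the grid package, assuming a current ggplot2 release; cell() is a hypothetical helper and the two plots are arbitrary examples:

```r
library(ggplot2)
library(grid)

# define graphics
p.a <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p.b <- p.a + theme_bw()

# hypothetical helper: viewport for one cell of the layout
cell <- function(row, col) {
  viewport(layout.pos.row = row, layout.pos.col = col)
}

# define the layout (1 row, 2 columns) and draw each plot into its cell
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
print(p.a, vp = cell(1, 1))
print(p.b, vp = cell(1, 2))
popViewport()
```

Passing vp to print() draws the plot into the given viewport without starting a new page.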