EFFICIENT AND BEAUTIFUL DATA VISUALISATION

Understanding ggplot2’s jargon


Perhaps the trickiest bit when starting out with ggplot2 is understanding what type of
elements are responsible for the contents (data) versus the container (general look) of your
plot. Let’s de-mystify some of the common words you will encounter.

geom: a geometric object which defines the type of graph you are making. It reads the
aesthetic mappings to know which variables of your data to use, and creates the graph
accordingly. Some common types are geom_point(), geom_boxplot(), geom_histogram(),
geom_col(), etc.

aes: short for aesthetics. Placed within ggplot() or a geom_, this is where you map the
variables in your data to the axes, AND to the properties of the graph which depend on
those variables. For instance, if you want all data points to be the same colour, you would
define the colour = argument outside the aes() function; if you want the data points to be
coloured by a factor’s levels (e.g. by site or species), you specify the colour = argument
inside the aes().

stat: a stat layer applies some statistical transformation to the underlying data: for
instance, stat_smooth(method = "lm") displays a linear regression line and
confidence interval ribbon on top of a scatter plot (defined with geom_point()).

theme: a theme is made of a set of visual parameters that control the background, borders,
grid lines, axes, text size, legend position, etc. You can use pre-defined themes, create your
own, or use a theme and overwrite only the elements you don’t like. Examples of elements
within themes are axis.text, panel.grid, legend.title, and so on. You define their
properties with element_...() functions: element_blank() would return something
empty (ideal for removing background colour), while element_text(size = ..., face
= ..., angle = ...) lets you control all kinds of text properties.

Also useful to remember is that layers are added on top of each other as you progress into
the code, which means that elements written later may hide or overwrite previous elements.
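
To make the jargon above concrete, here is a minimal sketch. It assumes R's built-in mtcars dataset as a stand-in (it is not part of this tutorial's data), and the object names p_constant and p_mapped are made up for illustration. It shows a colour set outside aes(), a colour mapped inside aes(), a stat layer, and a pre-defined theme with two elements overwritten.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the tutorial data here (assumption)
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# Constant colour: set outside aes(), so every point gets the same colour
p_constant <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue")

# Mapped colour: set inside aes(), so points are coloured by the levels of cyl
p_mapped <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = cyl)) +
  # stat layer: linear regression line with its confidence interval ribbon
  stat_smooth(method = "lm", formula = y ~ x) +
  # start from a pre-defined theme, then overwrite selected elements;
  # theme() comes later in the code, so it wins over theme_bw()
  theme_bw() +
  theme(panel.grid = element_blank(),
        axis.text = element_text(size = 12, angle = 45))
```

Printing p_constant and p_mapped side by side should make the difference between setting and mapping a colour easy to see.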

Open RStudio, select File/New File/R script and start writing your script with
the help of this tutorial. You might find it easier to have the tutorial open on half of
your screen and RStudio on the other half, so that you can go between the two
quickly.
# Purpose of the script
# Your name, date and email

# Your working directory, set to the folder you just downloaded
# from GitHub, e.g.:
setwd("~/Downloads/CC-dataviz-beautification")

# Libraries ----
# if you haven't installed them before, run the code
# install.packages("package_name")
library(tidyverse)
library(ggthemes) # for a mapping theme
library(ggalt) # for custom map projections
library(ggrepel) # for annotations
library(viridis) # for nice colours

# Data ----
# Load data - site coordinates and plant records from
# the Long Term Ecological Research Network
# https://lternet.edu and the Niwot Ridge site more specifically
lter <- read.csv("lter.csv")
niwot_plant_exp <- read.csv("niwot_plant_exp.csv")

# DISTRIBUTIONS ----
# Setting a custom ggplot2 function
# This function makes a pretty ggplot theme
# This function takes no arguments,
# meaning that you always have just theme_niwot() and not
# theme_niwot(something else here)

theme_niwot <- function(){
  theme_bw() +
    theme(text = element_text(family = "Helvetica Light"),
          axis.text = element_text(size = 16),
          axis.title = element_text(size = 18),
          axis.line.x = element_line(color = "black"),
          axis.line.y = element_line(color = "black"),
          panel.border = element_blank(),
          panel.grid.major.x = element_blank(),
          panel.grid.minor.x = element_blank(),
          panel.grid.minor.y = element_blank(),
          panel.grid.major.y = element_blank(),
          plot.margin = unit(c(1, 1, 1, 1), units = "cm"),
          plot.title = element_text(size = 18, vjust = 1, hjust = 0),
          legend.text = element_text(size = 12),
          legend.title = element_blank(),
          legend.position = c(0.95, 0.15),
          legend.key = element_blank(),
          legend.background = element_rect(color = "black",
                                           fill = "transparent",
                                           size = 2, linetype = "blank"))
}

A data manipulation tip: Pipes (%>%) are great for streamlining data analysis. If you
haven’t used them before, you can find an intro in our tutorial here. A useful way to
familiarise yourself with what the pipe does at each step is to “break” the pipe and check
what the resulting object looks like when you’ve only run the code up to a certain point. You
can do that by selecting the relevant bit of code and running only that, but remember to
exclude the piping operator at the end of the line: e.g. you select up
to niwot_richness <- niwot_plant_exp %>% group_by(plot_num, year)
and not the whole niwot_richness <- niwot_plant_exp %>%
group_by(plot_num, year) %>%.

Running pipes sequentially line by line also comes in handy when there is an error in
your pipe and you don’t know which part exactly introduces the error.
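
As a rough illustration of “breaking” a pipe, the sketch below uses R's built-in iris dataset as a stand-in for niwot_plant_exp (an assumption, so that it runs without the tutorial files). The names species_means and step1 are hypothetical.

```r
library(dplyr)

# iris ships with R; it stands in for niwot_plant_exp here (assumption)

# The full pipe
species_means <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal = mean(Petal.Length))

# The "broken" pipe: run only up to group_by(), leaving out the trailing %>%
step1 <- iris %>%
  group_by(Species)

# Inspect the intermediate object before adding the next step back in
glimpse(step1)
group_vars(step1)
```

If a step fails or returns something unexpected, the last intermediate object that still looks right tells you where to look.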

Grouping by a certain variable is probably one of the most commonly used operations
in the tidyverse (e.g., in our case we group by year and plot to calculate species
richness for every combination of those two grouping variables). Remember to ungroup
afterwards: if you forget, the grouping remains even if you don’t “see” it, and that
might later lead to some unintended consequences.
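
The sketch below shows one such unintended consequence. It assumes the built-in mtcars dataset rather than the Niwot data, and the object names are made up for illustration.

```r
library(dplyr)

# mtcars ships with R; it stands in for the tutorial data here (assumption)
grouped <- mtcars %>%
  group_by(cyl) %>%
  mutate(n_in_group = n())   # the result is still grouped by cyl

# The grouping silently persists: this mean is calculated per cyl group
still_grouped <- grouped %>%
  summarise(mean_mpg = mean(mpg))

# After ungroup(), the same call returns a single overall mean
overall <- grouped %>%
  ungroup() %>%
  summarise(mean_mpg = mean(mpg))

nrow(still_grouped)  # one row per cyl group
nrow(overall)        # a single row
```

Ending the pipe with ungroup(), as in the richness calculation in this tutorial, avoids this kind of surprise.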

# Calculate species richness per plot per year
niwot_richness <- niwot_plant_exp %>%
  group_by(plot_num, year) %>%
  mutate(richness = length(unique(USDA_Scientific_Name))) %>%
  ungroup()

(distributions1 <- ggplot(niwot_richness, aes(x = fert, y = richness)) +
    geom_violin())

ggsave(distributions1, filename = "distributions1.png",
       height = 5, width = 5)

(distributions2 <- ggplot(niwot_richness, aes(x = fert, y = richness)) +
    geom_violin(aes(fill = fert, colour = fert), alpha = 0.5) +
    # alpha controls the opacity
    theme_niwot())

ggsave(distributions2, filename = "distributions2.png",
       height = 5, width = 5)

(distributions3 <- ggplot(niwot_richness, aes(x = fert, y = richness)) +
    geom_violin(aes(fill = fert, colour = fert), alpha = 0.5) +
    geom_boxplot(aes(colour = fert), width = 0.2) +
    theme_niwot())

ggsave(distributions3, filename = "distributions3.png",
       height = 5, width = 5)
