You are on page 1of 61

DATA VISUALIZATION WITH GGPLOT2

Introduction
Data Visualization with ggplot2

Your Instructor - Rick Scave!a


● rick.scave!a@science-cra".com
● @Rick_Scave!a
Data Visualization with ggplot2

Data Visualization
Data Visualization with ggplot2

Data Visualization & Data Science


Data Visualization with ggplot2

Data Visualization & Data Science


Statistics Design

Graphical Communication
Data Analysis & Perception
Data Visualization with ggplot2

Exploratory versus Explanatory

Explore Explain

Confirm Inform
and and
Analyse Persuade
Data Visualization with ggplot2

MASS::mammals
> library(MASS)
> mammals
body brain
Arctic fox 3.385 44.50
Owl monkey 0.480 15.50
Mountain beaver 1.350 8.10
Cow 465.000 423.00
Grey wolf 36.330 119.50
Goat 27.660 115.00
Roe deer 14.830 98.20
...
Pig 192.000 180.00
Echidna 3.000 25.00
Brazilian tapir 160.000 169.00
Tenrec 0.900 2.60
Phalanger 1.620 11.40
Tree shrew 0.104 2.50
Red fox 4.235 50.40
Data Visualization with ggplot2

Sca!er Plot ●

> library(ggplot2) 4000


> ggplot(mammals, aes(x = body, y = brain)) +
geom_point()

brain
2000


●● ●
● ●




● ●
●●



0 ●

0 2000 4000 6000


body
Data Visualization with ggplot2

Explore - Statistics 6000

> ggplot(mammals, aes(x = body, y = brain)) +


geom_point(alpha = 0.6) + 4000
stat_smooth(method = "lm", col = "red", se = FALSE)

brain
2000

0 2000 4000 6000


body
Data Visualization with ggplot2

Explore - Fine-tuning 10000

> ggplot(mammals, aes(x = body, y = brain)) +


geom_point(alpha = 0.6) +
100
coord_fixed() +

brain
scale_x_log10() +
scale_y_log10() +
stat_smooth(method = "lm", 1

col = "#C42126", 

se = FALSE, size = 1) 1e−01 1e+01 1e+03
body
Data Visualization with ggplot2

Publication-ready plot
> library(scales) # functions trans_breaks and trans_format
> ggplot(mammals, aes(x = body, y = brain)) +
annotation_logticks() +
geom_point(alpha = 0.6) +
coord_fixed(xlim = c(10^-3, 10^4), ylim = c(10^-1, 10^4)) +
scale_x_log10(expression("Body weight (log"["10"]*"(Kg))"),
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) +
scale_y_log10(expression("Brain weight (log"["10"]*"(g))"),
breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x))) +
stat_smooth(method = "lm", col = "#C42126", se = FALSE, size = 1) +
theme_classic()
Data Visualization with ggplot2

4
10
Brain weight (log10(g))

3
10

2
10

1
10

0
10

−1
10
−3 −2 −1 0 1 2 3 4
10 10 10 10 10 10 10 10
Body weight (log10(Kg))
Data Visualization with ggplot2

Anscombe Plots

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15

10

0
0 5 10 15 20
Data Visualization with ggplot2

15 15

10 10

5 5

0 0
0 5 10 15 20 0 5 10 15 20

15 15

10 10

5 5

0 0
0 5 10 15 20 0 5 10 15 20
Data Visualization with ggplot2

Sepals

Species
Width (cm)

Setosa
Versicolor
3
Virginica

2
4 5 6 7 8
Length (cm)
Data Visualization with ggplot2

Sepals

Species
Width (cm)

Setosa
Versicolor
3
Virginica

2
4 5 6 7 8
Length (cm)

4
Species, Flower Part
Virginica, Sepals
Width (cm)

3 Virginica, Petals
Versicolor, Sepals
Versicolor, Petals
2
Setosa, Sepals
Setosa, Petals

0
0 1 2 3 4 5 6 7 8
Length (cm)
Data Visualization with ggplot2

Setosa Versicolor Virginica


5
4
Width (cm)

Flower Part
3
Sepal
2 Petal
1
0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
Length (cm)

4
Species, Flower Part
Virginica, Sepals
Width (cm)

3 Virginica, Petals
Versicolor, Sepals
Versicolor, Petals
2
Setosa, Sepals
Setosa, Petals

0
0 1 2 3 4 5 6 7 8
Length (cm)
Data Visualization with ggplot2

Setosa Versicolor Virginica


5
4
Width (cm)

Flower Part
3
Sepal
2 Petal
1
0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
Length (cm)

Setosa Versicolor Virginica


8

5
Centimeters

Flower Part
4 Sepal
Petal

0
Length Width Length Width Length Width
Measure
Data Visualization with ggplot2

10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Vocabulary test score

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 5 10 15 20
Education (years)
Data Visualization with ggplot2

10
Vocabulary test score

0
0 5 10 15 20
Education (years)
Data Visualization with ggplot2

10
Vocabulary test score

0
0 5 10 15 20
Education (years)

10 ● ● ● ● ● ● ● ● ● ● ●● ● ●●
● ●
● ● ● ● ● ●
●●
Vocabulary test score

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●●●●● ● ● ● Proportion
● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● 0.1
5 ● ● ● ● ● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●

● 0.2

●● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.3
● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

0 5 10 15 20
Education (years)
Data Visualization with ggplot2

Case Study: CHIS

500

250
Under−weight
0
500
Number of Individuals

250
Healthy−weight
0
500

250
Over−weight
0
500

250
Obese
0
20 40 60 80
Age
Data Visualization with ggplot2

Case Study: CHIS

1.00

0.75
BMI Category
Proportion

Obese

0.50 Over−weight
Healthy−weight
Under−weight

0.25

0.00
20 40 60 80
Age
Data Visualization with ggplot2

Case Study: CHIS

1.00

0.75

BMI Category
Obese
Proportion

0.50 Over−weight
Healthy−weight
Under−weight

0.25

0.00
0 10000 20000 30000 40000
Individuals
Data Visualization with ggplot2

Case Study: CHIS

1.00

0.75
residual
8
Proportion

4
0.50
0

−4

−8
0.25

0.00
0 10000 20000 30000 40000
Individuals
Data Visualization with ggplot2

Outline
● Course 1
● Concepts, data, aesthetics, geometries
● Course 2
● Statistics, coordinates, facets, themes
● Best practices
● Case study
● Course 3
● Advanced plots and ggplot2 internals
DATA VISUALIZATION WITH GGPLOT2

Let’s practice!
DATA VISUALIZATION WITH GGPLOT2

Grammar of Graphics
Data Visualization with ggplot2

The quick brown fox jumps over the lazy dog

Article The A The

Adjective quick brown rabid red

Noun fox fox Hunter

Verb jumps bit shot

Preposition over

Article the the the

Adjective lazy friendly rabid red

Noun dog. dog. fox.


Data Visualization with ggplot2

Grammar of Graphics
● Plo!ing Framework
● Leland Wilkinson, Grammar of Graphics, 1999
● 2 principles
● Graphics = distinct layers of grammatical elements
● Meaningful plots through aesthetic mapping
Data Visualization with ggplot2

Essential Grammatical Elements


Element Description

Data The dataset being plo!ed.

Aesthetics The scales onto which we map our data.

Geometries The visual elements used for our data.


Data Visualization with ggplot2

All Grammatical Elements


Element Description

Data The dataset being plo!ed.

Aesthetics The scales onto which we map our data.

Geometries The visual elements used for our data.

Facets Plo!ing small multiples.

Statistics Representations of our data to aid understanding.

Coordinates The space on which the data will be plo!ed.

Themes All non-data ink.


Data Visualization with ggplot2

Diagram
Data {variables of interest}
x-axis colour size alpha line width
Aesthetics y-axis fill labels shape line type
Geometries point line histogram bar boxplot

Facets columns rows

Statistics binning smoothing descriptive inferential

Coordinates cartesian fixed polar limits

Themes non-data ink


Data Visualization with ggplot2

Grammar of Graphics
● Building blocks
● Solid, creative, meaningful visualizations
● Course 1: First 3 layers
● Course 2: Remaining 4 layers
DATA VISUALIZATION WITH GGPLOT2

Let’s practice!
DATA VISUALIZATION WITH GGPLOT2

ggplot2
Data Visualization with ggplot2

ggplot2
● Hadley Wickham
● Layer grammatical elements
● Aesthetic Mappings
Data Visualization with ggplot2

ggplot2 Layers - Data

Data
Data Visualization with ggplot2

Iris dataset
● Edgar Anderson
● R.A. Fischer

setosa versicolor virginica

Source: Wikimedia
Data Visualization with ggplot2

Iris dataset
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
...

50 5.0 3.3 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
53 6.9 3.1 4.9 1.5 versicolor
...

100 5.7 2.8 4.1 1.3 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
103 7.1 3.0 5.9 2.1 virginica
...
150 5.9 3.0 5.1 1.8 virginica
Data Visualization with ggplot2

ggplot2 Layers - Aesthetics

Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Aesthetics

Species Sepal.Length Sepal.Width Petal.Length Petal.Width


X Y
Data Visualization with ggplot2

ggplot2 Layers - Geometries

Geometries
Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Geometries


> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_jitter(alpha = 0.6)

4.5

4.0
Sepal.Width

3.5

3.0

2.5

2.0

5 6 7 8
Sepal.Length
Data Visualization with ggplot2

ggplot2 Layers - Facets

Facets
Geometries
Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Facets


> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_jitter(alpha = 0.6) +
facet_grid(. ~ Species)

setosa versicolor virginica


4.5

4.0
Sepal.Width

3.5

3.0

2.5

2.0
5 6 7 8 5 6 7 8 5 6 7 8
Sepal.Length
Data Visualization with ggplot2

ggplot2 Layers - Statistics

Statistics
Facets
Geometries
Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Statistics


> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_jitter(alpha = 0.6) +
facet_grid(. ~ Species) +
stat_smooth(method = "lm", se = F, col = "red")
Setosa Versicolor Virginica
4.5

4.0

Sepal.Width
3.5

3.0

2.5

2.0
5 6 7 8 5 6 7 8 5 6 7 8
Sepal.Length
Data Visualization with ggplot2

ggplot2 Layers - Coordinates

Coordinates
Statistics
Facets
Geometries
Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Coordinates


> levels(iris$Species) <- c("Setosa", "Versicolor", "Virginica")
> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_jitter(alpha = 0.6) +
facet_grid(. ~ Species) +
stat_smooth(method = "lm", se = F, col = "red") +
scale_y_continuous("Sepal Width (cm)", 

limits = c(2,5), 

expand = c(0,0)) +
scale_x_continuous("Sepal Length (cm)",
limits = c(4,8),
expand = c(0,0)) +
coord_equal()
Sepal Width (cm)

Setosa Versicolor Virginica


5
4
3
2
4 5 6 7 84 5 6 7 84 5 6 7 8
Sepal Length (cm)
Data Visualization with ggplot2

ggplot2 Layers - Themes


Theme
Coordinates
Statistics
Facets
Geometries
Aesthetics
Data
Data Visualization with ggplot2

ggplot2 Layers - Themes


> levels(iris$Species) <- c("Setosa", "Versicolor", "Virginica")

> library(grid)
> ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
... # previous code left out
coord_equal() +
theme(panel.background = element_blank(),
plot.background = element_blank(),
legend.background = element_blank(),
legend.key = element_blank(),
strip.background = element_blank(),
axis.text = element_text(colour = "black"),
axis.ticks = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black"),
strip.text = element_blank(),
panel.margin = unit(1, "lines")
)
Data Visualization with ggplot2

ggplot2 Layers - Themes


Sepal Width (cm)

5
4
3
2
4 5 6 7 8 4 5 6 7 8 4 5 6 7 8
Sepal Length (cm)
DATA VISUALIZATION WITH GGPLOT2

Let’s practice!

You might also like