

​Officially: Hello ! Formation R - https://thinkr.fr 3 / 470

A sentence per bullet-point:

Getting to know each other

Name/pseudonym/...

For the ones who cannot remember names

Our Goal: Making You Independent

It is gonna be dense...

Internal rules

Internal rules - Essentials

We learn by making mistakes

Concentration & Breaks

Timing



Using Zoom

How to interact with videoconferencing tools?

Using Whereby

How to interact?

SOS



Take a tour!

What is Bakacode?

Connection

Home page

Home page - launch

Practice - presentation

Practice - search

Practice - export

Top menu



Training Goal:

Training content

What is R?

Welcome to R

Main functionalities

How does R work?

Installing R

Installing RStudio



Create a project

Understand and initialize an RStudio project

Load a project

.Rproj



Getting started with RStudio

Navigate in RStudio

Console

Source

Environment

Files and others



Create objects in R

(10 + 2) * 5

<-

a <- 15
a

n <- 10 + 2
n

n <- 3 * 2
n

The console



a <- 3
a

#> [1] 3

A <- 9
A

#> [1] 9

a

#> [1] 3
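
Object names are case-sensitive, which is why `a` still holds 3 after `A` is created. A minimal sketch of that check, reusing the two objects from the slide:

```r
# `a` and `A` are two distinct objects because R is case-sensitive.
a <- 3
A <- 9

same_value <- identical(a, A)  # compares the two objects
same_value
exists("a")                    # TRUE: an object called `a` exists
exists("A")                    # TRUE: so does a separate object called `A`
```

Assigning to `A` never modifies `a`; each name points to its own value.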



Quiz

a <- 5
b <- a * 4
a <- B

a a

b B



Using an R base function

runif()

runif()

runif(n = 1)
runif(n = 1, min = -5)
runif(n = 3, max = 5)
runif(n = 1, min = -5, max = 5)
runif(n = 3, min = -5, max = 5)

Functions


About parameters
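
As a sketch of how parameters behave, the `runif()` calls shown earlier can be written with named or positional arguments, and any parameter left out falls back to its default. The seed value below is an arbitrary assumption, used only to make the draws reproducible.

```r
set.seed(42)  # arbitrary seed (assumption), only for reproducible draws

# Named arguments: the order does not matter
draws_named <- runif(max = 5, n = 3, min = -5)

# Positional arguments: matched in the order n, min, max
draws_positional <- runif(3, -5, 5)

# Default values used when min and max are not supplied
default_min <- formals(runif)$min  # 0
default_max <- formals(runif)$max  # 1

# Every draw lies between min and max
all(draws_named >= -5 & draws_named <= 5)
```

Naming the arguments makes a call easier to read and protects against mixing up their order.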




What is a package?

{proustr}

Customize R with packages



Loading a package
library(packagename)

library(proustr)

data()

{proustr} datasets: albertinedisparue, alombredesjeunesfillesenfleurs, ducotedechezswann, laprisonniere, lecotedeguermantes, letempretrouve, proust_char, sodomeetgomorrhe, stop_words

data(stop_words)

Load packages


Good practice
library()



The CRAN

Install packages from CRAN


Installing a package from the CRAN
install.packages('packagename')
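
A package only needs to be installed once per machine, whereas `library()` is called in every session. The helper below is a hypothetical sketch (its name `install_if_missing` is not from the course) that skips the download when the package is already available.

```r
# Hypothetical helper: install a package only if it is not already installed.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN, needs an internet connection
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# {stats} ships with R, so nothing is downloaded here
available <- install_if_missing("stats")
available
```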





Exercise

draw()



Shortcuts to remember:

The main shortcuts in RStudio

Shortcuts to remember (Windows)

Shortcuts to remember (Mac)



Data, what does it look like?

Data manipulation workflow

bmi
consumed_quantity

age_class
food_type

First and foremost: graphs!



Studying children's BMI

#> # A tibble: 6 × 4
#>     bmi age_class   food_type                                   consumed_quanti…
#>   <dbl> <fct>       <chr>                                                  <dbl>
#> 1    13 7-10 years  Sweets and chocolate                                    84.4
#> 2    13 7-10 years  Sandwiches, Pizzas, Pies, Pastries and Sav…            135.
#> 3    13 7-10 years  Viennese pastries, cakes and sweet cookies             166.
#> 4    13 11-14 years Sweets and chocolate                                    23.0
#> 5    13 11-14 years Sandwiches, Pizzas, Pies, Pastries and Sav…            115.
#> 6    13 11-14 years Viennese pastries, cakes and sweet cookies             188.

The life history of a plot


Graph Creation Process

Tadaaaa! Here are the plots you will create

Quiz



Package {ggplot2}

{ggplot2}? What is it?

Construction of a graph in the form of successive and additive layers


ggplot(data = ...)      # Data
aes(x = ..., y = ...)   # Aesthetic mappings
geom_...()              # Geometries
facet_...()             # Facets
stat_...()              # Statistical elements
coord_...()             # Coordinates
theme_...()             # Theme


ggplot(data = ...) +        # Data
  aes(x = ..., y = ...) +   # Aesthetic mappings
  geom_...() +              # Geometries
  facet_...() +             # Facets
  stat_...() +              # Statistical elements
  coord_...() +             # Coordinates
  theme_...()               # Theme


ggplot(data = ...) +        # Data
  aes(x = ..., y = ...) +   # Aes. mappings
  geom_...()                # Geometries


ggplot(data = ...) +    # Data
  aes(x = ...,
      y = ...,
      color = ...) +    # Aes. mappings
  geom_...()            # Geometries


ggplot(data = ...) +    # Data
  aes(x = ...,
      y = ...,
      color = ...) +    # Aes. mappings
  geom_...() +          # Geometries
  scale_...()           # Points color


ggplot(data = ...) +    # Data
  aes(x = ...,
      y = ...,
      color = ...) +    # Aes. mappings
  geom_...() +          # Geometries
  scale_...() +         # Points color
  labs(title = ...,     # Titles
       ...)


ggplot(data = ...) +    # Data
  aes(x = ...,
      y = ...,
      color = ...) +    # Aes. mappings
  geom_...() +          # Geometries
  scale_...() +         # Points color
  labs(title = ...,     # Titles
       ...) +
  facet_...()           # Facets


ggplot(data = ...) +    # Data
  aes(x = ...,
      y = ...,
      color = ...) +    # Aes. mappings
  geom_...() +          # Geometries
  scale_...() +         # Points color
  labs(title = ...,     # Titles
       ...) +
  facet_...() +         # Facets
  theme_...()           # Theme


Quiz

ggplot(data = ...,
       x = ...,
       y = ...,
       geom = ...)

ggplot(data = ...)
aes(x = ..., y = ...)
geom_...()

ggplot(data = ...) +
  aes(x = ..., y = ...) +
  geom_...()



Data format required by {ggplot2}
data.frame


bmi
age_class
food_type
consumed_quantity

#> # A tibble: 6 × 4
#>     bmi age_class   food_type                                   consumed_quanti…
#>   <dbl> <fct>       <chr>                                                  <dbl>
#> 1    13 7-10 years  Sweets and chocolate                                    84.4
#> 2    13 7-10 years  Sandwiches, Pizzas, Pies, Pastries and Sav…            135.
#> 3    13 7-10 years  Viennese pastries, cakes and sweet cookies             166.
#> 4    13 11-14 years Sweets and chocolate                                    23.0
#> 5    13 11-14 years Sandwiches, Pizzas, Pies, Pastries and Sav…            115.
#> 6    13 11-14 years Viennese pastries, cakes and sweet cookies             188.


The first steps of building a graph

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity) +
  geom_point()
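
The three layers can be tried without the course dataset. The sketch below assumes {ggplot2} is installed and uses a small invented data.frame (`toy_data` is hypothetical, it only borrows the column names shown on the slides).

```r
library(ggplot2)

# Hypothetical toy data: values are invented for illustration only.
toy_data <- data.frame(
  bmi = c(13, 14, 15, 16, 17, 18),
  consumed_quantity = c(84.4, 56.3, 120.1, 98.7, 143.2, 110.5)
)

scatter <- ggplot(data = toy_data) +      # data layer
  aes(x = bmi, y = consumed_quantity) +   # which column goes on which axis
  geom_point()                            # draw one point per row

scatter  # printing the object displays the plot
```

Storing the plot in an object before printing it makes it easy to add further layers later with `+`.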



ggplot(data = ...)

aes(...)

geom_...()


Define variables to display with aes()

aes()

x
y

color
fill
shape
size
alpha


Pick geometric objects with geom_*()

geom_*()

geom_histogram()
geom_point()
geom_boxplot()
geom_density()
geom_violin()
geom_col()
geom_label()


Building example plot

#> # A tibble: 171 × 4
#>      bmi age_class   food_type                                  consumed_quanti…
#>    <dbl> <fct>       <chr>                                                 <dbl>
#>  1    13 7-10 years  Sweets and chocolate                                   84.4
#>  2    13 7-10 years  Sandwiches, Pizzas, Pies, Pastries and Sa…            135.
#>  3    13 7-10 years  Viennese pastries, cakes and sweet cookies            166.
#>  4    13 11-14 years Sweets and chocolate                                   23.0
#>  5    13 11-14 years Sandwiches, Pizzas, Pies, Pastries and Sa…            115.
#>  6    13 11-14 years Viennese pastries, cakes and sweet cookies            188.
#>  7    14 7-10 years  Sweets and chocolate                                   56.3
#>  8    14 7-10 years  Sandwiches, Pizzas, Pies, Pastries and Sa…            114.
#>  9    14 7-10 years  Viennese pastries, cakes and sweet cookies            255.
#> 10    14 11-14 years Sweets and chocolate                                   73.5
#> # … with 161 more rows

bmi
age_class
food_type
consumed_quantity


bmi

consumed_quantity

ggplot(data = data_plot_target) +
  aes(
    x = bmi,
    y = consumed_quantity
  ) +
  geom_point()


bmi

consumed_quantity

food_type

ggplot(data = data_plot_target) +
  aes(
    x = bmi,
    y = consumed_quantity,
    color = food_type
  ) +
  geom_point()


bmi

age_class

ggplot(data = data_plot_target) +
  aes(
    x = bmi,
    fill = age_class
  ) +
  geom_density()


Quiz
data_plot_formative

#> # A tibble: 3 × 5
#>   amount_sugar amount_vit_c amount_water time  location
#>          <dbl>        <dbl>        <dbl> <chr> <chr>
#> 1         9.58         24.7         88.2 Lunch home
#> 2         0             0            0   Lunch home
#> 3         0.57         19           48.7 Lunch home

ggplot(data = data_plot_formative)
aes(x = amount_sugar, y = time)
geom_boxplot()

ggplot(data = data_pour_le_graph) +
  aes(x = amount_sugar, y = time) +
  geom_boxplot()

ggplot(data = data_plot_formative) +
  aes(x = amount_peanuts, y = time) +
  geom_boxplot()

ggplot(data = data_plot_formative) +
  aes(x = amount_sugar, y = time) +
  geom_boxplot()


Quiz
aes()

aes(x = amount_sugar, y = amount_vit_c, size = amount_water, color = time)

aes(x = amount_sugar, y = amount_vit_c, size = amount_water, fill = time)

aes(x = amount_sugar, y = amount_vit_c, shape = amount_water, time)


Quiz

ggplot(data) + aes(x = time, y = amount_water) + geom_violin()

ggplot(data) + aes(x = time, y = amount_sugar) + geom_violin()

ggplot(data) + aes(x = time, y = amount_sugar) + geom_boxplot()



Good programming practices

as.numeric View

icannotreadthistext,ithurtsmyeyes,don'tyouthink?

a<-1
# a <- 1
# a < -1

resultat <- mean(1:10 + 26, na.rm = TRUE)

resultat=mean(1:10+26,na.rm=T)
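
The spacing examples above are not only cosmetic. A short sketch of two cases where the compact style changes what the code means or how safe it is:

```r
a <- 5

a<-1       # parsed as an assignment: a now holds 1
a
a < -1     # parsed as a comparison: is a smaller than -1?

# TRUE is a reserved word, T is an ordinary variable that can be overwritten
T <- 0
t_is_true <- isTRUE(T)   # FALSE here, because T was redefined
rm(T)                    # remove our variable, T refers to TRUE again

# Readable version: spaces around operators, TRUE written in full
resultat <- mean(1:10 + 26, na.rm = TRUE)
resultat
```

Writing `TRUE` in full and putting spaces around `<-` avoids both traps.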




Characteristics of R objects

class

class(1)

#> [1] "numeric"

class("mummy")

#> [1] "character"

class(TRUE)

#> [1] "logical"

Objects


dessin <- ggplot(data = iris) +
  aes(x = Sepal.Length, y = Petal.Length) +
  geom_point(color = "green")
class(dessin)

#> [1] "gg" "ggplot"

class(class)

#> [1] "function"


Object types


Characteristics of vectors

c()

x <- c(1, 2, 3, 4)
x2 <- c("dad", "mom")
x3 <- c(TRUE, FALSE)

x4 <- c(1, "dad", TRUE)
x4
class(x4)
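
A vector holds a single type, so mixing types in `c()` triggers an automatic conversion to the most general one. A minimal sketch of what happens to `x4` and to two other mixes:

```r
x4 <- c(1, "dad", TRUE)
x4           # every element has become a character string
class(x4)    # "character"

mixed_num_lgl <- c(1, TRUE, FALSE)
mixed_num_lgl          # TRUE becomes 1, FALSE becomes 0
class(mixed_num_lgl)   # "numeric"

mixed_chr_lgl <- c("dad", TRUE)
class(mixed_chr_lgl)   # "character"
```

The order of precedence illustrated here is logical, then numeric, then character.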



1:10

#> [1] 1 2 3 4 5 6 7 8 9 10

seq.int()

seq.int(from = 1, to = 30, by = 2)

#> [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29


Operators

height <- c(1.30, 1.55, 1.20, 1.83, 1.67)
height == 1.30

#> [1] TRUE FALSE FALSE FALSE FALSE

height + 1

#> [1] 2.30 2.55 2.20 2.83 2.67

test_height <- height > 1.6
test_height

#> [1] FALSE FALSE FALSE TRUE TRUE

!test_height

#> [1] TRUE TRUE TRUE FALSE FALSE

height != 1.20

#> [1] TRUE TRUE FALSE TRUE TRUE
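
The logical vector produced by a comparison can be reused directly, for instance to keep or count the matching elements. A small sketch built on the same `height` vector (the name `tall` is introduced here for illustration):

```r
height <- c(1.30, 1.55, 1.20, 1.83, 1.67)

tall <- height > 1.6   # one TRUE/FALSE per element

height[tall]           # keep only the elements where tall is TRUE
which(tall)            # positions of the TRUE values
sum(tall)              # number of TRUE values
```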



Quiz

vect_1 <- 1:5
vect_2 <- vect_1 + 1

vect_1 vect_2

vect_2 vect_2


Quiz

vect_1 <- c(1, 2, 3, 4, 5)
vect_2 <- vect_1 > 2

class(vect_2) logical vect_2 TRUE

vect_2 !vect_2 TRUE


Missing values
NA

is.na()

height <- c(1.30, 1.55, NA, 1.83, 1.67)
is.na(height)

#> [1] FALSE FALSE TRUE FALSE FALSE

!is.na(height)

#> [1] TRUE TRUE FALSE TRUE TRUE

height > 1.6

#> [1] FALSE FALSE NA TRUE TRUE

height > 1.6 & !is.na(height)

#> [1] FALSE FALSE FALSE TRUE TRUE
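
The `NA` returned by a comparison matters as soon as the result is used to select values. A sketch of the difference, reusing the same `height` vector:

```r
height <- c(1.30, 1.55, NA, 1.83, 1.67)

# The NA in the test is carried over into the selection
with_na <- height[height > 1.6]
with_na

# Excluding missing values explicitly removes it
without_na <- height[height > 1.6 & !is.na(height)]
without_na

# which() keeps only the positions that are TRUE, so NA is dropped too
with_which <- height[which(height > 1.6)]
with_which
```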



Quiz

vect_3 <- c(1, 2, NA, 4, NA)
resultat <- is.na(vect_3)

!resultat resultat TRUE

resultat FALSE FALSE TRUE !resultat TRUE

FALSE TRUE FALSE


Type conversion rules


Type conversion
height <- c(1.30, 1.55, NA, 1.83, 1.67)

as.character() character

as.character(height)

#> [1] "1.3" "1.55" NA "1.83" "1.67"

as.numeric() numeric

height <- c(1.30, 1.55, NA, 1.83, 1.67)
not_missing <- !is.na(height)
as.numeric(not_missing)

#> [1] 1 1 0 1 1
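
Conversions do not always succeed. As a sketch, text that does not look like a number becomes `NA` (R also emits a warning, silenced here with `suppressWarnings()` so the example runs quietly):

```r
# Text that represents numbers converts cleanly
as.numeric(c("1.3", "1.55"))

# Logical values convert to 1 (TRUE) and 0 (FALSE); NA stays NA
as.integer(c(TRUE, FALSE, NA))

# Text that is not a number becomes NA
converted <- suppressWarnings(as.numeric(c("1.83", "tall")))
converted
is.na(converted)
```

Checking `is.na()` after a conversion is a quick way to spot values that could not be converted.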



Vector-related functions
length sum min max mean median

a <- c(3, -5, 9, 6)

length(a)
sum(a)
min(a)
max(a)
mean(a)
median(a)

height <- c(1.30, 1.55, NA, 1.83, 1.67)

na.rm = TRUE

sum(height)

#> [1] NA

sum(height, na.rm = TRUE)

#> [1] 6.35

length(height)
min(height, na.rm = TRUE)
max(height, na.rm = TRUE)
mean(height, na.rm = TRUE)
median(height, na.rm = TRUE)


Counting missing values in a vector

height <- c(1.30, 1.55, NA, 1.83, NA)

NA
TRUE
numeric TRUE 1 FALSE 0

sum(as.numeric(is.na(height)))

#> [1] 2

sum(is.na(height)) # because R is sometimes nice

#> [1] 2


Quiz

height <- c(1.30, 1.55, NA, 1.83, NA)
count <- sum(is.na(height))
result <- mean(is.na(height))

result result

result result
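
Because `TRUE` counts as 1 and `FALSE` as 0, `sum()` on a logical vector gives a count and `mean()` gives a proportion. A sketch with the quiz vector (the names `n_missing` and `share_missing` are introduced here for illustration):

```r
height <- c(1.30, 1.55, NA, 1.83, NA)

n_missing <- sum(is.na(height))       # how many values are missing
share_missing <- mean(is.na(height))  # which fraction of values is missing

n_missing
share_missing
n_missing / length(height)            # same fraction, computed by hand
```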




What's the {tidyverse}?

The {tidyverse} is also...

The usefulness of the tidyverse packages


The {tidyverse} packages
library(tidyverse)

#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
#> ✓ tibble  3.1.6     ✓ dplyr   1.0.7
#> ✓ tidyr   1.1.4     ✓ stringr 1.4.0
#> ✓ readr   2.1.1     ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()


Datasets in the {tidyverse}

An example with iris

iris

#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>
#>  1          5.1         3.5          1.4         0.2 setosa
#>  2          4.9         3            1.4         0.2 setosa
#>  3          4.7         3.2          1.3         0.2 setosa
#>  4          4.6         3.1          1.5         0.2 setosa
#>  5          5           3.6          1.4         0.2 setosa
#>  6          5.4         3.9          1.7         0.4 setosa
#>  7          4.6         3.4          1.4         0.3 setosa
#>  8          5           3.4          1.5         0.2 setosa
#>  9          4.4         2.9          1.4         0.2 setosa
#> 10          4.9         3.1          1.5         0.1 setosa
#> # … with 140 more rows



Files

A file?

Extensions

a_photo.jpg
a_message.txt
a_music.mp3

Tidying up a bit

/ /

/ /

That's all relative

C:/a_file/a_sub_file/pictures/a_pictures.jpg

pictures/a_pictures.jpg

A little birdie told me

Memento
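
The notions of extension, folder, and absolute versus relative path can be explored from R itself. A sketch using the example paths from the slides (nothing is read from disk, the paths are handled as plain text):

```r
absolute_path <- "C:/a_file/a_sub_file/pictures/a_pictures.jpg"

# Build a relative path from its parts, with "/" as separator
relative_path <- file.path("pictures", "a_pictures.jpg")
relative_path

basename(absolute_path)          # file name without its folders
dirname(relative_path)           # folder part of the path
tools::file_ext(absolute_path)   # extension, without the dot

# A relative path is resolved from the current working directory
getwd()
```

In an RStudio project, the working directory is the project folder, which is what makes relative paths portable from one computer to another.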



Quiz
C:\project\study
analysis.Rmd

"C:/project/study/analysis.Rmd"

"C:/project/study/analysis"

"analysis.Rmd"

"analysis"



Import .xls or .xlsx Excel files

library(readxl)
conso_complement_alimentaire <- read_excel(path = "data/conso_ca_prod.xlsx")
conso_complement_alimentaire

#> # A tibble: 37 × 12
#>    POPULATION        NOIND periode_reference num_ligne_CA num_prod type_prod
#>    <chr>             <dbl> <chr>                    <dbl>    <dbl> <chr>
#>  1 Pop1 Individu 119403801 12 mois                   5711        1 Complément a…
#>  2 Pop1 Individu 121303701 12 mois                   6351        1 Médicament
#>  3 Pop1 Individu 123200801 12 mois                   7401        1 Complément a…
#>  4 Pop1 Individu 127510001 1 mois                   10641        1 Médicament
#>  5 Pop1 Individu 212503101 12 mois                  12731        1 Médicament
#>  6 Pop1 Individu 213102601 12 mois                  13061        1 Complément a…
#>  7 Pop1 Individu 213102601 12 mois                  13062        2 Non identifié
#>  8 Pop1 Individu 213102601 12 mois                  13063        3 Complément a…
#>  9 Pop1 Individu 219400101 12 mois                  17491        1 Complément a…
#> 10 Pop1 Individu 219400101 12 mois                  17492        2 Médicament
#> # … with 27 more rows, and 6 more variables: classif_reg_prod <chr>,
#> #   classif_prod <chr>, pres_prod <chr>, nb_unit_prod <chr>,
#> #   mode_conso_prod <chr>, nb_jours_an <dbl>

Import an xls/xlsx file with {readxl}

Quiz

read_excel(path = "data/consumption.xlsx")

read_excel(path = "data/consumption.csv")

read_csv(path = "data/consumption.xls")

read_sas(path = "data/consumption.xlsx")



Import data from "flat" files

, ;

Import a flat file with {readr}


The read_csv and read_csv2 functions

read_csv read_csv2

library(readr)
product <- read_csv(file = "data/conso_ca_prod.csv") # comma
indiv <- read_csv2(file = "data/conso_ca_indiv.csv") # semicolon

dim(indiv)            # number of rows and columns in the data.frame
names(indiv)          # names of the variables in the data.frame
head(indiv)           # first 6 rows of the data.frame
dplyr::glimpse(indiv) # condensed view of the data
skimr::skim(indiv)    # descriptive stats summary


data_habits_indiv



Import a dataset with the import button

Import files with the GUI



Control data file import

dim(dataset)            # number of rows and columns in the data.frame
names(dataset)          # names of the variables in the data.frame
head(dataset)           # first 6 rows of the data.frame
dplyr::glimpse(dataset) # condensed view of the data
skimr::skim(dataset)    # descriptive stats summary

Control data import in R


Quiz

# A tibble: 100 x 1
`POPULATION;NOIND;periode_reference;conso_ca;conso_ca_regl;co~
<chr>
1 Pop1 Individu;110100101;12 mois;Non;Non;NA;NA;NA;NA;NA;NA;NA;~
2 Pop1 Individu;113307301;12 mois;Non;Non;NA;NA;NA;NA;NA;NA;NA;~
3 Pop1 Individu;114902101;12 mois;Non;Non;NA;NA;NA;NA;NA;NA;NA;~

All good

You picked the wrong column separator

You used the wrong import function

Answer D
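
A single-column result like the one in this quiz is easy to reproduce. The sketch below swaps in the base R functions `read.csv()` and `read.csv2()` instead of {readr}, so that it runs without extra packages, and writes a small temporary file whose content is invented for the demonstration.

```r
# Write a small semicolon-separated file to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("POPULATION;NOIND",
    "Pop1 Individu;110100101",
    "Pop1 Individu;113307301"),
  tmp
)

wrong <- read.csv(tmp)    # expects commas: everything lands in one column
right <- read.csv2(tmp)   # expects semicolons: columns are split correctly

dim(wrong)
dim(right)
names(right)

unlink(tmp)  # delete the temporary file
```

Running `dim()` and `names()` right after an import is usually enough to catch a wrong separator.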



Quiz

# A tibble: 86 x 1
`PK\003\004\024`
<chr>
1 "\xa1\xa6"
2 "B\xa8\x10\xaaf\x91\x97\xed9\xe7\xcc\xf1d<\xbb\\9[=AB\x13|#\x8e\xa~
3 "\xcc"

All good

You picked the wrong column separator

You used the wrong import function

Answer D



Modify aes() default scale parameters
aes()
scale_*()

fill
scale_fill()

Graphs: change default variable display


color  scale_color_*()
fill   scale_fill_*()

#> scale_fill_viridis_b
#> scale_color_continuous
#> scale_color_gradient2
#> scale_fill_gradient
#> scale_colour_continuous
#> scale_colour_viridis_d
#> scale_color_viridis_b
#> scale_color_viridis_c
#> scale_discrete_manual
#> scale_colour_manual
#> scale_colour_viridis_c
#> scale_size_continuous
#> scale_shape_manual
#> scale_fill_viridis_d
#> scale_alpha_manual
#> scale_fill_viridis_c
#> scale_fill_gradient2
#> scale_fill_continuous


scale_color/fill_grey() scale_color/fill_manual()

scale_color/fill_viridis_d()


scale_color_manual(values = c("color1", "color2", ..., "colorN"))

scale_color_manual(values = c("pink", "red", ..., "blue"))

scale_color_manual(values = c("#E697DD", "#F51414", ..., "#142DF5"))


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density()

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density() +
  scale_fill_grey()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(
    values = c("#20B8D6", "#FF9300",
               "#7176B8")
  )


Quiz

scale_color_viridis_d()

scale_size_viridis_d()

scale_fill_viridis_d()



Personalize geometric objects
geom_*()

color
fill
alpha
size

ggplot(data) +
  aes(...) +
  geom_...(color = ...)

More geometric objects!


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density() +
  scale_fill_grey()

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density(alpha = 0.8) +
  scale_fill_grey()

alpha
aes()
geom_density()


Combining geometric objects

ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot() +
  geom_point()


aes()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class, color = age_class) +
  geom_boxplot() +
  geom_point()


Combining geometric objects
aes()
aes() geom_*()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot(aes(color = age_class)) +
  geom_point()

ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot() +
  geom_point(aes(color = age_class))


Frequent mistake

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity) +
  geom_point(color = "green")

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity) +
  geom_point(aes(color = "green"))
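
The difference between the two versions can be checked on the values {ggplot2} actually uses to draw the points. This sketch assumes {ggplot2} is installed and uses a small invented data.frame (`toy_data` is hypothetical); `layer_data()` returns the data computed for a layer.

```r
library(ggplot2)

# Hypothetical toy data: values are invented for illustration only.
toy_data <- data.frame(
  bmi = c(13, 14, 15, 16),
  consumed_quantity = c(84.4, 56.3, 120.1, 98.7)
)

# Outside aes(): "green" is a fixed color applied to every point
p_fixed <- ggplot(data = toy_data) +
  aes(x = bmi, y = consumed_quantity) +
  geom_point(color = "green")

# Inside aes(): "green" is treated as a data value with a single category,
# which gets a color from the default palette and its own legend
p_mapped <- ggplot(data = toy_data) +
  aes(x = bmi, y = consumed_quantity) +
  geom_point(aes(color = "green"))

fixed_colour <- unique(layer_data(p_fixed)$colour)
mapped_colour <- unique(layer_data(p_mapped)$colour)

fixed_colour    # the color that was requested
mapped_colour   # a palette color, not the text typed in aes()
```

Rule of thumb: a column name goes inside `aes()`, a fixed value goes outside.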



Quiz

geom_boxplot(color = "red") + geom_point(size = 2)

geom_boxplot(aes(color = "red")) + geom_point(size = 2)

geom_boxplot(color = "red") + geom_point(aes(size = amount_water))



The facet_grid() function

facet_grid()

rows = vars(...)
cols = vars(...)

vars()

Making sub-plots corresponding to subsets of data


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8"))


ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  facet_grid(cols = vars(age_class))


ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  facet_grid(rows = vars(age_class))

​Making sub-plots corresponding to subsets of data Formation R - https://thinkr.fr 210 / 470


Quiz
facet_grid

facet_grid(rows = vars(time, location))

facet_grid(rows = vars(time))

facet_grid(rows = vars(time), cols = vars(location))

​Making sub-plots corresponding to subsets of data Formation R - https://thinkr.fr 211 / 470


Hands-on Practical

​Making sub-plots corresponding to subsets of data Formation R - https://thinkr.fr 212 / 470



The labs() function

labs()

title

subtitle

color fill size alpha


aes()

caption

​Modify title labels Formation R - https://thinkr.fr 214 / 470


Building example plot

​Modify title labels Formation R - https://thinkr.fr 215 / 470


Building example plot

ggplot(data = data_plot_target) +
aes(x = bmi, fill = age_class) +
geom_density(alpha = 0.8) +
scale_fill_grey()

​Modify title labels Formation R - https://thinkr.fr 216 / 470


Building example plot

ggplot(data = data_plot_target) +
aes(x = bmi, fill = age_class) +
geom_density(alpha = 0.8) +
scale_fill_grey() +
labs(
title = "Children BMI by age class",
x = "BMI",
y = "Density",
fill = "Age class"
)

​Modify title labels Formation R - https://thinkr.fr 217 / 470


Building example plot

ggplot(data = data_plot_target) +
aes(x = bmi, y = consumed_quantity,
color = food_type) +
geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8"))

​Modify title labels Formation R - https://thinkr.fr 218 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(
    title = "Consumption of fat/sweet foods according to children BMI",
    x = "BMI",
    y = "Average consumption by\nchild during study (in g)",
    color = "Food type"
  )

Use \n inside a label such as y = "..." to insert a line break.

​Modify title labels Formation R - https://thinkr.fr 219 / 470


Building example plot

ggplot(data = data_plot_target) +
aes(x = bmi, y = consumed_quantity,
color = food_type) +
geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(cols = vars(age_class))

​Modify title labels Formation R - https://thinkr.fr 220 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  facet_grid(cols = vars(age_class)) +
  labs(
    title = "",
    subtitle = "By age class",
    x = "BMI",
    y = "Average consumption by\nchild during study (in g)",
    color = "Food type",
    caption = "NB: BMI (Body Mass Index) = weight / (height ^ 2)"
  )

​Modify title labels Formation R - https://thinkr.fr 221 / 470


Quiz
labs()

labs(title = "Sugars versus Vitamin C", color = "Reference: INCA3 study")

labs(subtitle = "Sugars versus Vitamin C", caption = "Reference: INCA3 study")

labs(title = "Sugars versus Vitamin C", caption = "Reference: INCA3 study")

​Modify title labels Formation R - https://thinkr.fr 222 / 470


Practical

​Modify title labels Formation R - https://thinkr.fr 223 / 470



The coord_flip() function

Play with the coordinate system to flip your graph upside down Formation R - https://thinkr.fr 225 / 470
The coord_flip() function

# Base plot
ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot()

# Same plot with the x and y axes swapped
ggplot(data = data_plot_target) +
  aes(x = bmi, y = age_class) +
  geom_boxplot() +
  coord_flip()

Play with the coordinate system to flip your graph upside down Formation R - https://thinkr.fr 226 / 470

The theme_*() functions

#> [1] "theme_bw" "theme_classic" "theme_dark" "theme_gray"
#> [5] "theme_grey" "theme_light" "theme_linedraw" "theme_minimal"
#> [9] "theme_test" "theme_void"

​Customize the graph theme Formation R - https://thinkr.fr 228 / 470


The theme_*() functions

#> [1] "theme_base" "theme_calc" "theme_clean"
#> [4] "theme_economist" "theme_economist_white" "theme_excel"
#> [7] "theme_excel_new" "theme_few" "theme_fivethirtyeight"
#> [10] "theme_foundation" "theme_gdocs" "theme_hc"
#> [13] "theme_igray" "theme_map" "theme_pander"
#> [16] "theme_par" "theme_solarized" "theme_solarized_2"
#> [19] "theme_solid" "theme_stata" "theme_tufte"
#> [22] "theme_wsj"

​Customize the graph theme Formation R - https://thinkr.fr 229 / 470


Building example graph

​Customize the graph theme Formation R - https://thinkr.fr 230 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density(alpha = 0.8) +
  scale_fill_grey() +
  labs(title = "Children BMI by age class",
       x = "BMI",
       y = "Density",
       fill = "Age class")

​Customize the graph theme Formation R - https://thinkr.fr 231 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, fill = age_class) +
  geom_density(alpha = 0.8) +
  scale_fill_grey() +
  labs(title = "Children BMI by age class",
       x = "BMI",
       y = "Density",
       fill = "Age class") +
  theme_few() # from the {ggthemes} package

​Customize the graph theme Formation R - https://thinkr.fr 232 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Fat/sweet food consumption vs children's BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type")

​Customize the graph theme Formation R - https://thinkr.fr 233 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Fat/sweet food consumption vs children's BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type") +
  theme_few()

​Customize the graph theme Formation R - https://thinkr.fr 234 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  facet_grid(cols = vars(age_class)) +
  labs(
    title = "",
    subtitle = "By age class",
    x = "BMI",
    y = "Average consumption by\nchild during study (in g)",
    color = "Food type",
    caption = "NB: BMI (Body Mass Index) = weight / (height ^ 2)"
  ) +
  theme_few()

​Customize the graph theme Formation R - https://thinkr.fr 235 / 470


Bonus: towards a finer customization
theme()

theme_*()

theme(legend.position = "bottom")

guides(color = guide_legend(ncol = 1))

​Customize the graph theme Formation R - https://thinkr.fr 236 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Fat/sweet food consumption vs children's BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type") +
  theme_few()

​Customize the graph theme Formation R - https://thinkr.fr 237 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Fat/sweet food consumption vs children's BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type") +
  theme_few() +
  theme(legend.position = "bottom")

​Customize the graph theme Formation R - https://thinkr.fr 238 / 470


Building example plot

ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity,
      color = food_type) +
  geom_point() +
  scale_color_manual(values =
    c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Fat/sweet food consumption vs children's BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type") +
  theme_few() +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(ncol = 1))

​Customize the graph theme Formation R - https://thinkr.fr 239 / 470



The ggsave() function

ggsave(filename = ..., plot = ...)

filename

plot
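Beyond `filename` and `plot`, `ggsave()` also accepts the image size and resolution. The sketch below is one possible call, using a small plot built from `mtcars` and a temporary file (both assumptions for illustration; the file name `example_plot.png` is hypothetical).

```r
library(ggplot2)

# A small stand-in plot built from mtcars (assumption; not the course data)
p <- ggplot(data = mtcars) +
  aes(x = wt, y = mpg) +
  geom_point()

# Write to a temporary folder so the example leaves nothing in the project
out_file <- file.path(tempdir(), "example_plot.png")

ggsave(
  filename = out_file,
  plot = p,
  width = 16, height = 10, units = "cm",  # physical size of the image
  dpi = 300                               # resolution in dots per inch
)

file.exists(out_file)
```

The output format is deduced from the file extension, so changing `.png` to `.pdf` in `filename` is enough to export a PDF instead.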

​Export a graph Formation R - https://thinkr.fr 241 / 470


Assignment

plot_alim_bmi <- ggplot(data = data_plot_target) +
  aes(x = bmi, y = consumed_quantity, color = food_type) +
  geom_point() +
  scale_color_manual(values = c("#20B8D6", "#FF9300", "#7176B8")) +
  labs(title = "Consumption of fat/sweet foods according to children BMI",
       x = "BMI",
       y = "Average consumption by\nchild during study (in g)",
       color = "Food type") +
  theme_few() +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(ncol = 1))

plot_alim_bmi

​Export a graph Formation R - https://thinkr.fr 242 / 470


Assignment

plot_alim_bmi

​Export a graph Formation R - https://thinkr.fr 243 / 470


Export

ggsave(filename = "graph_alim_bmi.png", plot = plot_alim_bmi)

​Export a graph Formation R - https://thinkr.fr 244 / 470


Hands-on Practical

​Export a graph Formation R - https://thinkr.fr 245 / 470



{dplyr} What is it?

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 247 / 470
Manipulate a data.frame
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

occasion_type
food
amount

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 248 / 470
Manipulate a data.frame
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

data_food

occasion_type "Lunch"
amount_kg amount
food
quantite_consommee_kg

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 250 / 470
Chain operations in {dplyr}

%>%

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 251 / 470
Chain operations in {dplyr}
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

# in pseudo code
data_food %>%
filter_lunch %>%
create_column_amount_kg %>%
group_by_food %>%
create_mean_amount_kg
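The pseudo code above could translate into real {dplyr} verbs roughly as follows. This is a sketch under assumptions: `data_food_extract` is a hypothetical name for a tibble rebuilt by hand from the six printed rows (amounts as displayed, so rounded), and `amount` is taken to be in grams.

```r
library(dplyr)

# Six rows copied from the printed extract of data_food (amounts as displayed)
data_food_extract <- tibble(
  occasion_type = c("Aperitif before lunch", rep("Lunch", 5)),
  food = c("tap water", "fruit juice 100% pure juice", "n.s. cooking fat",
           "potato fries", "chicken nugget", "n.s. fat"),
  amount = c(148, 500, 3.22, 107, 161, 12.8)
)

mean_by_food <- data_food_extract %>%
  filter(occasion_type == "Lunch") %>%         # filter_lunch
  mutate(amount_kg = amount / 1000) %>%        # create_column_amount_kg
  group_by(food) %>%                           # group_by_food
  summarise(mean_amount_kg = mean(amount_kg))  # create_mean_amount_kg

mean_by_food
```

Each step receives the result of the previous one through `%>%`, so the code reads top to bottom in the same order as the pseudo code.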

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 252 / 470
Exercise
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

​Wrangle data within tidyverse using {dplyr} Formation R - https://thinkr.fr 253 / 470

Explore rows of a dataset
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

"Lunch" occasion_type
food

Formation R - https://thinkr.fr 256 / 470


The {dplyr} functions to handle rows
arrange()

filter()

count()

Formation R - https://thinkr.fr 257 / 470


Rearrange rows with arrange()

Formation R - https://thinkr.fr 258 / 470


Rearrange rows with arrange()

desc()

your_dataframe %>%
arrange(sorting_variable_1, sorting_variable_2)

Formation R - https://thinkr.fr 259 / 470


Rearrange rows with arrange()
data_food %>%
arrange(amount)

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Breakfast stevia 0.025
#> 2 In the morning aspartame 0.0300
#> 3 Lunch aspartame 0.0300
#> 4 Snack aspartame 0.0300
#> 5 Dinner aspartame 0.0300
#> 6 In the evening/night aspartame 0.0300
#> 7 In the morning aspartame 0.0300
#> 8 Lunch aspartame 0.0300
#> 9 Snack aspartame 0.0300
#> 10 In the evening/night aspartame 0.0300
#> 11 Dinner olive oil 0.0448
#> 12 Lunch olive oil 0.048
#> 13 Lunch olive oil 0.0500
#> 14 Lunch stevia 0.0500
#> 15 Snack stevia 0.0500

Formation R - https://thinkr.fr 260 / 470


Rearrange rows with arrange()
data_food %>%
arrange(desc(amount))

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Dinner blond beer -2% 1768.
#> 2 Aperitif before dinner beer with peach sirup 1496.
#> 3 Lunch meat soup 1481.
#> 4 Lunch pistou soup 1481.
#> 5 Lunch meat stock 1467.
#> 6 Dinner meat stock 1467.
#> 7 Dinner croque madame 1443.
#> 8 In the afternoon (excluding snacks) fruit punch 1354.
#> 9 Lunch chinese soup 1185.
#> 10 Dinner meat stock 1173.
#> 11 Dinner stew stock 1173.
#> 12 Dinner vegetable stock 1167.
#> 13 Dinner fajita 1100
#> 14 Dinner vegetable soup 1050.
#> 15 Dinner diluted fruit juice 1024.

Formation R - https://thinkr.fr 261 / 470


Rearrange rows with arrange()
data_food %>%
arrange(occasion_type, desc(amount))

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before dinner beer with peach sirup 1496.
#> 2 Aperitif before dinner wine-based cocktail 1014.
#> 3 Aperitif before dinner blond beer 2-4.9% 1010
#> 4 Aperitif before dinner n.s. blond beer 1010
#> 5 Aperitif before dinner blond beer 2-4.9% 1010
#> 6 Aperitif before dinner non-aromatised still water 1000
#> 7 Aperitif before dinner tap water 1000
#> 8 Aperitif before dinner tap water 1000
#> 9 Aperitif before dinner tap water 1000
#> 10 Aperitif before dinner tap water 1000
#> 11 Aperitif before dinner n.s. still water 1000
#> 12 Aperitif before dinner non-aromatised still water 1000
#> 13 Aperitif before dinner fruit punch 903.
#> 14 Aperitif before dinner blond beer 5-7.9% 864.
#> 15 Aperitif before dinner non-aromatised still water 855

Formation R - https://thinkr.fr 262 / 470


Filter rows with filter()

Formation R - https://thinkr.fr 263 / 470


Filter rows with filter()

> < <= >=


%in% & | !

your_dataframe %>%
filter(condition)

Formation R - https://thinkr.fr 264 / 470


Filter rows with filter()
# "Aperitif before lunch" in the occasion_type column
data_food %>%
filter(occasion_type == "Aperitif before lunch")

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Aperitif before lunch tap water 333
#> 3 Aperitif before lunch soda with lemoin extract like sprite 178.
#> 4 Aperitif before lunch non-aromatised still water 125
#> 5 Aperitif before lunch tap water 142.
#> 6 Aperitif before lunch tap water 105
#> 7 Aperitif before lunch tap water 120
#> 8 Aperitif before lunch tap water 221.
#> 9 Aperitif before lunch champagne brut 135
#> 10 Aperitif before lunch tap water 258.
#> 11 Aperitif before lunch grilled peanut 100
#> 12 Aperitif before lunch olive n.s. 18
#> 13 Aperitif before lunch pastis 285
#> 14 Aperitif before lunch potato chips 24
#> 15 Aperitif before lunch green olive 12

Formation R - https://thinkr.fr 265 / 470


Filter rows with filter()
# "Dinner" in occasion_type column AND "tap water" in food column
data_food %>%
filter(occasion_type == "Dinner" & food == "tap water")

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Dinner tap water 221.
#> 2 Dinner tap water 221.
#> 3 Dinner tap water 258.
#> 4 Dinner tap water 210
#> 5 Dinner tap water 210
#> 6 Dinner tap water 105
#> 7 Dinner tap water 120
#> 8 Dinner tap water 50
#> 9 Dinner tap water 62.5
#> 10 Dinner tap water 140
#> 11 Dinner tap water 267.
#> 12 Dinner tap water 315
#> 13 Dinner tap water 140
#> 14 Dinner tap water 120
#> 15 Dinner tap water 120

Formation R - https://thinkr.fr 266 / 470


Filter rows with filter()
# "Dinner" in occasion_type column OR "tap water" in food column
data_food %>%
filter(occasion_type == "Dinner" | food == "tap water")

#> # A tibble: 15 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Dinner fruit yoghurt 125
#> 3 Dinner tomato sauce 102.
#> 4 Dinner dow 300
#> 5 Dinner white bread 29.4
#> 6 Dinner tap water 221.
#> 7 In the evening/night tap water 148.
#> 8 In the evening/night tap water 148.
#> 9 Lunch tap water 360
#> 10 Dinner red cabbage 65
#> 11 Dinner white salt 1
#> 12 Dinner beef bifteck 153
#> 13 Dinner salad dressing with wine vinegar 4.28
#> 14 Dinner green beans 30
#> 15 Dinner white bread 31.5

Formation R - https://thinkr.fr 267 / 470


Filter rows with filter()
%in% |

data_food %>%
filter(occasion_type %in% c("Lunch", "Dinner"))

data_food %>%
filter(occasion_type == "Lunch" | occasion_type == "Dinner")
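To check that the two filters above select the same rows, here is a small sketch on a hypothetical four-row table (`toy` and its values are made up for illustration).

```r
library(dplyr)

# Hypothetical extract, including one missing occasion_type
toy <- tibble(
  occasion_type = c("Lunch", "Dinner", "Snack", NA),
  amount = c(100, 200, 30, 50)
)

with_in <- toy %>%
  filter(occasion_type %in% c("Lunch", "Dinner"))

with_or <- toy %>%
  filter(occasion_type == "Lunch" | occasion_type == "Dinner")

# Both keep the Lunch and Dinner rows; the NA row is dropped in each case
all.equal(with_in, with_or)
```

`%in%` stays readable when the list of accepted values grows, whereas the `|` version needs one comparison per value.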

Formation R - https://thinkr.fr 268 / 470


Filter rows with filter()

data_food %>%
filter(occasion_type == "Lunch") %>%
arrange(desc(amount)) %>%
head()

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Lunch meat soup 1481.
#> 2 Lunch pistou soup 1481.
#> 3 Lunch meat stock 1467.
#> 4 Lunch chinese soup 1185.
#> 5 Lunch non-aromatised still water 1000
#> 6 Lunch tap water 1000

Formation R - https://thinkr.fr 269 / 470


Quiz
occasion_type
data_food

data_food %>%
  filter(occasion_type != NA)

data_food %>%
  filter(is.na(occasion_type) = FALSE)

data_food %>%
  arrange(desc(occasion_type))

data_food %>%
  filter(!is.na(occasion_type))
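How comparisons with `NA` behave inside `filter()` can be tried on a tiny example. The sketch below uses a hypothetical three-row table (`toy` is made up for illustration).

```r
library(dplyr)

# Hypothetical three-row table with one missing occasion_type
toy <- tibble(
  occasion_type = c("Lunch", NA, "Dinner"),
  amount = c(100, 50, 200)
)

# Any comparison with NA yields NA, and filter() drops rows where the condition is NA
n_with_comparison <- toy %>% filter(occasion_type != NA) %>% nrow()

# is.na() returns TRUE or FALSE, so negating it keeps the non-missing rows
n_with_is_na <- toy %>% filter(!is.na(occasion_type)) %>% nrow()

c(comparison = n_with_comparison, is_na = n_with_is_na)
```

The first filter returns no rows at all, while the second keeps the two rows with a known `occasion_type`.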

Formation R - https://thinkr.fr 270 / 470


A few particular filters: distinct()
unique()

data_food %>%
distinct()

#> # A tibble: 68,041 × 3


#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8
#> 7 Lunch hamburger 106
#> 8 In the afternoon (excluding snacks) chewing gum 1.4
#> 9 In the afternoon (excluding snacks) water with mint sirup 447.
#> 10 Dinner fruit yoghurt 125
#> # … with 68,031 more rows

Formation R - https://thinkr.fr 271 / 470


A few particular filters: distinct()
unique()

data_food %>%
distinct()

distinct()

occasion_type food amount
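As a sketch of how `distinct()` relates to base `unique()`, the example below uses a hypothetical table with one fully duplicated row (`toy` is made up for illustration). Without arguments, `distinct()` compares all columns; with a column name, it keeps only that column unless `.keep_all = TRUE` is given.

```r
library(dplyr)

# Hypothetical table with one fully duplicated row
toy <- tibble(
  occasion_type = c("Lunch", "Lunch", "Lunch", "Dinner"),
  food = c("tap water", "tap water", "white bread", "tap water"),
  amount = c(100, 100, 30, 100)
)

# Drops the exact duplicate row (all columns are compared)
all_columns <- toy %>% distinct()

# Base R equivalent on a data frame
same_as_unique <- unique(toy)

# Unique values of a single column
one_column <- toy %>% distinct(occasion_type)

# First row for each occasion_type, with all columns kept
first_rows <- toy %>% distinct(occasion_type, .keep_all = TRUE)
```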

Formation R - https://thinkr.fr 272 / 470


A few particular filters: distinct()

data_food %>%
distinct(occasion_type)

#> # A tibble: 10 × 1
#> occasion_type
#> <chr>
#> 1 Aperitif before lunch
#> 2 Lunch
#> 3 In the afternoon (excluding snacks)
#> 4 Dinner
#> 5 In the evening/night
#> 6 Breakfast
#> 7 Aperitif before dinner
#> 8 In the morning
#> 9 Snack
#> 10 Before breakfast

Formation R - https://thinkr.fr 273 / 470


A few particular filters: slice_sample() ...

data_food %>%
slice_sample(n = 10)

#> # A tibble: 10 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Dinner n.s. cooking fat 5
#> 2 Dinner yaourt avec fruits 125
#> 3 Breakfast tartine craquante au froment (classique) type cracotte 42
#> 4 Lunch compote (de fruits) 90
#> 5 Lunch cuisse de canard 122.
#> 6 Lunch vin rouge 120
#> 7 Dinner salade batavia 15
#> 8 Dinner non-aromatised still water 162.
#> 9 Dinner nem au porc 30
#> 10 Dinner tap water 157

Formation R - https://thinkr.fr 274 / 470


... and slice_sample(prop = ...)

data_food %>%
slice_sample(prop = 0.05) # sample 5% of all rows

#> # A tibble: 12,815 × 3


#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Dinner huile de pépins de raisin 13.1
#> 2 Lunch sucre blanc 3
#> 3 Lunch non-aromatised still water 402.
#> 4 Snack eau minérale plate n.s. 80
#> 5 Dinner unsalted butter 5.5
#> 6 Dinner sel marin gris type noirmoutier/guérande 1
#> 7 Snack gâteau moelleux 30
#> 8 Dinner tomato sauce 15.9
#> 9 Lunch jambon cuit sans couenne 45
#> 10 Snack gaufrette fourrée aux fruits type paille d'or 43
#> # … with 12,805 more rows

Formation R - https://thinkr.fr 275 / 470


A few particular filters: slice_max()
n

data_food %>%
slice_max(amount, n = 2)

#> # A tibble: 2 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Dinner blond beer -2% 1768.
#> 2 Aperitif before dinner beer with peach sirup 1496.

Formation R - https://thinkr.fr 276 / 470


A few particular filters: slice_min()
n

data_food %>%
slice_min(amount, n = 2)

#> # A tibble: 10 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Breakfast stevia 0.025
#> 2 In the morning aspartame 0.0300
#> 3 Lunch aspartame 0.0300
#> 4 Snack aspartame 0.0300
#> 5 Dinner aspartame 0.0300
#> 6 In the evening/night aspartame 0.0300
#> 7 In the morning aspartame 0.0300
#> 8 Lunch aspartame 0.0300
#> 9 Snack aspartame 0.0300
#> 10 In the evening/night aspartame 0.0300
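The output above has 10 rows although `n = 2` was requested, because `slice_min()` keeps tied values by default. A minimal sketch on a hypothetical table (`toy` is made up for illustration) shows the effect of the `with_ties` argument.

```r
library(dplyr)

# Hypothetical amounts with a tie on the second-smallest value
toy <- tibble(
  food = c("stevia", "aspartame", "aspartame", "olive oil"),
  amount = c(0.025, 0.03, 0.03, 0.0448)
)

# Default behaviour: rows tied with the second-smallest value are all kept
with_ties <- toy %>% slice_min(amount, n = 2)

# with_ties = FALSE returns exactly n rows
without_ties <- toy %>% slice_min(amount, n = 2, with_ties = FALSE)

nrow(with_ties)
nrow(without_ties)
```

The same argument exists for `slice_max()`.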

Formation R - https://thinkr.fr 277 / 470


Count rows with count()
data_food %>%
count()

#> # A tibble: 1 × 1
#> n
#> <int>
#> 1 256301

count()

Formation R - https://thinkr.fr 278 / 470


Count rows with count()
data_food %>%
count(occasion_type)

#> # A tibble: 10 × 2
#> occasion_type n
#> <chr> <int>
#> 1 Aperitif before dinner 3967
#> 2 Aperitif before lunch 2181
#> 3 Before breakfast 2165
#> 4 Breakfast 40195
#> 5 Dinner 73760
#> 6 In the afternoon (excluding snacks) 12906
#> 7 In the evening/night 8501
#> 8 In the morning 10443
#> 9 Lunch 85188
#> 10 Snack 16995

Formation R - https://thinkr.fr 279 / 470


Count rows with count()
name

data_food %>%
count(occasion_type, name = "number")

#> # A tibble: 10 × 2
#> occasion_type number
#> <chr> <int>
#> 1 Aperitif before dinner 3967
#> 2 Aperitif before lunch 2181
#> 3 Before breakfast 2165
#> 4 Breakfast 40195
#> 5 Dinner 73760
#> 6 In the afternoon (excluding snacks) 12906
#> 7 In the evening/night 8501
#> 8 In the morning 10443
#> 9 Lunch 85188
#> 10 Snack 16995

Formation R - https://thinkr.fr 280 / 470


Count rows with count()

data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

Formation R - https://thinkr.fr 281 / 470


Count rows with count()

data_food %>%
filter(occasion_type == "Lunch") %>%
count(food, name = "number") %>%
arrange(desc(number)) %>%
head()

#> # A tibble: 6 × 2
#> food number
#> <chr> <int>
#> 1 tap water 6022
#> 2 white bread 4599
#> 3 non-aromatised still water 2597
#> 4 olive oil 1668
#> 5 dow 1462
#> 6 white salt 1355
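The `count()` then `arrange(desc())` pattern above can also be written with the `sort` argument of `count()`. The sketch below compares both forms on a hypothetical table (`toy` is made up for illustration).

```r
library(dplyr)

# Hypothetical extract: six Lunch rows and one Dinner row
toy <- tibble(
  occasion_type = c(rep("Lunch", 6), "Dinner"),
  food = c("tap water", "white bread", "tap water", "olive oil",
           "white bread", "tap water", "tap water")
)

# Explicit version, as in the slide
explicit <- toy %>%
  filter(occasion_type == "Lunch") %>%
  count(food, name = "number") %>%
  arrange(desc(number))

# Shorthand: count() sorts by decreasing frequency itself
shorthand <- toy %>%
  filter(occasion_type == "Lunch") %>%
  count(food, name = "number", sort = TRUE)

explicit
shorthand
```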

Formation R - https://thinkr.fr 282 / 470



Manipulate the variables of a dataset
data_food

#> # A tibble: 6 × 5
#> occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Aperitif before lunch Home water tap … 148.
#> 2 Lunch Home vegetable and fruit juice frui… 500
#> 3 Lunch Home animal fat n.s.… 3.22
#> 4 Lunch Home potatoes and other tubers pota… 107
#> 5 Lunch Home Meet based dish chic… 161
#> 6 Lunch Home plant-based fat n.s.… 12.8

occasion_type
occasion_location
food_type
food
amount

Formation R - https://thinkr.fr 284 / 470


The {dplyr} functions to manipulate columns
select()

mutate()

rename()

Formation R - https://thinkr.fr 286 / 470


Select columns with select()

Formation R - https://thinkr.fr 287 / 470


Select columns with select()

your_dataframe %>%
select(variable_to_keep_1, variable_to_keep_2, ...)

data_food %>%
select(food, amount)

#> # A tibble: 6 × 2
#> food amount
#> <chr> <dbl>
#> 1 tap water 148.
#> 2 fruit juice 100% pure juice 500
#> 3 n.s. cooking fat 3.22
#> 4 potato fries 107
#> 5 chicken nugget 161
#> 6 n.s. fat 12.8

Formation R - https://thinkr.fr 288 / 470


Select columns with select()

starts_with()

ends_with()

contains()

everything()

Formation R - https://thinkr.fr 289 / 470


Select columns with select()

data_food %>%
select(-occasion_type, -occasion_location)

#> # A tibble: 256,301 × 3


#> food_type food amount
#> <chr> <chr> <dbl>
#> 1 water tap water 148.
#> 2 vegetable and fruit juice fruit juice 100… 500
#> 3 animal fat n.s. cooking fat 3.22
#> 4 potatoes and other tubers potato fries 107
#> 5 Meet based dish chicken nugget 161
#> 6 plant-based fat n.s. fat 12.8
#> 7 Sandwiches, pizzas, pies, pastries and savory cookies hamburger 106
#> 8 Sweets and chocolate chewing gum 1.4
#> 9 Soft drinks water with mint… 447.
#> 10 Yoghurts and white cheeses fruit yoghurt 125
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 290 / 470


Select columns with select()

data_food %>%
select(starts_with("occasion"))

#> # A tibble: 256,301 × 2


#> occasion_type occasion_location
#> <chr> <chr>
#> 1 Aperitif before lunch Home
#> 2 Lunch Home
#> 3 Lunch Home
#> 4 Lunch Home
#> 5 Lunch Home
#> 6 Lunch Home
#> 7 Lunch Home
#> 8 In the afternoon (excluding snacks) Home
#> 9 In the afternoon (excluding snacks) Home
#> 10 Dinner Home
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 291 / 470


Select columns with select()

data_food %>%
select(-starts_with("occasion"))

#> # A tibble: 256,301 × 3


#> food_type food amount
#> <chr> <chr> <dbl>
#> 1 water tap water 148.
#> 2 vegetable and fruit juice fruit juice 100… 500
#> 3 animal fat n.s. cooking fat 3.22
#> 4 potatoes and other tubers potato fries 107
#> 5 Meet based dish chicken nugget 161
#> 6 plant-based fat n.s. fat 12.8
#> 7 Sandwiches, pizzas, pies, pastries and savory cookies hamburger 106
#> 8 Sweets and chocolate chewing gum 1.4
#> 9 Soft drinks water with mint… 447.
#> 10 Yoghurts and white cheeses fruit yoghurt 125
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 292 / 470


Select columns with select()

data_food %>%
select(-ends_with("type"))

#> # A tibble: 256,301 × 3


#> occasion_location food amount
#> <chr> <chr> <dbl>
#> 1 Home tap water 148.
#> 2 Home fruit juice 100% pure juice 500
#> 3 Home n.s. cooking fat 3.22
#> 4 Home potato fries 107
#> 5 Home chicken nugget 161
#> 6 Home n.s. fat 12.8
#> 7 Home hamburger 106
#> 8 Home chewing gum 1.4
#> 9 Home water with mint sirup 447.
#> 10 Home fruit yoghurt 125
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 293 / 470


Select columns with select()

data_food %>%
select(-contains("occasion"))

#> # A tibble: 256,301 × 3


#> food_type food amount
#> <chr> <chr> <dbl>
#> 1 water tap water 148.
#> 2 vegetable and fruit juice fruit juice 100… 500
#> 3 animal fat n.s. cooking fat 3.22
#> 4 potatoes and other tubers potato fries 107
#> 5 Meet based dish chicken nugget 161
#> 6 plant-based fat n.s. fat 12.8
#> 7 Sandwiches, pizzas, pies, pastries and savory cookies hamburger 106
#> 8 Sweets and chocolate chewing gum 1.4
#> 9 Soft drinks water with mint… 447.
#> 10 Yoghurts and white cheeses fruit yoghurt 125
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 294 / 470


Select columns with select()

data_food %>%
select(food, everything())

#> # A tibble: 256,301 × 5


#> food occasion_type occasion_locati… food_type amount
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 tap water Aperitif befo… Home water 148.
#> 2 fruit juice 100% pure juice Lunch Home vegetable… 500
#> 3 n.s. cooking fat Lunch Home animal fat 3.22
#> 4 potato fries Lunch Home potatoes … 107
#> 5 chicken nugget Lunch Home Meet base… 161
#> 6 n.s. fat Lunch Home plant-bas… 12.8
#> 7 hamburger Lunch Home Sandwiche… 106
#> 8 chewing gum In the aftern… Home Sweets an… 1.4
#> 9 water with mint sirup In the aftern… Home Soft drin… 447.
#> 10 fruit yoghurt Dinner Home Yoghurts … 125
#> # … with 256,291 more rows

Formation R - https://thinkr.fr 295 / 470


Quiz
occasion
data_food

data_food %>%
  filter(starts_with("occasion"))

data_food %>%
  select(starts_with("occasion"))

data_food %>%
  starts_with("occasion")

Formation R - https://thinkr.fr 296 / 470


Quiz
#> # A tibble: 6 × 5
#> occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Aperitif before lunch Home water tap … 148.
#> 2 Lunch Home vegetable and fruit juice frui… 500
#> 3 Lunch Home animal fat n.s.… 3.22
#> 4 Lunch Home potatoes and other tubers pota… 107
#> 5 Lunch Home Meet based dish chic… 161
#> 6 Lunch Home plant-based fat n.s.… 12.8

data_food %>%
select(food) %>%
select(amount)

data_food food

data_food food data_food amount


amount

Formation R - https://thinkr.fr 297 / 470


Transform or create a column with
mutate()

Formation R - https://thinkr.fr 298 / 470


mutate() to transform or create variables

your_dataframe %>%
mutate(new_variable_1 = operations(existing_variable_2),
new_variable_3 = operations(existing_variable_4),
...
)

Formation R - https://thinkr.fr 299 / 470


mutate() to transform or create variables

{dplyr}

lag() lead()

cumsum() cumprod()

+ - * > < <= >=

ifelse() case_when()

Formation R - https://thinkr.fr 300 / 470


mutate() to transform or create variables

tibble(
hour = 12:18,
food_intake = c(280, 25, 0, 0, 100, 50, 200)
) %>%
mutate(
lag_hour = lag(hour),
diff_intake = food_intake - lag(food_intake),
cum_intake = cumsum(food_intake)
)

#> # A tibble: 7 × 5
#> hour food_intake lag_hour diff_intake cum_intake
#> <int> <dbl> <int> <dbl> <dbl>
#> 1 12 280 NA NA 280
#> 2 13 25 12 -255 305
#> 3 14 0 13 -25 305
#> 4 15 0 14 0 305
#> 5 16 100 15 100 405
#> 6 17 50 16 -50 455
#> 7 18 200 17 150 655

Formation R - https://thinkr.fr 301 / 470


mutate() to transform or create variables

data_food %>%
  mutate(amount_kg = amount / 1000,
         amount_kg = round(amount_kg, digits = 2)) %>%
  select(amount_kg) %>%
  head()

#> # A tibble: 6 × 1
#> amount_kg
#> <dbl>
#> 1 0.15
#> 2 0.5
#> 3 0
#> 4 0.11
#> 5 0.16
#> 6 0.01

Formation R - https://thinkr.fr 302 / 470


mutate() to transform or create variables

case_when()

ifelse mutate case_when()

condition ~ result

your_dataframe %>%
mutate(
variable = case_when(
condition_1 ~ value_1,
condition_2 ~ value_2,
...
))

mutate() variable

Formation R - https://thinkr.fr 303 / 470


mutate() to transform or create variables

case_when()

data_food %>%
mutate(
amount_chr = case_when(
amount > 400 ~ "a gigantic quantity",
amount > 100 ~ "a lot",
amount >= 0 ~ "a small amount"
)
) %>%
select(amount, amount_chr)

#> # A tibble: 7 × 2
#> amount amount_chr
#> <dbl> <chr>
#> 1 148. a lot
#> 2 500 a gigantic quantity
#> 3 3.22 a small amount
#> 4 107 a lot
#> 5 161 a lot
#> 6 12.8 a small amount
#> 7 106 a lot
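In `case_when()`, conditions are tested from top to bottom and the first one that is TRUE wins, which is why `amount > 100` only applies to values up to 400 in the example above. The sketch below, on a hypothetical table (`toy` is made up for illustration), adds a final `TRUE ~ ...` line as a fallback for rows matching no condition, such as a missing amount.

```r
library(dplyr)

# Hypothetical amounts, including a missing value
toy <- tibble(amount = c(500, 148, 3.22, NA))

labelled <- toy %>%
  mutate(
    amount_chr = case_when(
      amount > 400 ~ "a gigantic quantity",  # tested first
      amount > 100 ~ "a lot",                # only reached when amount <= 400
      amount >= 0  ~ "a small amount",
      TRUE         ~ "unknown"               # fallback, used here for the NA
    )
  )

labelled
```

Without the fallback line, rows matching no condition receive `NA`.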

Formation R - https://thinkr.fr 304 / 470


Rename variables with rename()

your_dataframe %>%
rename(new_name = old_name)

data_food

#> # A tibble: 6 × 2
#> occasion_location amount
#> <chr> <dbl>
#> 1 Home 148.
#> 2 Home 500
#> 3 Home 3.22
#> 4 Home 107
#> 5 Home 161
#> 6 Home 12.8

data_food %>%
  rename(place = occasion_location)

#> # A tibble: 6 × 2
#> place amount
#> <chr> <dbl>
#> 1 Home 148.
#> 2 Home 500
#> 3 Home 3.22
#> 4 Home 107
#> 5 Home 161
#> 6 Home 12.8

Formation R - https://thinkr.fr 305 / 470


Quiz
#> # A tibble: 4 × 5
#> occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Aperitif before lunch Home water tap … 148.
#> 2 Lunch Home vegetable and fruit juice frui… 500
#> 3 Lunch Home animal fat n.s.… 3.22
#> 4 Lunch Home potatoes and other tubers pota… 107

Which code renames the column amount to quantity?

data_food %>%
  rename(amount = quantity)

data_food %>%
  mutate(quantity = amount)

data_food %>%
  select(quantity = amount)

data_food %>%
  rename(quantity = amount)

Formation R - https://thinkr.fr 306 / 470


Extract the content of a column with pull()

data_food %>%
select(amount) %>%
class()

#> [1] "tbl_df" "tbl" "data.frame"

data_food %>%
pull(amount) %>%
class()

#> [1] "numeric"

Formation R - https://thinkr.fr 307 / 470



Functions to summarise data

mean() median()
n() summarise()
var() sd()
min() max()

Formation R - https://thinkr.fr 309 / 470


Summarise data with summarise()

your_dataframe %>%
summarise(
var1_summary = function_1(variable_1),
var2_summary = function_2(variable_2),
...
)

Formation R - https://thinkr.fr 310 / 470


Summarise data with summarise()
data_food

#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8

Formation R - https://thinkr.fr 311 / 470


Summarise data with summarise()

data_food %>%
summarise(
mean_summary = mean(amount),
variance_summary = var(amount),
number_summary = n()
)

#> # A tibble: 1 × 3
#> mean_summary variance_summary number_summary
#> <dbl> <dbl> <int>
#> 1 NA NA 256301

Formation R - https://thinkr.fr 312 / 470


Summarise data with summarise()

data_food %>%
summarise(
mean_summary = mean(amount, na.rm = TRUE),
variance_summary = var(amount, na.rm = TRUE),
number_summary = n()
)

#> # A tibble: 1 × 3
#> mean_summary variance_summary number_summary
#> <dbl> <dbl> <int>
#> 1 118. 17515. 256301
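
Before using na.rm = TRUE, it can help to know how many values are actually missing. A minimal sketch on a hypothetical four-row tibble (the names data_small, n_rows and n_missing are illustrative, not part of the course data):

```r
library(dplyr)

# Hypothetical data with one missing amount
data_small <- tibble(amount = c(148, 500, NA, 107))

missing_summary <- data_small %>%
  summarise(
    n_rows = n(),                         # counts every row, missing or not
    n_missing = sum(is.na(amount)),       # number of NA values
    mean_amount = mean(amount, na.rm = TRUE)
  )

missing_summary
```

Note that n() counts rows, so it is not reduced by na.rm = TRUE: the mean above is computed on three values while n_rows is four.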

Formation R - https://thinkr.fr 313 / 470


The adverbial complement group_by()

your_dataframe %>%
group_by(grouping_variable_1, grouping_variable_2, ...)

Formation R - https://thinkr.fr 314 / 470


Chain group_by() and summarise()

group_by() defines the groups; summarise() then computes one summary row per group.

Formation R - https://thinkr.fr 315 / 470


Chain group_by() and summarise()
data_food %>%
group_by(occasion_type) %>%
summarise(
mean_amount = mean(amount, na.rm = TRUE),
variance_amount = var(amount, na.rm = TRUE),
number = n(),
.groups = "drop"
)

#> # A tibble: 8 × 4
#> occasion_type mean_amount variance_amount number
#> <chr> <dbl> <dbl> <int>
#> 1 Aperitif before dinner 132. 20376. 3967
#> 2 Aperitif before lunch 121. 16260. 2181
#> 3 Before breakfast 159. 15585. 2165
#> 4 Breakfast 127. 22502. 40195
#> 5 Dinner 115. 18683. 73760
#> 6 In the afternoon (excluding snacks) 153. 18741. 12906
#> 7 In the evening/night 162. 18286. 8501
#> 8 In the morning 142. 16345. 10443

.groups = "drop" removes the grouping from the result.
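
The same chain can be checked by hand on a small table. The sketch below uses a hypothetical five-row tibble (meals) so that the group means are easy to verify, and uses is_grouped_df() from {dplyr} to confirm that the result is no longer grouped.

```r
library(dplyr)

# Hypothetical data: two occasions, one missing amount
meals <- tibble(
  occasion_type = c("Lunch", "Lunch", "Dinner", "Dinner", "Dinner"),
  amount = c(100, 300, 50, NA, 150)
)

by_occasion <- meals %>%
  group_by(occasion_type) %>%
  summarise(
    mean_amount = mean(amount, na.rm = TRUE),
    number = n(),
    .groups = "drop"
  )

by_occasion

# FALSE: the grouping was dropped by .groups = "drop"
is_grouped_df(by_occasion)
```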

Formation R - https://thinkr.fr 316 / 470


Exercise
data_physical_activity

#> # A tibble: 50 × 3
#> region gender time_physical_activity_hours
#> <chr> <chr> <dbl>
#> 1 Aquitaine other 2.41
#> 2 Brittany M 4.40
#> 3 Normandy M 5.78
#> 4 Aquitaine F 0.353
#> 5 Burgundy M 1.86
#> 6 Burgundy other 3.33
#> 7 Normandy F 1.23
#> 8 Brittany M 1.69
#> 9 Normandy M 5.04
#> 10 Brittany M 3.31
#> # … with 40 more rows

Formation R - https://thinkr.fr 317 / 470


Exercise

data_physical_activity %>%
filter(region == "Brittany") %>%
select(-region) %>%
mutate(
time_physical_activity_hours = round(time_physical_activity_hours)
) %>%
group_by(gender) %>%
summarise(mean_time_phys_act_hours = mean(time_physical_activity_hours),
.groups = "drop") %>%
arrange(mean_time_phys_act_hours)

Formation R - https://thinkr.fr 318 / 470


Exercise

data_physical_activity %>%
group_by(gender) %>%
slice_sample(n = 10) %>%
mutate(time_physical_activity_minutes = time_physical_activity_hours * 60) %>%
slice_max(time_physical_activity_minutes, n = 3) %>%
summarise(
mean_time_h = mean(time_physical_activity_hours),
median_time_h = median(time_physical_activity_hours),
mean_time_min = mean(time_physical_activity_minutes),
median_time_min = median(time_physical_activity_minutes),
.groups = "drop"
)

Formation R - https://thinkr.fr 319 / 470



Tidy data

​Tidy data Formation R - https://thinkr.fr 321 / 470


Explanations

#> # A tibble: 8 × 2
#> age gender
#> <dbl> <chr>
#> 1 25 male
#> 2 45 male
#> 3 31 female
#> 4 10 male
#> 5 23 male
#> 6 43 male
#> 7 45 female
#> 8 12 male

​Tidy data Formation R - https://thinkr.fr 322 / 470


The statistical individual

Statistical individuals are rows, while variables are columns.

​Tidy data Formation R - https://thinkr.fr 323 / 470


"Non-tidy" data versus "Tidy" data

#> # A tibble: 5 × 5
#> age_class water carbs lipids proteins
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 7-10 year 759. 390. 422. 275.
#> 2 11-17 year 942. 417. 435. 143.
#> 3 18-35 year 1026. 529. 537. 188.
#> 4 36-55 year 734. 635. 577. 151.
#> 5 56 year + 883. 590. 368. 258.

​Tidy data Formation R - https://thinkr.fr 324 / 470


"Non-tidy" data versus "Tidy" data

"Non-tidy" data:

#> # A tibble: 5 × 5
#> age_class water carbs lipids proteins
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 7-10 year 759. 390. 422. 275.
#> 2 11-17 year 942. 417. 435. 143.
#> 3 18-35 year 1026. 529. 537. 188.
#> 4 36-55 year 734. 635. 577. 151.
#> 5 56 year + 883. 590. 368. 258.

"Tidy" data:

#> # A tibble: 20 × 3
#> age_class nutrient avg_intake
#> <chr> <chr> <dbl>
#> 1 7-10 year water 759.
#> 2 7-10 year carbs 390.
#> 3 7-10 year lipids 422.
#> 4 7-10 year proteins 275.
#> 5 11-17 year water 942.
#> 6 11-17 year carbs 417.
#> 7 11-17 year lipids 435.
#> 8 11-17 year proteins 143.
#> 9 18-35 year water 1026.
#> 10 18-35 year carbs 529.
#> 11 18-35 year lipids 537.
#> 12 18-35 year proteins 188.
#> 13 36-55 year water 734.
#> 14 36-55 year carbs 635.
#> 15 36-55 year lipids 577.
#> 16 36-55 year proteins 151.
#> 17 56 year + water 883.
#> 18 56 year + carbs 590.
#> 19 56 year + lipids 368.
#> 20 56 year + proteins 258.

​Tidy data Formation R - https://thinkr.fr 325 / 470


Quiz

Which dataset is in "tidy" format?

data_a

#> # A tibble: 4 × 2
#> id information
#> <int> <chr>
#> 1 1 34 years old man
#> 2 2 23 years old woman
#> 3 3 12 years old man
#> 4 4 13 years old woman

data_b

#> # A tibble: 4 × 3
#> id gender age
#> <int> <chr> <chr>
#> 1 1 man 34
#> 2 2 woman 23
#> 3 3 man 12
#> 4 4 woman 13

data_c

#> # A tibble: 4 × 4
#> id man woman age
#> <int> <dbl> <dbl> <chr>
#> 1 1 1 0 34
#> 2 2 0 1 23
#> 3 3 1 0 12
#> 4 4 0 1 13

Possible answers:

data_a

data_b

data_c

​Tidy data Formation R - https://thinkr.fr 326 / 470


Exercise

​Tidy data Formation R - https://thinkr.fr 327 / 470



The one-million dollar question

​Of the importance of data format Formation R - https://thinkr.fr 331 / 470


The one-million dollar question

#> # A tibble: 10,002 × 3
#> NOIND food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Milk 250.
#> 2 110100701 Drinks 1750.
#> 3 110100701 Bread 72
#> 4 110100801 Milk 458.
#> 5 110100801 Bread 103.
#> 6 110101201 Drinks 654.
#> 7 110101201 Juice 255.
#> 8 110101201 Milk 8
#> 9 110101201 Croissants 55
#> 10 110101401 Drinks 1342.
#> # … with 9,992 more rows

#> # A tibble: 3,917 × 6
#> NOIND Milk Drinks Bread Juice Croissants
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 250. 0 0 0 0
#> 2 110100701 0 1750. 72 0 0
#> 3 110100801 458. 0 103. 0 0
#> 4 110101201 8 654. 0 255. 55
#> 5 110101401 0 1342. 127. 0 0
#> 6 110300301 15.6 1217. 0 0 0
#> 7 110300501 77.8 800 0 0 0
#> 8 110600101 93.0 1283. 177. 0 0
#> 9 110601301 46.9 587. 16 0 0
#> 10 110602001 0 1150. 220. 125. 0
#> # … with 3,907 more rows

​Of the importance of data format Formation R - https://thinkr.fr 333 / 470


The one-million dollar question

#> # A tibble: 10,002 × 3


#> NOIND food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Milk 250.
#> 2 110100701 Drinks 1750.
#> 3 110100701 Bread 72
#> 4 110100801 Milk 458.
#> 5 110100801 Bread 103.
#> 6 110101201 Drinks 654.
#> 7 110101201 Juice 255.
#> 8 110101201 Milk 8
#> 9 110101201 Croissants 55
#> 10 110101401 Drinks 1342.
#> # … with 9,992 more rows

​Of the importance of data format Formation R - https://thinkr.fr 334 / 470


The one-million dollar question

#> # A tibble: 3,917 × 6


#> NOIND Milk Drinks Bread Juice Croissants
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 250. 0 0 0 0
#> 2 110100701 0 1750. 72 0 0
#> 3 110100801 458. 0 103. 0 0
#> 4 110101201 8 654. 0 255. 55
#> 5 110101401 0 1342. 127. 0 0
#> 6 110300301 15.6 1217. 0 0 0
#> 7 110300501 77.8 800 0 0 0
#> 8 110600101 93.0 1283. 177. 0 0
#> 9 110601301 46.9 587. 16 0 0
#> 10 110602001 0 1150. 220. 125. 0
#> # … with 3,907 more rows

​Of the importance of data format Formation R - https://thinkr.fr 335 / 470



Widen a dataset with pivot_wider()

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 338 / 470


Widen a dataset with pivot_wider()

food_intake <- tibble(
  id = c(110100101, 110100101, 110100101, 110100701, 110100801, 110100801),
  food_type = c("Water", "Juice", "Milk", "Water", "Water", "Milk"),
  amount = c(1632.8, 1420.7, 250.9, 3082.5, 1500.0, 458.1)
)

food_intake

#> # A tibble: 6 × 3
#> id food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Water 1633.
#> 2 110100101 Juice 1421.
#> 3 110100101 Milk 251.
#> 4 110100701 Water 3082.
#> 5 110100801 Water 1500
#> 6 110100801 Milk 458.

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 339 / 470


Widen a dataset with pivot_wider()

pivot_wider()

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 340 / 470


Widen a dataset with pivot_wider()
food_intake %>%
pivot_wider(
# name of the column to widen
names_from = food_type,
# name of the column that contains the values
values_from = amount
)

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 341 / 470


Widen a dataset with pivot_wider()

food_intake

#> # A tibble: 6 × 3
#> id food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Water 1633.
#> 2 110100101 Juice 1421.
#> 3 110100101 Milk 251.
#> 4 110100701 Water 3082.
#> 5 110100801 Water 1500
#> 6 110100801 Milk 458.

food_intake %>%
  pivot_wider(
    names_from = food_type,
    values_from = amount
  )

#> # A tibble: 3 × 4
#> id Water Juice Milk
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 1633. 1421. 251.
#> 2 110100701 3082. NA NA
#> 3 110100801 1500 NA 458.

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 342 / 470


Widen a dataset with pivot_wider()

By default, combinations that do not exist in the data are filled with NA:

food_intake %>%
  pivot_wider(
    names_from = food_type,
    values_from = amount
  )

#> # A tibble: 3 × 4
#> id Water Juice Milk
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 1633. 1421. 251.
#> 2 110100701 3082. NA NA
#> 3 110100801 1500 NA 458.

The values_fill argument sets the value to use instead:

food_intake %>%
  pivot_wider(
    names_from = food_type,
    values_from = amount,
    values_fill = list(
      amount = 0
    )
  )

#> # A tibble: 3 × 4
#> id Water Juice Milk
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 1633. 1421. 251.
#> 2 110100701 3082. 0 0
#> 3 110100801 1500 0 458.
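
When a single column is widened, values_fill can also be given as a plain value instead of a named list. This is a minimal sketch that assumes a recent version of {tidyr} (1.1.0 or later is assumed here); food_wide is a name introduced for the illustration.

```r
library(dplyr)
library(tidyr)

food_intake <- tibble(
  id = c(110100101, 110100101, 110100101, 110100701, 110100801, 110100801),
  food_type = c("Water", "Juice", "Milk", "Water", "Water", "Milk"),
  amount = c(1632.8, 1420.7, 250.9, 3082.5, 1500.0, 458.1)
)

# Scalar form of values_fill: every missing combination becomes 0
food_wide <- food_intake %>%
  pivot_wider(
    names_from = food_type,
    values_from = amount,
    values_fill = 0
  )

food_wide
```

Filling with 0 is a modelling choice: it states that the individual consumed none of that food, which is not the same as "not recorded".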

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 343 / 470


Widen a dataset with pivot_wider()
#> # A tibble: 6 × 3
#> id food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Water 1633.
#> 2 110100101 Juice 1421.
#> 3 110100101 Milk 251.
#> 4 110100701 Water 3082.
#> 5 110100801 Water 1500
#> 6 110100801 Milk 458.

food_intake %>%
pivot_wider(
names_from = contains("type"),
values_from = contains("amount")
)

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 344 / 470


Quiz
Which pivot_wider() arguments turn the first table into the second one?

#> # A tibble: 1,841 × 3
#> id product_type amount
#> <dbl> <chr> <dbl>
#> 1 110300601 vitamins 3
#> 2 110600401 vitamins 3
#> 3 110601301 vitamins 4
#> 4 110601801 vitamins 21
#> 5 110604501 blend 1
#> 6 110604901 plants 4
#> # … with 1,835 more rows

#> # A tibble: 1,529 × 4
#> id vitamins blend plants
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110300601 3 NA NA
#> 2 110600401 3 NA NA
#> 3 110601301 4 NA NA
#> 4 110601801 21 NA NA
#> 5 110604501 NA 1 NA
#> 6 110604901 1 NA 4
#> # … with 1,523 more rows

names_from = amount, values_from = product_type

names_from = product_type, values_from = amount

names_from = product_type, values_from = amount, values_fill = list(amount = 0)

​Widen a dataset with {tidyr} Formation R - https://thinkr.fr 345 / 470



Lengthen a dataset with pivot_longer()

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 347 / 470


Lengthen a dataset with pivot_longer()

intake_vitamins <- tibble(
  id = c(110100101, 110100701, 110100801, 110101201, 110101401, 110300301),
  gender = c("Man", "Woman", "Man", "Man", "Woman", "Man"),
  vitamin_c = c(506.2, 457.7, 192.9, 603.8, 161.3, 91.7),
  vitamin_d = c(9.7, 19.6, 10.2, 20.5, 12.8, 10.0)
)

intake_vitamins

#> # A tibble: 6 × 4
#> id gender vitamin_c vitamin_d
#> <dbl> <chr> <dbl> <dbl>
#> 1 110100101 Man 506. 9.7
#> 2 110100701 Woman 458. 19.6
#> 3 110100801 Man 193. 10.2
#> 4 110101201 Man 604. 20.5
#> 5 110101401 Woman 161. 12.8
#> 6 110300301 Man 91.7 10

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 348 / 470


Lengthen a dataset with pivot_longer()

pivot_longer()

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 349 / 470


Lengthen a dataset with pivot_longer()
intake_vitamins %>%
pivot_longer(
# columns to gather
cols = c(vitamin_c, vitamin_d),
# name of the column that's going to contain the former columns names
names_to = "vitamin",
# name of the column that's going to contain the former columns values
values_to = "amount"
)

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 350 / 470


Lengthen a dataset with pivot_longer()

intake_vitamins

#> # A tibble: 6 × 4
#> id gender vitamin_c vitamin_d
#> <dbl> <chr> <dbl> <dbl>
#> 1 110100101 Man 506. 9.7
#> 2 110100701 Woman 458. 19.6
#> 3 110100801 Man 193. 10.2
#> 4 110101201 Man 604. 20.5
#> 5 110101401 Woman 161. 12.8
#> 6 110300301 Man 91.7 10

intake_vitamins %>%
  pivot_longer(
    cols = c(vitamin_c, vitamin_d),
    names_to = "vitamin",
    values_to = "amount"
  )

#> # A tibble: 12 × 4
#> id gender vitamin amount
#> <dbl> <chr> <chr> <dbl>
#> 1 110100101 Man vitamin_c 506.
#> 2 110100101 Man vitamin_d 9.7
#> 3 110100701 Woman vitamin_c 458.
#> 4 110100701 Woman vitamin_d 19.6
#> 5 110100801 Man vitamin_c 193.
#> 6 110100801 Man vitamin_d 10.2
#> 7 110101201 Man vitamin_c 604.
#> 8 110101201 Man vitamin_d 20.5
#> 9 110101401 Woman vitamin_c 161.
#> 10 110101401 Woman vitamin_d 12.8
#> 11 110300301 Man vitamin_c 91.7
#> 12 110300301 Man vitamin_d 10

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 351 / 470


Lengthen a dataset with pivot_longer()
#> # A tibble: 6 × 4
#> id gender vitamin_c vitamin_d
#> <dbl> <chr> <dbl> <dbl>
#> 1 110100101 Man 506. 9.7
#> 2 110100701 Woman 458. 19.6
#> 3 110100801 Man 193. 10.2
#> 4 110101201 Man 604. 20.5
#> 5 110101401 Woman 161. 12.8
#> 6 110300301 Man 91.7 10

intake_vitamins %>%
  pivot_longer(
    cols = starts_with("vitamin"),
    names_to = "vitamin",
    values_to = "amount"
  )

intake_vitamins %>%
  pivot_longer(
    cols = -c(id, gender),
    names_to = "vitamin",
    values_to = "amount"
  )
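
When all the gathered columns share a prefix, the names_prefix argument of pivot_longer() can strip it from the new names column. A minimal sketch on the first two rows of intake_vitamins; the object name vitamins_long is an assumption made for the example.

```r
library(dplyr)
library(tidyr)

# First two rows of the intake_vitamins example
intake_vitamins <- tibble(
  id = c(110100101, 110100701),
  gender = c("Man", "Woman"),
  vitamin_c = c(506.2, 457.7),
  vitamin_d = c(9.7, 19.6)
)

vitamins_long <- intake_vitamins %>%
  pivot_longer(
    cols = starts_with("vitamin"),
    names_to = "vitamin",
    # Remove the common "vitamin_" prefix: the column will contain "c" and "d"
    names_prefix = "vitamin_",
    values_to = "amount"
  )

vitamins_long
```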

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 352 / 470


Quiz
Which pivot_longer() arguments turn the first table into the second one?

#> # A tibble: 4,725 × 4
#> id video_game computer tv
#> <dbl> <dbl> <dbl> <dbl>
#> 1 120100401 0 0 1.64
#> 2 120100501 0.964 0.571 0.821
#> 3 120100601 0 0.571 1.14
#> 4 120100801 0.357 0.357 1
#> 5 120100901 1 0.286 2.57
#> 6 120101001 0.214 1.21 1.43
#> 7 120101201 0 0.786 1.14
#> 8 120101301 1.5 0.25 1.5
#> 9 120200401 0 2.73 1
#> 10 120300101 0 5.57 4
#> # … with 4,715 more rows

#> # A tibble: 14,175 × 3
#> id screen_type time
#> <dbl> <chr> <dbl>
#> 1 120100401 video_game 0
#> 2 120100401 computer 0
#> 3 120100401 tv 1.64
#> 4 120100501 video_game 0.964
#> 5 120100501 computer 0.571
#> 6 120100501 tv 0.821
#> 7 120100601 video_game 0
#> 8 120100601 computer 0.571
#> 9 120100601 tv 1.14
#> 10 120100801 video_game 0.357
#> # … with 14,165 more rows

cols = c(video_game, computer, tv), names_to = "time", values_to = "screen_type"

cols = id, names_to = "screen_type", values_to = "time"

cols = -id, names_to = "screen_type", values_to = "time"

​Lengthen a dataset with {tidyr} Formation R - https://thinkr.fr 353 / 470



Work on several columns
data_food

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 26 Aperitif before lunch Home water tap water 148.
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.8

age
occasion_type
occasion_location
food_type
food
amount

Formation R - https://thinkr.fr 355 / 470


mutate() variants

mutate_all()
mutate_at()
mutate_if()

Formation R - https://thinkr.fr 356 / 470


mutate() variants

mutate_if()

your_dataframe %>%
mutate_if(
condition,
function_to_apply
)

Formation R - https://thinkr.fr 357 / 470


mutate() variants

mutate_if()

data_food %>%
mutate_if(
is.numeric,
as.character
)

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 26 Aperitif before lunch Home water tap water 147.5
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.84

age and amount were numeric; they are now character.

Formation R - https://thinkr.fr 358 / 470


mutate() variants

mutate_if()

data_food %>%
mutate_if(
is.numeric,
list("chr" = as.character)
)

#> # A tibble: 6 × 8
#> age occasion_type occasion_locati… food_type food amount age_chr amount_chr
#> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 26 Aperitif bef… Home water tap … 148. 26 147.5
#> 2 43 Lunch Home vegetabl… frui… 500 43 500
#> 3 56 Lunch Home animal f… n.s.… 3.22 56 3.22
#> 4 49 Lunch Home potatoes… pota… 107 49 107
#> 5 53 Lunch Home Meet bas… chic… 161 53 161
#> 6 36 Lunch Home plant-ba… n.s.… 12.8 36 12.84

Formation R - https://thinkr.fr 359 / 470


mutate() variants

mutate_at()

your_dataframe %>%
mutate_at(
variables_to_transform,
functions_to_apply
)

Formation R - https://thinkr.fr 360 / 470


mutate() variants

mutate_at()

data_food %>%
mutate_at(
c("occasion_type", "amount"),
as.factor
)

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <fct> <chr> <chr> <chr> <fct>
#> 1 26 Aperitif before lunch Home water tap water 147.5
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.84

Formation R - https://thinkr.fr 361 / 470


Specify the name of the variables to transform
" "

data_food %>%
mutate_at(
c("occasion_type", "amount"),
as.factor
)

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <fct> <chr> <chr> <chr> <fct>
#> 1 26 Aperitif before lunch Home water tap water 147.5
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.84

Formation R - https://thinkr.fr 362 / 470


Specify the name of the variables to transform
vars()

data_food %>%
mutate_at(
vars(occasion_type, amount),
as.factor
)

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <fct> <chr> <chr> <chr> <fct>
#> 1 26 Aperitif before lunch Home water tap water 147.5
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.84

Formation R - https://thinkr.fr 363 / 470


Specify the name of the variables to transform
vars()

data_food %>%
mutate_at(
vars(ends_with("food")),
as.factor
)

#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <chr> <chr> <chr> <fct> <dbl>
#> 1 26 Aperitif before lunch Home water tap water 148.
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.8
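
In {dplyr} 1.0.0 and later, the scoped variants mutate_if() and mutate_at() still work but are superseded by across(), used inside a regular mutate(). The sketch below shows rough equivalents of the two calls above on a small hypothetical tibble (data_small); it assumes a version of {dplyr} that provides across() and where().

```r
library(dplyr)

# Hypothetical two-row extract with the same column types as data_food
data_small <- tibble(
  age = c(26, 43),
  occasion_type = c("Aperitif before lunch", "Lunch"),
  amount = c(147.5, 500)
)

# Rough equivalent of mutate_if(is.numeric, as.character)
data_chr <- data_small %>%
  mutate(across(where(is.numeric), as.character))

# Rough equivalent of mutate_at(vars(occasion_type, amount), as.factor)
data_fct <- data_small %>%
  mutate(across(c(occasion_type, amount), as.factor))

data_chr
data_fct
```

The column selection in across() uses the same helpers as select(), for example ends_with("food").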

Formation R - https://thinkr.fr 364 / 470


summarise() variants

summarise_all()
summarise_at()
summarise_if()

Formation R - https://thinkr.fr 365 / 470


summarise() variants

summarise_at()

data_food %>%
summarise_at(
vars(age, amount),
mean
)

#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 NA

The mean of amount is NA because the column contains missing values.

We need to pass na.rm = TRUE to mean().

Formation R - https://thinkr.fr 366 / 470


Specify an ad hoc function
data_food %>%
summarise_at(
vars(age, amount),
~ mean(.x, na.rm = TRUE)
)

#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 118.

~ function(.x)

Formation R - https://thinkr.fr 367 / 470


summarise() variants

summarise_at()

data_food %>%
summarise_at(
vars(age, amount),
list(
"var" = ~ var(.x, na.rm = TRUE),
"median" = ~ median(.x, na.rm = TRUE)
)
)

#> # A tibble: 1 × 4
#> age_var amount_var age_median amount_median
#> <dbl> <dbl> <dbl> <dbl>
#> 1 169. 17515. 37 79.3

Formation R - https://thinkr.fr 368 / 470


summarise() variants

summarise_if()

data_food %>%
summarise_if(
is.numeric,
~ mean(.x, na.rm = TRUE)
)

#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 118.

Formation R - https://thinkr.fr 369 / 470


summarise() variants

summarise_if()

data_food %>%
summarise_if(
is.numeric,
list(
"mean" = ~ mean(.x, na.rm = TRUE),
"var" = ~ var(.x, na.rm = TRUE),
"max" = ~ max(.x, na.rm = TRUE))
)

#> # A tibble: 1 × 6
#> age_mean amount_mean age_var amount_var age_max amount_max
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 37.5 118. 169. 17515. 60 1768.
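
As with mutate(), summarise_if() and summarise_at() can be written with across() in {dplyr} 1.0.0 and later. A minimal sketch on a hypothetical three-row tibble; note that across() orders the result column by column (age_mean, age_max, then amount_mean, amount_max), which differs from the ordering shown above.

```r
library(dplyr)

# Hypothetical data: two numeric columns, one missing amount
data_small <- tibble(
  age = c(26, 43, 56),
  food = c("tap water", "fruit juice", "cooking fat"),
  amount = c(148, 500, NA)
)

numeric_summary <- data_small %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        max = ~ max(.x, na.rm = TRUE)
      )
    )
  )

numeric_summary
```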

Formation R - https://thinkr.fr 370 / 470


Example
data_food %>%
mutate(
age_class = cut(age, breaks = 3, labels = c("young", "adult", "senior") )
) %>%
group_by(age_class) %>%
summarise_if(
is.numeric,
list("min" = ~ min(.x, na.rm = TRUE),
"max" = ~ max(.x, na.rm = TRUE),
"mean" = ~ mean(.x, na.rm = TRUE)
)
)

Formation R - https://thinkr.fr 371 / 470


To go further

select_if()
select_at()
groups()
group_by_all()
group_by_at()
group_by_if()
group_split()
group_nest()

Formation R - https://thinkr.fr 372 / 470



On the origin of joins

#> # A tibble: 5,855 × 4
#> NOMEN NOIND age gender
#> <dbl> <dbl> <chr> <chr>
#> 1 1101001 110100101 18-44 years Male
#> 2 1101007 110100701 45-64 years Female
#> 3 1101008 110100801 45-64 years Male
#> 4 1101012 110101201 45-64 years Male
#> 5 1101014 110101401 65-79 years Female
#> 6 1101016 110101601 45-64 years Female
#> 7 1101019 110101901 18-44 years Male
#> 8 1102001 110200101 45-64 years Male
#> 9 1103003 110300301 45-64 years Male
#> 10 1103005 110300501 65-79 years Male
#> # … with 5,845 more rows

#> # A tibble: 4,725 × 2
#> NOIND activity_profile
#> <dbl> <chr>
#> 1 120100401 inactive and not sedentary
#> 2 120100501 inactive and not sedentary
#> 3 120100601 inactive and not sedentary
#> 4 120100801 inactive and not sedentary
#> 5 120100901 inactive and sedentary
#> 6 120101001 inactive and not sedentary
#> 7 120101201 inactive and not sedentary
#> 8 120101301 inactive and sedentary
#> 9 120200401 inactive and sedentary
#> 10 120300101 active and sedentary
#> # … with 4,715 more rows

​What's the join? Formation R - https://thinkr.fr 374 / 470


On the origin of joins

#> # A tibble: 4,725 × 2


#> age activity_profile
#> <chr> <chr>
#> 1 18-44 years inactive and sedentary
#> 2 45-64 years active and sedentary
#> 3 45-64 years active and not sedentary
#> 4 45-64 years active and sedentary
#> 5 65-79 years active and not sedentary
#> 6 45-64 years <NA>
#> 7 18-44 years active and sedentary
#> 8 45-64 years active and sedentary
#> 9 65-79 years active and not sedentary
#> 10 65-79 years active and sedentary
#> # … with 4,715 more rows

​What's the join? Formation R - https://thinkr.fr 375 / 470


How a join works

​What's the join? Formation R - https://thinkr.fr 376 / 470


Quiz

data_age

#> # A tibble: 5 × 4
#> NOMEN NOIND tage_PS age
#> <dbl> <chr> <chr> <chr>
#> 1 1131047 064 45-64 ans 45-64 years
#> 2 1132001 059 45-64 ans 45-64 years
#> 3 1132006 035 18-44 ans 18-44 years
#> 4 1132012 049 18-44 ans 18-44 years
#> 5 1132014 022 65-79 ans 65-79 years

data_phys_act

#> # A tibble: 5 × 2
#> NOIND activity_profile
#> <chr> <chr>
#> 1 059 inactive and not sedentary
#> 2 098 inactive and sedentary
#> 3 022 inactive and sedentary
#> 4 049 active and sedentary
#> 5 054 active and not sedentary

Which column can be used as the join key?

age

activity_profile

NOMEN

NOIND

​What's the join? Formation R - https://thinkr.fr 377 / 470



The different ways to combine two tables

​The different kinds of join Formation R - https://thinkr.fr 379 / 470


INNER JOIN

​The different kinds of join Formation R - https://thinkr.fr 380 / 470


LEFT JOIN

​The different kinds of join Formation R - https://thinkr.fr 382 / 470


FULL JOIN

​The different kinds of join Formation R - https://thinkr.fr 384 / 470


ANTI JOIN

​The different kinds of join Formation R - https://thinkr.fr 386 / 470


Quiz

​The different kinds of join Formation R - https://thinkr.fr 388 / 470



Combine tables from the INCA3 study
data_a <- tibble(
  NOIND = c("087", "049", "054", "078", "064", "016"),
  gender = c("Male", "Female", "Male", "Male", "Female", "Female")
)

data_a

#> # A tibble: 6 × 2
#> NOIND gender
#> <chr> <chr>
#> 1 087 Male
#> 2 049 Female
#> 3 054 Male
#> 4 078 Male
#> 5 064 Female
#> 6 016 Female

data_b <- tibble(
  NOIND = c("087", "078", "016", "013", "029", "044"),
  reads_nutri_label = c("Never", "Never", "Never", "Never", "Sometimes", "Always")
)

data_b

#> # A tibble: 6 × 2
#> NOIND reads_nutri_label
#> <chr> <chr>
#> 1 087 Never
#> 2 078 Never
#> 3 016 Never
#> 4 013 Never
#> 5 029 Sometimes
#> 6 044 Always

NOIND

​Perform a join with {dplyr} Formation R - https://thinkr.fr 390 / 470


Specify the join key
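With {dplyr}, the join key is specified with the by argument.

by takes the name of the column shared by both tables, as a character string.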
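
When the key does not have the same name in both tables, by accepts a named character vector of the form c("name_in_left_table" = "name_in_right_table"). A minimal sketch, in which data_labels and its id column are hypothetical and only serve the illustration:

```r
library(dplyr)

data_a <- tibble(
  NOIND = c("087", "049", "078"),
  gender = c("Male", "Female", "Male")
)

# Hypothetical table where the identifier column is called `id`
data_labels <- tibble(
  id = c("087", "078", "013"),
  reads_nutri_label = c("Never", "Never", "Never")
)

# NOIND (left table) is matched against id (right table)
joined <- data_a %>%
  left_join(data_labels, by = c("NOIND" = "id"))

joined
```

The result keeps the key name of the left table (NOIND); the individual "049" has no match and gets NA.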

​Perform a join with {dplyr} Formation R - https://thinkr.fr 391 / 470


INNER JOIN - inner_join()
data_a %>%
inner_join(data_b, by = "NOIND")

#> # A tibble: 3 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 078 Male Never
#> 3 016 Female Never

​Perform a join with {dplyr} Formation R - https://thinkr.fr 392 / 470


LEFT JOIN - left_join()
data_a %>%
left_join(data_b, by = "NOIND")

#> # A tibble: 6 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 049 Female <NA>
#> 3 054 Male <NA>
#> 4 078 Male Never
#> 5 064 Female <NA>
#> 6 016 Female Never

​Perform a join with {dplyr} Formation R - https://thinkr.fr 393 / 470


FULL JOIN - full_join()
data_a %>%
full_join(data_b, by = "NOIND")

#> # A tibble: 9 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 049 Female <NA>
#> 3 054 Male <NA>
#> 4 078 Male Never
#> 5 064 Female <NA>
#> 6 016 Female Never
#> 7 013 <NA> Never
#> 8 029 <NA> Sometimes
#> 9 044 <NA> Always

​Perform a join with {dplyr} Formation R - https://thinkr.fr 394 / 470


ANTI JOIN - anti_join()
data_a %>%
anti_join(data_b, by = "NOIND")

#> # A tibble: 3 × 2
#> NOIND gender
#> <chr> <chr>
#> 1 049 Female
#> 2 054 Male
#> 3 064 Female
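
The four joins can be compared at a glance by counting the rows each one returns for data_a and data_b. This sketch only recomputes the results shown above; join_sizes is a name introduced for the example.

```r
library(dplyr)

data_a <- tibble(
  NOIND = c("087", "049", "054", "078", "064", "016"),
  gender = c("Male", "Female", "Male", "Male", "Female", "Female")
)

data_b <- tibble(
  NOIND = c("087", "078", "016", "013", "029", "044"),
  reads_nutri_label = c("Never", "Never", "Never", "Never", "Sometimes", "Always")
)

# Number of rows returned by each kind of join
join_sizes <- c(
  inner = nrow(inner_join(data_a, data_b, by = "NOIND")),
  left = nrow(left_join(data_a, data_b, by = "NOIND")),
  full = nrow(full_join(data_a, data_b, by = "NOIND")),
  anti = nrow(anti_join(data_a, data_b, by = "NOIND"))
)

join_sizes
```

Since each NOIND appears only once in data_b, the inner and anti joins split the rows of data_a into matched and unmatched individuals, so their sizes add up to the size of the left join.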

​Perform a join with {dplyr} Formation R - https://thinkr.fr 395 / 470


Quiz
Which code produces data_age_act_phys?

data_age

#> # A tibble: 5 × 2
#> NOIND age
#> <chr> <chr>
#> 1 064 45-64 years
#> 2 059 45-64 years
#> 3 035 18-44 years
#> 4 049 18-44 years
#> 5 022 65-79 years

data_phys_act

#> # A tibble: 5 × 2
#> NOIND activity_profile
#> <chr> <chr>
#> 1 059 inactive and not sedentary
#> 2 098 inactive and sedentary
#> 3 022 inactive and sedentary
#> 4 049 active and sedentary
#> 5 054 active and not sedentary

data_age_act_phys

#> # A tibble: 5 × 3
#> NOIND activity_profile age
#> <chr> <chr> <chr>
#> 1 059 inactive and not sedentary 45-64 years
#> 2 098 inactive and sedentary <NA>
#> 3 022 inactive and sedentary 65-79 years
#> 4 049 active and sedentary 18-44 years
#> 5 054 active and not sedentary <NA>

data_age %>% full_join(data_phys_act, by = "NOIND")

data_age %>% left_join(data_phys_act, by = "NOIND")

data_phys_act %>% left_join(data_age, by = "age")

data_phys_act %>% left_join(data_age, by = "NOIND")

​Perform a join with {dplyr} Formation R - https://thinkr.fr 396 / 470



These datasets are not in "tidy" format

#> # A tibble: 10 × 4
#> individual detail weight height
#> <int> <chr> <int> <int>
#> 1 1 60-M 96 166
#> 2 2 42-M 96 157
#> 3 3 32-I 96 161
#> 4 4 26-M 90 157
#> 5 5 56-F 86 170
#> 6 1 59-I 95 166
#> 7 2 38-M 85 171
#> 8 3 48-F 97 180
#> 9 4 24-M 88 155
#> 10 5 31-M 85 161

​Clean your data with {tidyr} Formation R - https://thinkr.fr 398 / 470


These datasets are not in "tidy" format

#> # A tibble: 10 × 5
#> indiv year month day obs
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 2019 01 01 j
#> 2 2 2019 01 01 u
#> 3 3 2019 01 01 q
#> 4 4 2019 01 01 k
#> 5 5 2019 01 01 g
#> 6 1 2019 02 01 n
#> 7 2 2019 02 01 h
#> 8 3 2019 02 01 w
#> 9 4 2019 02 01 i
#> 10 5 2019 02 01 p

​Clean your data with {tidyr} Formation R - https://thinkr.fr 399 / 470


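{tidyr}

{tidyr} provides two functions to clean such columns:

separate() splits one column into several columns

unite() combines several columns into one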

​Clean your data with {tidyr} Formation R - https://thinkr.fr 400 / 470


separate()

col is the column to split; into gives the names of the new columns.

data %>%
  separate(
    col = column_ab,
    into = c("a", "b"),
    sep = "-" # make separator explicit
  )

​Clean your data with {tidyr} Formation R - https://thinkr.fr 401 / 470


separate()

remove = FALSE keeps the original column.

data %>%
separate(
col = column_ab,
into = c("a", "b"),
sep = "-",
remove = FALSE
)

​Clean your data with {tidyr} Formation R - https://thinkr.fr 402 / 470


separate()

data_indiv

#> # A tibble: 10 × 4
#> individual detail weight height
#> <int> <chr> <int> <int>
#> 1 1 60-M 96 166
#> 2 2 42-M 96 157
#> 3 3 32-I 96 161
#> 4 4 26-M 90 157
#> 5 5 56-F 86 170
#> 6 1 59-I 95 166
#> 7 2 38-M 85 171
#> 8 3 48-F 97 180
#> 9 4 24-M 88 155
#> 10 5 31-M 85 161

data_indiv %>%
  separate(
    col = detail,
    sep = "-",
    into = c("age", "gender")
  ) %>%
  mutate(age = as.numeric(age))

#> # A tibble: 10 × 5
#> individual age gender weight height
#> <int> <dbl> <chr> <int> <int>
#> 1 1 60 M 96 166
#> 2 2 42 M 96 157
#> 3 3 32 I 96 161
#> 4 4 26 M 90 157
#> 5 5 56 F 86 170
#> 6 1 59 I 95 166
#> 7 2 38 M 85 171
#> 8 3 48 F 97 180
#> 9 4 24 M 88 155
#> 10 5 31 M 85 161

separate()
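As a possible alternative to converting `age` in a separate `mutate()` step, `separate()` has a `convert` argument that guesses the types of the new columns. The sketch below uses a small hypothetical table `df` that mimics the `detail` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical data mimicking the `detail` column of the slides
df <- tibble(individual = 1:3, detail = c("60-M", "42-M", "32-I"))

res <- df %>%
  separate(
    col = detail,
    into = c("age", "gender"),
    sep = "-",
    convert = TRUE # let separate() guess the type of each new column
  )

class(res$age)
class(res$gender)
```

The guessed types depend on the values, so an explicit `mutate()` remains the safer choice when a specific type is required.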



unite()

col

data %>%
  unite(
    col = "column_ab",
    column_a, column_b,
    sep = "/"
  )



unite()

remove = FALSE

data %>%
  unite(
    col = "column_ab",
    column_a, column_b,
    sep = "/",
    remove = FALSE
  )



unite()

data_obs

#> # A tibble: 10 × 5
#> indiv year month day obs
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 2019 01 01 j
#> 2 2 2019 01 01 u
#> 3 3 2019 01 01 q
#> 4 4 2019 01 01 k
#> 5 5 2019 01 01 g
#> 6 1 2019 02 01 n
#> 7 2 2019 02 01 h
#> 8 3 2019 02 01 w
#> 9 4 2019 02 01 i
#> 10 5 2019 02 01 p

data_obs %>%
  unite(
    col = "date",
    year, month, day,
    sep = "/"
  )

#> # A tibble: 10 × 3
#> indiv date obs
#> <int> <chr> <chr>
#> 1 1 2019/01/01 j
#> 2 2 2019/01/01 u
#> 3 3 2019/01/01 q
#> 4 4 2019/01/01 k
#> 5 5 2019/01/01 g
#> 6 1 2019/02/01 n
#> 7 2 2019/02/01 h
#> 8 3 2019/02/01 w
#> 9 4 2019/02/01 i
#> 10 5 2019/02/01 p
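The united column is still a character column. A minimal sketch of one way to turn it into a real date, assuming a hypothetical table `obs` shaped like `data_obs` and a "-" separator so that the result is in ISO 8601 form:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with the same columns as data_obs
obs <- tibble(
  indiv = 1:2,
  year = c("2019", "2019"),
  month = c("01", "02"),
  day = c("01", "01")
)

obs_dated <- obs %>%
  unite(col = "date", year, month, day, sep = "-") %>%
  mutate(date = as.Date(date)) # "2019-01-01" is parsed without a format string

class(obs_dated$date)
```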



Quiz

#> # A tibble: 8 × 4
#> id height_weight unite_weight unite_height
#> <int> <chr> <chr> <chr>
#> 1 1 187_83 kg cm
#> 2 2 166_69 kg cm
#> 3 3 175_86 kg cm
#> 4 4 164_70 kg cm
#> 5 5 183_81 kg cm
#> 6 6 177_88 kg cm
#> 7 7 160_68 kg cm
#> 8 8 179_79 kg cm

separate(col = height_weight, into = c("height", "weight"), sep = "-")
mutate(imc = as.numeric(weight) / ((as.numeric(height) / 100) ^ 2))

separate(col = height_weight, into = c("height", "weight"), sep = "_")
mutate(imc = as.numeric(weight) / ((as.numeric(height) / 100) ^ 2))

unite(col = "height_weight", height, weight, sep = "_")
mutate(imc = as.numeric(weight) / ((as.numeric(height) / 100) ^ 2))




Some observations are missing

NA

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

​Handle missing data with {tidyr} Formation R - https://thinkr.fr 409 / 470


Some observations are missing

NA

#> # A tibble: 4 × 3
#> year month weight
#> <chr> <chr> <dbl>
#> 1 2017 01 86
#> 2 2018 02 95
#> 3 2019 01 90
#> 4 2019 02 92



Possible causes



Two strategies



{tidyr}

fill()

drop_na()

replace_na()

complete()



fill()

data %>%
  fill(column_a, column_b)

data %>%
  fill(column_a, column_b, .direction = "up")



fill()

# original dataset
data_indiv

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

data_indiv %>%
  fill(year)

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2019 2 96 187
#> 3 2019 3 95 184
#> 4 2019 4 89 189
#> 5 2020 1 85 182
#> 6 2020 2 85 176
#> 7 2020 3 86 180
#> 8 2020 4 NA 170



fill()

# original dataset
data_indiv

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

data_indiv %>%
  fill(year, .direction = "up")

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2020 2 96 187
#> 3 2020 3 95 184
#> 4 2020 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170



drop_na()

data %>%
  drop_na()

data %>%
  drop_na(column_a, column_b)



drop_na()

# original dataset
data_indiv

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

data_indiv %>%
  drop_na(year)

#> # A tibble: 2 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2020 1 85 182



replace_na()

column = "value": every NA in column is replaced with "value".

data %>%
  replace_na(
    replace = list(
      column_a = "value_a",
      column_b = "value_b"
    )
  )



replace_na()

data_indiv

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

data_indiv %>%
  replace_na(
    replace = list(year = "00")
  )

#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 00 2 96 187
#> 3 00 3 95 184
#> 4 00 4 89 189
#> 5 2020 1 85 182
#> 6 00 2 85 176
#> 7 00 3 86 180
#> 8 00 4 NA 170
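`replace_na()` also works on a single vector, which makes it usable inside `mutate()`. A minimal sketch on a hypothetical table `df`; the replacement values "unknown" and 0 are arbitrary placeholders chosen for the illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with missing values in two columns of different types
df <- tibble(
  year = c("2019", NA, "2020", NA),
  weight = c(NA, 96, 85, NA)
)

res <- df %>%
  mutate(
    year = replace_na(year, "unknown"), # character value for a character column
    weight = replace_na(weight, 0)      # numeric value for a numeric column
  )

res
```

The replacement has to be compatible with the type of the column it fills.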



complete()

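data %>%
  complete(column_a, column_b)

Every combination of column_a and column_b that is absent from the data is added as a new row, with NA in the other columns.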



complete()

# original dataset
data_weight

#> # A tibble: 4 × 3
#> year month weight
#> <chr> <chr> <dbl>
#> 1 2017 01 86
#> 2 2018 02 95
#> 3 2019 01 90
#> 4 2019 02 92

data_weight %>%
  complete(year, month)

#> # A tibble: 6 × 3
#> year month weight
#> <chr> <chr> <dbl>
#> 1 2017 01 86
#> 2 2017 02 NA
#> 3 2018 01 NA
#> 4 2018 02 95
#> 5 2019 01 90
#> 6 2019 02 92
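`complete()` can also fill the rows it creates with a chosen value instead of NA, through its `fill` argument. A minimal sketch that rebuilds the `data_weight` table from the slide; the value 0 is an arbitrary placeholder, not a recommendation for real weight data:

```r
library(dplyr)
library(tidyr)

# Same values as the data_weight table shown on the slide
data_weight <- tibble(
  year = c("2017", "2018", "2019", "2019"),
  month = c("01", "02", "01", "02"),
  weight = c(86, 95, 90, 92)
)

res <- data_weight %>%
  complete(year, month, fill = list(weight = 0)) # 0 instead of NA in the new rows

res
```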



Noteworthy
drop_na() and fill() accept tidyselect helpers:

data %>%
  fill(
    contains("encoded")
  )

data %>%
  drop_na(
    starts_with("comment")
  )
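A runnable sketch of the two calls above. The table `survey` and its column names are made up for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data; column names are invented for the illustration
survey <- tibble(
  id = 1:4,
  encoded_site = c("A", NA, "B", NA),
  comment_1 = c("ok", "late", NA, "ok"),
  comment_2 = c("x", "y", "z", NA)
)

# Fill down every column whose name contains "encoded"
filled <- survey %>% fill(contains("encoded"))

# Drop rows with an NA in any column whose name starts with "comment"
kept <- survey %>% drop_na(starts_with("comment"))

filled$encoded_site
kept$id
```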



Quiz

data

data %>% filter_na()

data %>% fill()

data %>% complete()

data %>% drop_na()




Manipulate dates and times with {lubridate}

library(tidyverse)
library(lubridate)



About the ISO 8601 format



Now
now() today()

now()

#> [1] "2022-01-13 11:27:15 UTC"

today()

#> [1] "2022-01-13"

today() returns a Date, now() returns a POSIXt (date-time).

today() %>% class()

#> [1] "Date"

now() %>% class()

#> [1] "POSIXct" "POSIXt"
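The two classes can be told apart and converted with {lubridate} helpers. A minimal sketch on a fixed date-time (hypothetical, chosen so the result is reproducible) rather than on `now()`:

```r
library(lubridate)

# A fixed date-time instead of now(), so the result does not change between runs
moment <- ymd_hms("2022-01-13 11:27:15", tz = "UTC")

is.POSIXct(moment)    # TRUE: this is a date-time
d <- as_date(moment)  # drop the time part
is.Date(d)            # TRUE: this is now a Date
d
```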



Quiz



Import dates



Transformation from string to Date
the_dates

the_dates <- c(
  "01-01-09", "010209", "01-03-09", "01-01-2009", "01-02-2009",
  "01/03/2009"
)

dmy() ydm() mdy()

mdy(): month - day - year

dmy(the_dates)

#> [1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-01-01" "2009-02-01"
#> [6] "2009-03-01"

mdy(the_dates)

#> [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-01" "2009-01-02"
#> [6] "2009-01-03"
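The function name has to match the order of the components in the string. A minimal sketch with a hypothetical year-day-month string: `ydm()` parses it, while `ymd()` reads 31 as a month and returns NA with a warning.

```r
library(lubridate)

x <- "2009-31-01" # year - day - month

parsed <- ydm(x)                   # matches the order of the string
failed <- suppressWarnings(ymd(x)) # 31 is not a valid month, so parsing fails

parsed
is.na(failed)
```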



Quiz
"14-02-1986"



Transformation from string to date-time (POSIXt)
ymd_hms()
dmy_hm()



Questions

"1986/02/15 20h05"

ymd_hm("1986/02/15 20h05")

#> [1] "1986-02-15 20:05:00 UTC"

"the 11th of november 1918 at 11:00AM"

dmy_hm("the 11th of november 1918 at 11:00AM")

#> [1] "1918-11-11 11:00:00 UTC"

The tz = argument sets the time zone; OlsonNames() lists the valid names.

dmy_hm("the 11th of november 1918 at 11:00AM", tz = "Europe/Paris")

#> [1] "1918-11-11 11:00:00 WET"



Questions

"le 11 novembre 1918 à 11 heures 00"

dmy_hm("le 11 novembre 1918 a 11 heures 00")

#> Warning: All formats failed to parse. No formats found.

#> [1] NA



About the "locale"

dmy_hm("le 11 novembre 1918 a 11 heures 00")

Sys.getlocale("LC_TIME") # American English

#> [1] "en_US.UTF-8"

Sys.setlocale("LC_TIME", "fr_FR.UTF-8") # French

#> [1] "fr_FR.UTF-8"

dmy_hm("le 11 novembre 1918 a 11 heures 00")

#> [1] "1918-11-11 11:00:00 UTC"



Extract information from a date - {lubridate}

present_moment <- now()
present_moment

#> [1] "2022-01-13 11:27:15 UTC"

year(present_moment)

#> [1] 2022

month(present_moment)

#> [1] 1

day(present_moment)

#> [1] 13

wday(present_moment)

#> [1] 5

hour(present_moment)

#> [1] 11

minute(present_moment)

#> [1] 27

second(present_moment)

#> [1] 15.8971



Exercise

tribble(
  ~name, ~date_of_birth,
  "Sébastien", "26 juillet 83",
  "Diane", "1er janvier 1985",
  "Vincent", "11/02/1986",
  "Colin", "22111988",
  "Margot", "17 septembre 1991",
  "Cervan", "22-octobre-91"
) %>%
  mutate(date_of_birth = .....(date_of_birth)) %>%
  filter(.....(date_of_birth) == 9)



Exercise

ymd("1986/11/02") %>%
  wday(label = TRUE, abbr = FALSE)

#> [1] Sunday
#> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday



Notion of period

years(2) # a period of 2 years

#> [1] "2y 0m 0d 0H 0M 0S"

months(2) # a period of 2 months

#> [1] "2m 0d 0H 0M 0S"

days(2) # a period of 2 days

#> [1] "2d 0H 0M 0S"

hours(2) # a period of 2 hours

#> [1] "2H 0M 0S"

minutes(2) # a period of 2 minutes

#> [1] "2M 0S"

seconds(2) # a period of 2 seconds

#> [1] "2S"

years(2) + months(2) + hours(2)

#> [1] "2y 2m 0d 2H 0M 0S"

today() + days(3)

#> [1] "2022-01-16"
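Adding a period of months can land on a day that does not exist. A minimal sketch with a hypothetical date, comparing `+` with the {lubridate} operator `%m+%`, which rolls back to the last valid day of the month:

```r
library(lubridate)

jan31 <- ymd("2022-01-31")

naive <- jan31 + months(1)     # 31 February does not exist, so the result is NA
rolled <- jan31 %m+% months(1) # rolls back to the last day of February

naive
rolled
```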



Quiz

tomorrow(tomorrow())

today() + days(2)

today() + hours(48)

today() + day(2)



Practice




{stringr} package



Let's work on an example

phonebook <- tibble(
  id = 1:5,
  firstname = c("Steven", "ERIN", "Marie-Louise", "Layla", "Mitchell"),
  lastname = c("DIXON", "FLORES ", "guillaumin ", "BRYANT", " Berry"),
  coord = c(
    " 6804 Preston Rd (563)-300-2113", "flores@mail.com (617)-990-5931 ",
    "9464 Preston Rd, Dallas, TX 75225", "8046 Forest Ln, Humble, TX 77338",
    "(089).225.6911 berry@msn.com"
  )
)



Concatenate
str_c()

sep collapse

str_c("one", "two", sep = " ")

#> [1] "one two"

str_c(c("one", "two"), collapse = " ")

#> [1] "one two"

lastname_firstname

phonebook %>%
  mutate(lastname_firstname = str_c(lastname, firstname, sep = "_")) %>%
  select(-coord)

#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> <int> <chr> <chr> <chr>
#> 1 1 Steven "DIXON" "DIXON_Steven"
#> 2 2 ERIN "FLORES " "FLORES _ERIN"
#> 3 3 Marie-Louise "guillaumin " "guillaumin _Marie-Louise"
#> 4 4 Layla "BRYANT" "BRYANT_Layla"
#> 5 5 Mitchell " Berry" " Berry_Mitchell"
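`sep` and `collapse` can be used together: `sep` joins the arguments element by element, then `collapse` joins the resulting vector into one string. A minimal sketch on two short hypothetical vectors:

```r
library(stringr)

lastname <- c("DIXON", "FLORES")
firstname <- c("Steven", "Erin")

# sep only: one string per person
pairs <- str_c(lastname, firstname, sep = "_")

# sep + collapse: a single string for the whole vector
one_line <- str_c(lastname, firstname, sep = "_", collapse = ", ")

pairs
one_line
```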



Trim (remove leading and trailing spaces)
str_trim()

str_trim(" Hello ")

#> [1] "Hello"

phonebook %>%
  mutate_all(str_trim) %>% # trims every column; id becomes character
  mutate(lastname_firstname = str_c(lastname, firstname, sep = "_")) %>%
  select(-coord)

#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> <chr> <chr> <chr> <chr>
#> 1 1 Steven DIXON DIXON_Steven
#> 2 2 ERIN FLORES FLORES_ERIN
#> 3 3 Marie-Louise guillaumin guillaumin_Marie-Louise
#> 4 4 Layla BRYANT BRYANT_Layla
#> 5 5 Mitchell Berry Berry_Mitchell
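`str_trim()` only touches the two ends of a string. When repeated spaces inside the string also need cleaning, {stringr} provides `str_squish()`. A minimal sketch on a hypothetical value:

```r
library(stringr)

x <- "  Marie   Louise  "

trimmed <- str_trim(x)    # removes leading and trailing spaces only
squished <- str_squish(x) # also reduces inner runs of spaces to a single space

trimmed
squished
```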



Change case (upper/lower case)
str_to_upper() str_to_lower() str_to_title() str_to_sentence()

str_to_upper("hello fred")

#> [1] "HELLO FRED"

str_to_title("hello fred")

#> [1] "Hello Fred"

phonebook %>%
  mutate_all(str_trim) %>%
  mutate(
    firstname = str_to_title(firstname),
    lastname = str_to_upper(lastname),
    lastname_firstname = str_c(lastname, firstname, sep = "_")
  ) %>%
  select(-coord)

#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> <chr> <chr> <chr> <chr>
#> 1 1 Steven DIXON DIXON_Steven
#> 2 2 Erin FLORES FLORES_Erin
#> 3 3 Marie-Louise GUILLAUMIN GUILLAUMIN_Marie-Louise
#> 4 4 Layla BRYANT BRYANT_Layla
#> 5 5 Mitchell BERRY BERRY_Mitchell



Question

c(" HELLO", " everyone ") %>%
  str_...() %>%
  str_c(... = " ") %>%
  str_to_...()

#> [1] "Hello everyone"



Detect a pattern
str_detect()

c("William", "Carl", "Jean-Paul", "Paul") %>%
  str_detect("Paul")

#> [1] FALSE FALSE TRUE TRUE

Combined with filter(), str_detect() keeps the rows that match:

phonebook %>%
  filter(str_detect(coord, "Dallas")) %>%
  select(-firstname, -lastname)

#> # A tibble: 1 × 2
#> id coord
#> <int> <chr>
#> 1 3 9464 Preston Rd, Dallas, TX 75225



Replace/delete
str_replace_all() str_remove_all()

c("William", "Carl", "Jean-Paul", "Paul") %>%
  str_replace_all(pattern = "Paul", replacement = "Jack")

#> [1] "William" "Carl" "Jean-Jack" "Jack"

phonebook %>%
  mutate(
    coord_new =
      # "." is a regex wildcard here; fixed("msn.com") would match it literally
      str_replace_all(coord, pattern = "msn.com", replacement = "hotmail.com")
  ) %>%
  select(starts_with("coord"))

#> # A tibble: 5 × 2
#> coord coord_new
#> <chr> <chr>
#> 1 " 6804 Preston Rd (563)-300-2113" " 6804 Preston Rd (563)-300-2113"
#> 2 "flores@mail.com (617)-990-5931 " "flores@mail.com (617)-990-5931 "
#> 3 "9464 Preston Rd, Dallas, TX 75225" "9464 Preston Rd, Dallas, TX 75225"
#> 4 "8046 Forest Ln, Humble, TX 77338" "8046 Forest Ln, Humble, TX 77338"
#> 5 "(089).225.6911 berry@msn.com" "(089).225.6911 berry@hotmail.com"
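By default, {stringr} reads a pattern as a regular expression, so the "." in "msn.com" matches any character. A minimal sketch of the difference with `fixed()`, on two hypothetical strings:

```r
library(stringr)

x <- c("berry@msn.com", "berry@msnxcom")

# Regex pattern: "." matches any character, so both strings are changed
as_regex <- str_replace_all(x, "msn.com", "hotmail.com")

# fixed(): the pattern is taken literally, so only the first string is changed
as_fixed <- str_replace_all(x, fixed("msn.com"), "hotmail.com")

as_regex
as_fixed
```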



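Use regular expressions

$ : end of the string
^ : start of the string
. : any character
[:digit:] : digits
[:upper:] : uppercase letters
[:punct:] : punctuation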

"01.53.40.30.20" %>% str_remove_all(pattern = "[:punct:]")

#> [1] "0153403020"

phonebook %>%
  filter(str_detect(firstname, "^M")) %>%
  select(-coord)

#> # A tibble: 2 × 3
#> id firstname lastname
#> <int> <chr> <chr>
#> 1 3 Marie-Louise "guillaumin "
#> 2 5 Mitchell " Berry"
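A minimal sketch of the anchors and of one character class, applied to a short hypothetical vector of first names:

```r
library(stringr)

firstnames <- c("Steven", "Marie-Louise", "Mitchell", "Tom")

starts_with_m <- str_detect(firstnames, "^M")     # "M" at the start of the string
ends_with_n <- str_detect(firstnames, "n$")       # "n" at the end of the string
has_punct <- str_detect(firstnames, "[:punct:]")  # punctuation anywhere (here the hyphen)

starts_with_m
ends_with_n
has_punct
```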



Extract
str_extract()

# succession of letters at the end of the sentence
"R is very powerful" %>% str_extract("[:alpha:]+$")

#> [1] "powerful"

# succession of numbers
c("93300 Aubervilliers", "Paris 75017") %>% str_extract("[:digit:]+")

#> [1] "93300" "75017"

phonebook %>%
  mutate(email = coord %>% str_extract("[:alnum:]+@[:alnum:]+\\.[:alnum:]+")) %>%
  select(id, email) %>%
  filter(!is.na(email))

#> # A tibble: 2 × 2
#> id email
#> <int> <chr>
#> 1 2 flores@mail.com
#> 2 5 berry@msn.com



Quiz

phonebook %>%
  mutate(
    ... = str_extract(coord, "[:alnum:]+@[:alnum:]+\\.[:alnum:]+"),
    ... = str_extract(coord, "([:digit:]|[:punct:]){10,14}+"),
    address = str_...(coord),
    address = case_when(
      is.na(email) & is.na(phone) ~ address,
      is.na(email) ~ ...(address, fixed(phone)),
      is.na(phone) ~ str_remove_all(address, ...),
      TRUE ~ address %>%
        str_remove_all(...) %>%
        str_remove_all(fixed(phone))
    ),
    phone = str_remove_all(phone, ...)
  ) %>%
  select(id, email, phone, address)



Expected result

#> # A tibble: 5 × 4
#> id firstname lastname coord
#> <int> <chr> <chr> <chr>
#> 1 1 Steven "DIXON" " 6804 Preston Rd (563)-300-2113"
#> 2 2 ERIN "FLORES " "flores@mail.com (617)-990-5931 "
#> 3 3 Marie-Louise "guillaumin " "9464 Preston Rd, Dallas, TX 75225"
#> 4 4 Layla "BRYANT" "8046 Forest Ln, Humble, TX 77338"
#> 5 5 Mitchell " Berry" "(089).225.6911 berry@msn.com"

#> # A tibble: 5 × 4
#> id email phone address
#> <int> <chr> <chr> <chr>
#> 1 1 <NA> 5633002113 "6804 Preston Rd "
#> 2 2 flores@mail.com 6179905931 " "
#> 3 3 <NA> <NA> "9464 Preston Rd, Dallas, TX 75225"
#> 4 4 <NA> <NA> "8046 Forest Ln, Humble, TX 77338"
#> 5 5 berry@msn.com 0892256911 " "



Practical




Create sentences
glue()

i <- 1
glue::glue("the value of i is {i} it's little")

#> the value of i is 1 it's little

i <- 1
stringr::str_c("the value of i is ", i, " it's little")

#> [1] "the value of i is 1 it's little"



Create sentences
glue()

x <- 1:4
y <- c("little", "not much", "not bad", "a lot")
glue::glue("the value of i is {x} it's {y}")

#> the value of i is 1 it's little
#> the value of i is 2 it's not much
#> the value of i is 3 it's not bad
#> the value of i is 4 it's a lot



Create sentences
glue()

firstname <- "Teddy"
weight <- 131
height <- 2.04

glue::glue(
  "the BMI of {firstname} is {BMI}",
  BMI = round(weight / height^2, digits = 1)
)

#> the BMI of Teddy is 31.5

stringr::str_c(
  "the BMI of ", firstname, " is ",
  round(weight / height^2, digits = 1)
)

#> [1] "the BMI of Teddy is 31.5"



Combine in a data frame

people <- tibble::tribble(
  ~firstname, ~weight, ~height,
  "Teddy", 131, 2.04,
  "Tom", 0.1, 0.5,
  "Carla", 75, 1.75
)

people %>%
  mutate(
    bmi = round(weight / height^2, digits = 2),
    text = glue::glue("the BMI of {firstname} is {bmi}")
  )

#> # A tibble: 3 × 5
#> firstname weight height bmi text
#> <chr> <dbl> <dbl> <dbl> <glue>
#> 1 Teddy 131 2.04 31.5 the BMI of Teddy is 31.48
#> 2 Tom 0.1 0.5 0.4 the BMI of Tom is 0.4
#> 3 Carla 75 1.75 24.5 the BMI of Carla is 24.49
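When only the sentences are needed, `glue::glue_data()` can read the columns of a data frame directly, without going through `mutate()`. A minimal sketch, assuming a reduced two-row copy of the `people` table:

```r
library(glue)

# Reduced copy of the people table, built with base R for the illustration
people <- data.frame(
  firstname = c("Teddy", "Carla"),
  weight = c(131, 75),
  height = c(2.04, 1.75)
)

# Column names are looked up inside `people`; R expressions are allowed in {}
sentences <- glue_data(
  people,
  "the BMI of {firstname} is {round(weight / height^2, digits = 1)}"
)

sentences
```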




Assess training quality

​So ? What did you think about it ? Formation R - https://thinkr.fr 466 / 470
Satisfaction


Training Review

Resources

​Training Review - Level 1 Formation R - https://thinkr.fr 469 / 470
