R Basics

Officially: Hello ! Formation R - https://thinkr.fr 3 / 470

A sentence per bullet point:
Getting to know each other

Name/pseudonym/...
For those who cannot remember names

Our Goal: Making You Independent
It is going to be dense...

Internal rules

Internal rules - Essentials

We learn by making mistakes

Concentration & Breaks
Timing

Using Zoom
How to interact with videoconferencing tools?

Using Whereby

How to interact?

SOS

Take a tour!
What is Bakacode?

Connection

Home page

Home page - launch

Practice - presentation

Practice - search

Practice - export

Top menu

Training Goal:
Training content

What is R?
Welcome to R

Main functionalities

How does R work?

Installing R

Installing RStudio

Create a project
Understand and initialize an RStudio project

Load a project
.Rproj

Getting started with RStudio
Navigate in RStudio

Console

Source

Environment

Files and others

Create objects in R
(10 + 2) * 5
# <- is the assignment operator: it stores the value on its right under the name on its left
a <- 15
a
n <- 10 + 2
n
n <- 3 * 2
n
The console

Create objects in R
a <- 3
a
#> [1] 3
A <- 9
A
#> [1] 9
a
#> [1] 3
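A minimal sketch to try in the console, showing that R treats `a` and `A` as two separate objects and that assigning to an existing name replaces its value. It only reuses the names from the slide above.

```r
# R is case-sensitive: `a` and `A` name two different objects
a <- 3
A <- 9
a  # still 3: creating `A` left `a` untouched
A  # 9

# Assigning to an existing name replaces its previous value
a <- a + 1
a  # now 4
```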

Quiz
a <- 5
b <- a * 4
a <- B
a a
b B

Using an R base function
runif()
runif(n = 1)
runif(n = 1, min = -5)
runif(n = 3, max = 5)
runif(n = 1, min = -5, max = 5)
runif(n = 3, min = -5, max = 5)
Functions
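A short sketch of how the `runif()` arguments behave. `set.seed()` is added here (the value 42 is arbitrary) so that the random draws can be reproduced; the checks only rely on `runif()` returning `n` values between `min` and `max`.

```r
# Fix the random seed so the same "random" values come back each time
set.seed(42)
draws <- runif(n = 5, min = -5, max = 5)
length(draws)   # one value per requested draw: 5
range(draws)    # all values lie between -5 and 5

# Arguments can also be matched by position, in the order n, min, max
set.seed(42)
same_draws <- runif(5, -5, 5)
identical(draws, same_draws)  # same seed, same arguments, same values
```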

About parameters
Functions

What is a package?
{proustr}
Customize R with packages

Loading a package
library(packagename)
library(proustr)
data()
Datasets in {proustr}: albertinedisparue, alombredesjeunesfillesenfleurs, ducotedechezswann, laprisonniere, lecotedeguermantes, letempsretrouve, proust_char, sodomeetgomorrhe, stop_words
data(stop_words)
Load packages

Good practice
library()
Load packages

The CRAN
Install packages from CRAN

Installing a package from the CRAN
install.packages('packagename')
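`install.packages()` downloads and reinstalls a package every time it runs. A common pattern is to install only what is missing. Here is a sketch with a hypothetical helper, `install_if_missing()` (not part of base R), built on `requireNamespace()`, which returns `FALSE` when a package cannot be loaded.

```r
# Hypothetical helper (not part of base R): install only the packages
# that cannot be loaded yet, so re-running a script does not reinstall them
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)
}

# Base packages are always present, so nothing is installed here
install_if_missing(c("stats", "utils"))
```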

Exercise
draw()

Shortcuts to remember:
The main shortcuts in RStudio

Shortcuts to remember (Windows)

Shortcuts to remember (Mac)

Data, what does it look like?
Data manipulation workflow

Data, what does it look like?
bmi
consumed_quantity
age_class
food_type

Data manipulation workflow

First and foremost: graphs!

Studying children's BMI
#> # A tibble: 6 × 4
#> bmi age_class food_type consumed_quanti…
#> <dbl> <fct> <chr> <dbl>
#> 1 13 7-10 years Sweets and chocolate 84.4
#> 2 13 7-10 years Sandwiches, Pizzas, Pies, Pastries and Sav… 135.
#> 3 13 7-10 years Viennese pastries, cakes and sweet cookies 166.
The life history of a plot

Graph Creation Process







Tadaaaa! Here are the plots you will create

Quiz

Package {ggplot2}
{ggplot2}? What is it?

Construction of a graph in the form of successive and additive layers





ggplot(data = ...) # Data

aes(x = ..., y = ...) # Aesthetic mappings
geom_...() # Geometries
facet_...() # Facets
stat_...() # Statistical elements
coord_...() # Coordinates
theme_...() # Theme

ggplot(data = ...) + # Data

aes(x = ..., y = ...) + # Aesthetic mappings
geom_...() + # Geometries
facet_...() + # Facets
stat_...() + # Statistical elements
coord_...() + # Coordinates
theme_...() # Theme
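A sketch of the template above with the placeholders filled in. It uses the built-in `mtcars` dataset instead of the course data, and `coord_cartesian()`, `labs()` and `theme_minimal()` are illustrative choices. The `stat_...()` layer is left out because `geom_point()` already uses its default statistic.

```r
library(ggplot2)

# Each `+` adds one layer to the same plot object
p <- ggplot(data = mtcars) +                      # Data
  aes(x = wt, y = mpg, color = factor(cyl)) +     # Aesthetic mappings
  geom_point() +                                  # Geometries
  facet_grid(cols = vars(am)) +                   # Facets
  coord_cartesian(ylim = c(10, 35)) +             # Coordinates
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders") +                     # Titles
  theme_minimal()                                 # Theme

p  # printing the object draws the plot
```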

ggplot(data = ...) # Data

aes(x = ..., y = ...) # Aes. mappings


aes(x = ...,
y = ...,
color = ...) + # Aes. mappings


aes(x = ...,
y = ...,
color = ...) + # Aes. mappings
scale_...() # Points color


aes(x = ...,
y = ...,
color = ...) + # Aes. mappings
scale_...() + # Points color
labs(title = ..., # Titles
...)


aes(x = ...,
y = ...,
...) +
facet_...() # Facets


aes(x = ...,
y = ...,
...) +
facet_...() + # Facets
theme_...() # Theme

Quiz
ggplot(data = ...,
x = ...,
y = ...,
geom = ...)

ggplot(data = ...)
aes(x = ..., y = ...)
geom_...()

ggplot(data = ...) +
aes(x = ..., y = ...) +
geom_...()

Data format required by {ggplot2}
data.frame
The first steps of building a graph

Data format required by {ggplot2}
bmi
age_class
food_type
consumed_quantity
#> # A tibble: 6 × 4

The first steps of building a graph
ggplot(data = data_plot_target) +
aes(x = bmi, y = consumed_quantity) +
geom_point()

The first steps of building a graph
aes(x = bmi, y = consumed_quantity) +
geom_point()
ggplot(data = ...)
aes(...)
geom_...()

Define variables to display with aes()
aes()


x
y
color
fill
shape
size
alpha

Pick geometric objects with geom_*()
geom_*()

geom_histogram()
geom_point()
geom_boxplot()
geom_density()
geom_violin()
geom_col()
geom_label()

Building example plot

#> # A tibble: 171 × 4

#> 2 13 7-10 years Sandwiches, Pizzas, Pies, Pastries and Sa… 135.
#> # … with 161 more rows
bmi
age_class
food_type
consumed_quantity

bmi
consumed_quantity
aes(
x = bmi,
y = consumed_quantity
) +
geom_point()

bmi
consumed_quantity
food_type
aes(
x = bmi,
y = consumed_quantity,
color = food_type
) +
geom_point()

bmi
age_class
aes(
x = bmi,
fill = age_class
) +
geom_density()

Quiz
data_plot_formative
#> # A tibble: 3 × 5
#> amount_sugar amount_vit_c amount_water time location
#> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 9.58 24.7 88.2 Lunch home
#> 2 0 0 0 Lunch home
#> 3 0.57 19 48.7 Lunch home
ggplot(data = data_plot_formative)
aes(x = amount_sugar, y = time)
geom_boxplot()

ggplot(data = data_pour_le_graph) +
aes(x = amount_sugar, y = time) +
geom_boxplot()

ggplot(data = data_plot_formative) +
aes(x = amount_peanuts, y = time) +
geom_boxplot()

ggplot(data = data_plot_formative) +
aes(x = amount_sugar, y = time) +
geom_boxplot()

Quiz
aes()
aes(x = amount_sugar, y = amount_vit_c, size = amount_water, color = time)
aes(x = amount_sugar, y = amount_vit_c, size = amount_water, fill = time)
aes(x = amount_sugar, y = amount_vit_c, shape = amount_water, time)

Quiz
ggplot(data) + aes(x = time, y = amount_water) + geom_violin()
ggplot(data) + aes(x = time, y = amount_sugar) + geom_violin()
ggplot(data) + aes(x = time, y = amount_sugar) + geom_boxplot()

Good programming practices

as.numeric View

icannotreadthistext,ithurtsmyeyes,don'tyouthink?
a<-1
# a <- 1
# a < -1
resultat <- mean(1:10 + 26, na.rm = TRUE)
resultat=mean(1:10+26,na.rm=T)
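One more reason to write `TRUE` rather than `T`: `TRUE` is a reserved word, but `T` is an ordinary variable that any script can overwrite. A minimal sketch; the reassignment `T <- 0` is deliberately contrived.

```r
heights <- c(1.30, NA, 1.83)

mean(heights, na.rm = TRUE)         # NA dropped: mean of 1.30 and 1.83

T <- 0                              # e.g. someone reuses T as a variable name
res_T <- mean(heights, na.rm = T)   # na.rm = 0 acts like FALSE: the result is NA
res_T

rm(T)                               # remove the variable so T means TRUE again
```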

Characteristics of R objects
class
class(1)
#> [1] "numeric"
class("mummy")
#> [1] "character"
class(TRUE)
#> [1] "logical"
Objects

Characteristics of R objects
dessin <- ggplot(data = iris) +

aes(x = Sepal.Length, y = Petal.Length) +
geom_point(color = "green")
class(dessin)
#> [1] "gg" "ggplot"
class(class)
#> [1] "function"

Object types

Characteristics of vectors
c()
x <- c(1, 2, 3, 4)
x2 <- c("dad", "mom")
x3 <- c(TRUE, FALSE)
x4 <- c(1, "dad", TRUE)

x4
class(x4)
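A vector holds values of a single type, so `c()` converts mixed inputs to the most flexible type present. A small sketch of that rule, reusing the values from the slide:

```r
# Conversion order: logical, then numeric, then character
mixed_num <- c(1, TRUE)          # TRUE is converted to 1
class(mixed_num)                 # "numeric"

mixed_chr <- c(1, "dad", TRUE)   # everything is converted to text
mixed_chr                        # "1" "dad" "TRUE"
class(mixed_chr)                 # "character"
```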

Characteristics of vectors
1:10
#> [1] 1 2 3 4 5 6 7 8 9 10
seq.int()
seq.int(from = 1, to = 30, by = 2)
#> [1] 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Operators

height <- c(1.30, 1.55, 1.20, 1.83, 1.67)

height == 1.30
#> [1] TRUE FALSE FALSE FALSE FALSE
height + 1
#> [1] 2.30 2.55 2.20 2.83 2.67
test_height <- height > 1.6

test_height
#> [1] FALSE FALSE FALSE TRUE TRUE
!test_height
#> [1] TRUE TRUE TRUE FALSE FALSE
height != 1.20
#> [1] TRUE TRUE FALSE TRUE TRUE
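Comparisons can be combined element by element with `&` (and) and `|` (or), and `%in%` tests membership in a set. These are the same operators used later with `filter()`. A sketch on the `height` vector:

```r
height <- c(1.30, 1.55, 1.20, 1.83, 1.67)

# & : TRUE only where both conditions hold
medium <- height > 1.25 & height < 1.6
medium    # TRUE TRUE FALSE FALSE FALSE

# | : TRUE where at least one condition holds
extreme <- height < 1.25 | height > 1.8
extreme   # FALSE FALSE TRUE TRUE FALSE

# %in% : TRUE where the value belongs to the given set
listed <- height %in% c(1.30, 1.83)
listed    # TRUE FALSE FALSE TRUE FALSE
```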

Quiz
vect_1 <- 1:5

vect_2 <- vect_1 + 1
vect_1 vect_2
vect_2 vect_2

Quiz
vect_1 <- c(1, 2, 3, 4, 5)

vect_2 <- vect_1 > 2
class(vect_2) logical vect_2 TRUE
vect_2 !vect_2 TRUE

Missing values
NA
is.na()
height <- c(1.30, 1.55, NA, 1.83, 1.67)

is.na(height)
#> [1] FALSE FALSE TRUE FALSE FALSE
!is.na(height)
#> [1] TRUE TRUE FALSE TRUE TRUE
height > 1.6
#> [1] FALSE FALSE NA TRUE TRUE
height > 1.6 & !is.na(height)
#> [1] FALSE FALSE FALSE TRUE TRUE
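Why `is.na()` is needed: any comparison with `NA` returns `NA`, because the value is unknown. Even `NA == NA` gives `NA`. A sketch:

```r
NA == NA        # NA
NA > 1.6        # NA

height <- c(1.30, 1.55, NA, 1.83, 1.67)
# This does NOT find the missing value: each comparison with NA gives NA
height == NA
# is.na() is the reliable test
is.na(height)   # FALSE FALSE TRUE FALSE FALSE
```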

Quiz
vect_3 <- c(1, 2, NA, 4, NA)

resultat <- is.na(vect_3)
!resultat resultat TRUE
resultat FALSE FALSE TRUE !resultat TRUE

FALSE TRUE FALSE

Type conversion rules

Type conversion
height <- c(1.30, 1.55, NA, 1.83, 1.67)
as.character() character
as.character(height)
#> [1] "1.3" "1.55" NA "1.83" "1.67"
as.numeric() numeric
height <- c(1.30, 1.55, NA, 1.83, 1.67)

not_missing <- !is.na(height)
as.numeric(not_missing)
#> [1] 1 1 0 1 1
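Conversions do not always succeed. In this sketch, `as.numeric()` turns text that is not a number into `NA`. R normally warns "NAs introduced by coercion"; the warning is hidden here with `suppressWarnings()` only to keep the output short. The value `"tall"` is an arbitrary example.

```r
# Text that looks like a number converts cleanly
as.numeric(c("1.3", "1.55"))     # 1.30 1.55

# Text that does not becomes NA
converted <- suppressWarnings(as.numeric(c("1.3", "tall", "1.83")))
converted                        # 1.30   NA 1.83

# The other direction: numbers become text
as.character(c(1.30, 1.83))      # "1.3" "1.83"
```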

Vector-related functions
length sum min max mean median
a <- c(3, -5, 9, 6)
length(a)
sum(a)
min(a)
max(a)
mean(a)
median(a)

Vector-related functions
height <- c(1.30, 1.55, NA, 1.83, 1.67)
na.rm = TRUE
sum(height)
#> [1] NA
sum(height, na.rm = TRUE)
#> [1] 6.35
length(height)
min(height, na.rm = TRUE)
max(height, na.rm = TRUE)
mean(height, na.rm = TRUE)
median(height, na.rm = TRUE)
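The same calls, gathered in one named vector so the results can be compared side by side. This is a sketch; the naming with `c(min = ...)` is only a presentation choice.

```r
height <- c(1.30, 1.55, NA, 1.83, 1.67)

mean(height)    # NA: a single missing value contaminates the result

summary_height <- c(
  min    = min(height, na.rm = TRUE),
  max    = max(height, na.rm = TRUE),
  mean   = mean(height, na.rm = TRUE),
  median = median(height, na.rm = TRUE)
)
summary_height  # 1.3000 1.8300 1.5875 1.6100

length(height)  # 5: length() counts every element, NA included
```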

Counting missing values in a vector
height <- c(1.30, 1.55, NA, 1.83, NA)
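is.na() returns TRUE for each NA.
Converted to numeric, TRUE becomes 1 and FALSE becomes 0, so the sum counts the NA values:
sum(as.numeric(is.na(height)))
#> [1] 2
sum(is.na(height)) # also works: sum() converts TRUE/FALSE to 1/0 on its own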
#> [1] 2

Quiz
height <- c(1.30, 1.55, NA, 1.83, NA)

count <- sum(is.na(height))
result <- mean(is.na(height))
result result
result result

What's the {tidyverse}?

The {tidyverse} is also...

The usefulness of the tidyverse packages

The {tidyverse} packages
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
#> ✓ tibble 3.1.6 ✓ dplyr 1.0.7
#> ✓ tidyr 1.1.4 ✓ stringr 1.4.0
#> ✓ readr 2.1.1 ✓ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()

Datasets in the {tidyverse}

An example with iris

iris
#> # A tibble: 150 × 5

#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa

Files
A file?

Extensions
a_photo.jpg
a_message.txt
a_music.mp3

Tidying up a bit

That's all relative
C:/a_file/a_sub_file/pictures/a_pictures.jpg
pictures/a_pictures.jpg
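From R, a relative path can be built with `file.path()`, which joins the pieces with `/`. R accepts that separator on Windows, macOS and Linux alike. A sketch reusing the file name above:

```r
relative_path <- file.path("pictures", "a_pictures.jpg")
relative_path   # "pictures/a_pictures.jpg"

# A relative path is resolved from the working directory; in an RStudio
# project, that is the folder holding the .Rproj file
getwd()
file.exists(relative_path)  # TRUE only if the file exists under the working directory
```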

A little birdie told me

Memento

Quiz
C:\project\study
analysis.Rmd
"C:/project/study/analysis.Rmd"
"C:/project/study/analysis"
"analysis.Rmd"
"analysis"

Quiz

Import .xls or .xlsx Excel files
library(readxl)
conso_complement_alimentaire <- read_excel(path = "data/conso_ca_prod.xlsx")
conso_complement_alimentaire
#> # A tibble: 37 × 12
#> POPULATION NOIND periode_reference num_ligne_CA num_prod type_prod
#> <chr> <dbl> <chr> <dbl> <dbl> <chr>
#> 1 Pop1 Individu 119403801 12 mois 5711 1 Complément a…
#> 2 Pop1 Individu 121303701 12 mois 6351 1 Médicament
#> 7 Pop1 Individu 213102601 12 mois 13062 2 Non identifié
#> # … with 27 more rows, and 6 more variables: classif_reg_prod <chr>,
#> # classif_prod <chr>, pres_prod <chr>, nb_unit_prod <chr>,
#> # mode_conso_prod <chr>, nb_jours_an <dbl>
Import an xls/xlsx file with {readxl}
Quiz
read_excel(path = "data/consumption.xlsx")
read_excel(path = "data/consumption.csv")
read_csv(path = "data/consumption.xls")
read_sas(path = "data/consumption.xlsx")

Import data from "flat" files
Column separators: , or ;
Import a flat file with {readr}

The read_csv and read_csv2 functions
read_csv read_csv2
library(readr)
product <- read_csv(file = "data/conso_ca_prod.csv") # comma
indiv <- read_csv2(file = "data/conso_ca_indiv.csv") # semicolon
dim(indiv) # number of rows and columns in the data.frame

names(indiv) # names of the variables in the data.frame
head(indiv) # first 6 lines of the data.frame
dplyr::glimpse(indiv) # condensed visualisation of the data
skimr::skim(indiv) # descriptive stats summary
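To see the difference between `read_csv()` and `read_csv2()` without the course files, here is a sketch that writes two tiny throwaway files to a temporary folder. The content (one toy row) is made up for illustration. `read_csv2()` expects `;` between columns and `,` as the decimal mark, a layout common with European locales.

```r
library(readr)

comma_file <- tempfile(fileext = ".csv")
writeLines(c("food,amount", "tap water,148.5"), comma_file)

semicolon_file <- tempfile(fileext = ".csv")
writeLines(c("food;amount", "tap water;148,5"), semicolon_file)

# "," between columns, "." as decimal mark
comma_data <- read_csv(file = comma_file, show_col_types = FALSE)
# ";" between columns, "," as decimal mark
semicolon_data <- read_csv2(file = semicolon_file, show_col_types = FALSE)

comma_data
semicolon_data
```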

data_habits_indiv

Import a dataset with the import button
Import files with the GUI

Control data file import
dim(dataset) # number of rows and columns in the data.frame

names(dataset) # names of the variables in the data.frame
head(dataset) # first 6 lines of the data.frame
dplyr::glimpse(dataset) # condensed visualisation of the data
skimr::skim(dataset) # descriptive stats summary
Control data import in R

Quiz
# A tibble: 100 x 1
`POPULATION;NOIND;periode_reference;conso_ca;conso_ca_regl;co~
<chr>
1 Pop1 Individu;110100101;12 mois;Non;Non;NA;NA;NA;NA;NA;NA;NA;~
All good
You picked the wrong column separator
You used the wrong import function
Answer D

Quiz
# A tibble: 86 x 1
`PK\003\004\024`
<chr>
1 "\xa1\xa6"
2 "B\xa8\x10\xaaf\x91\x97\xed9\xe7\xcc\xf1d<\xbb\\9[=AB\x13|#\x8e\xa~
3 "\xcc"
All good
You picked the wrong column separator
You used the wrong import function
Answer D

Modify aes() default scale parameters
aes()
scale_*()
fill
scale_fill_*()
Graphs: change default variable display

color: scale_color_*()
fill: scale_fill_*()
#> scale_fill_viridis_b
#> scale_color_continuous
#> scale_color_gradient2
#> scale_fill_gradient
#> scale_colour_continuous
#> scale_colour_viridis_d
#> scale_color_viridis_b
#> scale_color_viridis_c
#> scale_discrete_manual
#> scale_colour_manual
#> scale_colour_viridis_c
#> scale_size_continuous
#> scale_shape_manual
#> scale_fill_viridis_d
#> scale_alpha_manual
#> scale_fill_viridis_c
#> scale_fill_gradient2
#> scale_fill_continuous

scale_color/fill_grey() scale_color/fill_manual()
scale_color/fill_viridis_d()

scale_color_manual(values = c("color1", "color2", ..., "colorN"))
scale_color_manual(values = c("pink", "red", ..., "blue"))
scale_color_manual(values = c("#E697DD", "#F51414", ..., "#142DF5"))

aes(x = bmi, fill = age_class) +
geom_density()

geom_density() +
scale_fill_grey()

aes(x = bmi, y = consumed_quantity,
color = food_type) +
geom_point()

geom_point() +
scale_color_manual(
values = c("#20B8D6", "#FF9300",
"#7176B8")
)

Quiz
scale_color_viridis_d()
scale_size_viridis_d()
scale_fill_viridis_d()

Personalize geometric objects
geom_*()
color
fill
alpha
size
ggplot(data) +
aes(...) +
geom_...(color = ...)
More geometric objects!

geom_density() +
scale_fill_grey()

geom_density(alpha = 0.8) +
scale_fill_grey()

scale_fill_grey()
alpha
aes()
geom_density()

Combining geometric objects
ggplot(data = data_plot_target) +
aes(x = bmi, y = age_class) +
geom_boxplot()

ggplot(data = data_plot_target) +
aes(x = bmi, y = age_class) +
geom_boxplot() +
geom_point()

Combining geometric objects
aes()
aes(x = bmi, y = age_class, color = age_class) +
geom_boxplot() +
geom_point()

Combining geometric objects
aes()
aes() geom_*()

geom_boxplot(aes(color = age_class)) +
geom_point()

geom_boxplot() +
geom_point(aes(color = age_class))

Frequent mistake

aes(x = bmi, y = consumed_quantity) +
geom_point(color = "green")

aes(x = bmi, y = consumed_quantity) +
geom_point(aes(color = "green"))
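The mistake can be checked directly with `layer_data()`, which returns the data {ggplot2} actually draws. This sketch uses the built-in `mtcars` dataset instead of the course data. Outside `aes()`, `color = "green"` is a fixed setting. Inside `aes()`, the string is treated as a one-level variable and coloured by the default discrete scale.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars) +
  aes(x = wt, y = mpg)

# Fixed value OUTSIDE aes(): every point is drawn in green
p_fixed <- base_plot + geom_point(color = "green")

# Constant INSIDE aes(): "green" becomes a category, coloured by the default
# palette (not green), and a legend appears
p_mapped <- base_plot + geom_point(aes(color = "green"))

unique(layer_data(p_fixed)$colour)   # "green"
unique(layer_data(p_mapped)$colour)  # a default palette colour, not "green"
```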

Quiz
geom_boxplot(color = "red") + geom_point(size = 2)
geom_boxplot(aes(color = "red")) + geom_point(size = 2)
geom_boxplot(color = "red") + geom_point(aes(size = amount_water))

The facet_grid() function
facet_grid()
rows = vars(...)
cols = vars(...)
vars()
Making sub-plots corresponding to subsets of data


geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8"))


aes(x = bmi, y = consumed_quantity,
color = food_type) +
geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(cols = vars(age_class))


aes(x = bmi, y = consumed_quantity,
color = food_type) +
geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(rows = vars(age_class))

Quiz
facet_grid
facet_grid(rows = vars(time, location))
facet_grid(rows = vars(time))
facet_grid(rows = vars(time), cols = vars(location))

Hands-on Practical

The labs() function
labs()
title
subtitle
color fill size alpha

aes()
caption
Modify title labels


scale_fill_grey()

scale_fill_grey() +
labs(
title = "Children BMI by age class",
x = "BMI",
y = "Density",
fill = "Age class"
)

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8"))

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(
title = "Consumption of fat/sweet foods according to children BMI",
x = "BMI",
y = "Average consumption by\nchild during study (in g)",
color = "Food type"
)
Use \n inside a label string, as in y = "...", to insert a line break.

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(cols = vars(age_class))

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(cols = vars(age_class)) +
labs(
title = "",
subtitle = "By age class",
x = "BMI",
color = "Food type",
caption = "NB: BMI (Body Mass Index) = weight / (height ^ 2)"
)

Quiz
labs()
labs(title = "Sugars versus Vitamin C", color = "Reference: INCA3 study")
labs(subtitle = "Sugars versus Vitamin C", caption = "Reference: INCA3 study")
labs(title = "Sugars versus Vitamin C", caption = "Reference: INCA3 study")

Practical

The coord_flip() function
Play with the coordinate system to swap the x and y axes of your graph
The coord_flip() function

geom_boxplot()

geom_boxplot() +
coord_flip()

The theme_*() functions
#> [1] "theme_bw" "theme_classic" "theme_dark" "theme_gray"
#> [5] "theme_grey" "theme_light" "theme_linedraw" "theme_minimal"
#> [9] "theme_test" "theme_void"
Customize the graph theme

The theme_*() functions
#> [1] "theme_base" "theme_calc" "theme_clean"
#> [4] "theme_economist" "theme_economist_white" "theme_excel"
#> [7] "theme_excel_new" "theme_few" "theme_fivethirtyeight"
#> [10] "theme_foundation" "theme_gdocs" "theme_hc"
#> [13] "theme_igray" "theme_map" "theme_pander"
#> [16] "theme_par" "theme_solarized" "theme_solarized_2"
#> [19] "theme_solid" "theme_stata" "theme_tufte"
#> [22] "theme_wsj"

Building example graph

scale_fill_grey() +
labs(title = "Children BMI by age class",
x = "BMI",
y = "Density",
fill = "Age class")

scale_fill_grey() +
labs(title = "Children BMI by age class",
x = "BMI",
y = "Density",
fill = "Age class") +
theme_few() # from the {ggthemes} package

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Fat/sweet food consumption vs children's BMI",
x = "BMI",
color = "Food type")

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Fat/sweet food consumption vs children's BMI",
x = "BMI",
color = "Food type") +
theme_few()

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
facet_grid(cols = vars(age_class)) +
labs(
title = "",
subtitle = "By age class",
x = "BMI",
color = "Food type",
caption = "NB: BMI (Body Mass Index) = weight / (height ^ 2)"
) +
theme_few()

Bonus: towards a finer customization
theme()
theme_*()
theme(legend.position = "bottom")
guides(color = guide_legend(ncol = 1))

geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Fat/sweet food consumption vs children's BMI",
x = "BMI",
color = "Food type") +
theme_few()

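geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Fat/sweet food consumption vs children's BMI",
x = "BMI",
color = "Food type") +
theme_few() +
theme(legend.position = "bottom")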

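geom_point() +
scale_color_manual(values =
c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Fat/sweet food consumption vs children's BMI",
x = "BMI",
color = "Food type") +
theme_few() +
theme(legend.position = "bottom") +
guides(color = guide_legend(ncol = 1))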

The ggsave() function
ggsave(filename = ..., plot = ...)
filename
plot
Export a graph

Assignment
library(ggplot2)
library(ggthemes) # for theme_few()

plot_alim_bmi <- ggplot(data = data_plot_target) +
aes(x = bmi, y = consumed_quantity, color = food_type) +
geom_point() +
scale_color_manual(values = c("#20B8D6", "#FF9300", "#7176B8")) +
labs(title = "Consumption of fat/sweet foods according to children BMI",
x = "BMI",
y = "Average consumption by\nchild during study (in g)",
color = "Food type") +
theme_few() +
theme(legend.position = "bottom") +
guides(color = guide_legend(ncol = 1))

plot_alim_bmi

Assignment
plot_alim_bmi

Export
ggsave(filename = "graph_alim_bmi.png", plot = plot_alim_bmi)

Hands-on Practical

{dplyr} What is it?
Wrangle data within the tidyverse using {dplyr}
Manipulate a data.frame
data_food
#> # A tibble: 6 × 3
#> occasion_type food amount
#> <chr> <chr> <dbl>
#> 1 Aperitif before lunch tap water 148.
#> 2 Lunch fruit juice 100% pure juice 500
#> 3 Lunch n.s. cooking fat 3.22
#> 4 Lunch potato fries 107
#> 5 Lunch chicken nugget 161
#> 6 Lunch n.s. fat 12.8
occasion_type
food
amount
data_food
occasion_type "Lunch"
amount_kg amount
food
amount_kg
Chain operations in {dplyr}
%>%
data_food
#> # A tibble: 6 × 3
# in pseudo code
data_food %>%
filter_lunch %>%
create_column_amount_kg %>%
group_by_food %>%
create_mean_amount_kg
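The pseudo code above, written as a runnable {dplyr} pipeline. It is a sketch on the six rows printed earlier, typed in by hand (148 stands in for the value printed as "148."). It assumes `amount` is in grams, as the later `amount / 1000` step suggests.

```r
library(dplyr)

data_food <- tibble(
  occasion_type = c("Aperitif before lunch", rep("Lunch", 5)),
  food = c("tap water", "fruit juice 100% pure juice", "n.s. cooking fat",
           "potato fries", "chicken nugget", "n.s. fat"),
  amount = c(148, 500, 3.22, 107, 161, 12.8)
)

lunch_summary <- data_food %>%
  filter(occasion_type == "Lunch") %>%          # filter_lunch
  mutate(amount_kg = amount / 1000) %>%         # create_column_amount_kg
  group_by(food) %>%                            # group_by_food
  summarise(mean_amount_kg = mean(amount_kg))   # create_mean_amount_kg

lunch_summary
```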
Exercise
data_food
#> # A tibble: 6 × 3

Explore rows of a dataset
data_food
#> # A tibble: 6 × 3

"Lunch" occasion_type
food

The {dplyr} functions that handle rows
arrange()
filter()
count()

Rearrange rows with arrange()

desc()
your_dataframe %>%
arrange(sorting_variable_1, sorting_variable_2)

data_food %>%
arrange(amount)
#> # A tibble: 15 × 3
#> 1 Breakfast stevia 0.025
#> 2 In the morning aspartame 0.0300
#> 3 Lunch aspartame 0.0300
#> 4 Snack aspartame 0.0300
#> 5 Dinner aspartame 0.0300
#> 6 In the evening/night aspartame 0.0300
#> 11 Dinner olive oil 0.0448
#> 12 Lunch olive oil 0.048
#> 13 Lunch olive oil 0.0500
#> 14 Lunch stevia 0.0500
#> 15 Snack stevia 0.0500

data_food %>%
arrange(desc(amount))
#> # A tibble: 15 × 3
#> 1 Dinner blond beer -2% 1768.
#> 2 Aperitif before dinner beer with peach sirup 1496.
#> 3 Lunch meat soup 1481.
#> 4 Lunch pistou soup 1481.
#> 5 Lunch meat stock 1467.
#> 6 Dinner meat stock 1467.
#> 7 Dinner croque madame 1443.
#> 8 In the afternoon (excluding snacks) fruit punch 1354.
#> 9 Lunch chinese soup 1185.
#> 10 Dinner meat stock 1173.
#> 11 Dinner stew stock 1173.
#> 12 Dinner vegetable stock 1167.
#> 13 Dinner fajita 1100
#> 14 Dinner vegetable soup 1050.
#> 15 Dinner diluted fruit juice 1024.

data_food %>%
arrange(occasion_type, desc(amount))
#> # A tibble: 15 × 3
#> 2 Aperitif before dinner wine-based cocktail 1014.
#> 3 Aperitif before dinner blond beer 2-4.9% 1010
#> 4 Aperitif before dinner n.s. blond beer 1010
#> 5 Aperitif before dinner blond beer 2-4.9% 1010
#> 6 Aperitif before dinner non-aromatised still water 1000
#> 7 Aperitif before dinner tap water 1000
#> 11 Aperitif before dinner n.s. still water 1000
#> 13 Aperitif before dinner fruit punch 903.
#> 14 Aperitif before dinner blond beer 5-7.9% 864.

Filter rows with filter()

> < <= >=

%in% & | !
your_dataframe %>%
filter(condition)

# "Aperitif before lunch" in the occasion_type column
data_food %>%
filter(occasion_type == "Aperitif before lunch")
#> # A tibble: 15 × 3
#> 2 Aperitif before lunch tap water 333
#> 3 Aperitif before lunch soda with lemon extract like sprite 178.
#> 4 Aperitif before lunch non-aromatised still water 125
#> 9 Aperitif before lunch champagne brut 135
#> 11 Aperitif before lunch grilled peanut 100
#> 12 Aperitif before lunch olive n.s. 18
#> 13 Aperitif before lunch pastis 285
#> 14 Aperitif before lunch potato chips 24
#> 15 Aperitif before lunch green olive 12

# "Dinner" in occasion_type column AND "tap water" in food column
data_food %>%
filter(occasion_type == "Dinner" & food == "tap water")
#> # A tibble: 15 × 3
#> 1 Dinner tap water 221.
#> 4 Dinner tap water 210
#> 9 Dinner tap water 62.5

# "Dinner" in occasion_type column OR "tap water" in food column
data_food %>%
filter(occasion_type == "Dinner" | food == "tap water")
#> # A tibble: 15 × 3
#> 2 Dinner fruit yoghurt 125
#> 3 Dinner tomato sauce 102.
#> 4 Dinner dow 300
#> 5 Dinner white bread 29.4
#> 7 In the evening/night tap water 148.
#> 8 In the evening/night tap water 148.
#> 9 Lunch tap water 360
#> 10 Dinner red cabbage 65
#> 11 Dinner white salt 1
#> 12 Dinner beef bifteck 153
#> 13 Dinner salad dressing with wine vinegar 4.28
#> 14 Dinner green beans 30
#> 15 Dinner white bread 31.5

%in% |
data_food %>%
filter(occasion_type %in% c("Lunch", "Dinner"))
data_food %>%
filter(occasion_type == "Lunch" | occasion_type == "Dinner")
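A quick check that the two filters return the same rows. The sketch uses a toy column of occasion names taken from the course data. `%in%` stays short as the list of accepted values grows, where `|` needs one comparison per value.

```r
library(dplyr)

occasions <- tibble(
  occasion_type = c("Aperitif before lunch", "Lunch", "Dinner", "Snack", "Lunch")
)

with_in <- occasions %>%
  filter(occasion_type %in% c("Lunch", "Dinner"))
with_or <- occasions %>%
  filter(occasion_type == "Lunch" | occasion_type == "Dinner")

identical(with_in, with_or)  # TRUE: both keep the same 3 rows
```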

data_food %>%
filter(occasion_type == "Lunch") %>%
arrange(desc(amount)) %>%
head()
#> # A tibble: 6 × 3
#> 1 Lunch meat soup 1481.
#> 2 Lunch pistou soup 1481.
#> 3 Lunch meat stock 1467.
#> 4 Lunch chinese soup 1185.
#> 5 Lunch non-aromatised still water 1000
#> 6 Lunch tap water 1000

Quiz
occasion_type
data_food
data_food %>% data_food %>%

filter(occasion_type != NA) filter(is.na(occasion_type) = FALSE)

arrange(desc(occasion_type)) filter(!is.na(occasion_type))

A few particular filters: distinct()
unique()
data_food %>%
distinct()
#> # A tibble: 68,041 × 3

#> 7 Lunch hamburger 106
#> 8 In the afternoon (excluding snacks) chewing gum 1.4
#> 9 In the afternoon (excluding snacks) water with mint sirup 447.
#> 10 Dinner fruit yoghurt 125
#> # … with 68,031 more rows

unique()
data_food %>%
distinct()
distinct()
occasion_type food amount

data_food %>%
distinct(occasion_type)
#> # A tibble: 10 × 1
#> occasion_type
#> <chr>
#> 1 Aperitif before lunch
#> 2 Lunch
#> 3 In the afternoon (excluding snacks)
#> 4 Dinner
#> 5 In the evening/night
#> 6 Breakfast
#> 7 Aperitif before dinner
#> 8 In the morning
#> 9 Snack
#> 10 Before breakfast

A few particular filters: slice_sample() ...
data_food %>%
slice_sample(n = 10)
#> # A tibble: 10 × 3
#> 1 Dinner n.s. cooking fat 5
#> 2 Dinner fruit yoghurt 125
#> 3 Breakfast crunchy wheat toast (classic) Cracotte type 42
#> 4 Lunch fruit compote 90
#> 5 Lunch duck leg 122.
#> 6 Lunch red wine 120
#> 7 Dinner batavia lettuce 15
#> 8 Dinner non-aromatised still water 162.
#> 9 Dinner pork nem 30

... and slice_sample(prop = ...)
data_food %>%
slice_sample(prop = 0.05) # sample 5% of all rows
#> # A tibble: 12,815 × 3

#> 1 Dinner grapeseed oil 13.1
#> 2 Lunch white sugar 3
#> 3 Lunch non-aromatised still water 402.
#> 4 Snack n.s. still mineral water 80
#> 5 Dinner unsalted butter 5.5
#> 6 Dinner grey sea salt, Noirmoutier/Guérande type 1
#> 7 Snack soft cake 30
#> 8 Dinner tomato sauce 15.9
#> 9 Lunch rindless cooked ham 45
#> 10 Snack fruit-filled wafer, Paille d'Or type 43

A few particular filters: slice_max()
n
data_food %>%
slice_max(amount, n = 2)
#> # A tibble: 2 × 3
#> 1 Dinner blond beer -2% 1768.

A few particular filters: slice_min()
n
data_food %>%
slice_min(amount, n = 2)
#> # A tibble: 10 × 3
#> 1 Breakfast stevia 0.025
#> 5 Dinner aspartame 0.0300

Count rows with count()
data_food %>%
count()
#> # A tibble: 1 × 1
#> n
#> <int>
#> 1 256301
count()

data_food %>%
count(occasion_type)
#> # A tibble: 10 × 2
#> occasion_type n
#> <chr> <int>
#> 1 Aperitif before dinner 3967
#> 2 Aperitif before lunch 2181
#> 3 Before breakfast 2165
#> 4 Breakfast 40195
#> 5 Dinner 73760
#> 6 In the afternoon (excluding snacks) 12906
#> 7 In the evening/night 8501
#> 8 In the morning 10443
#> 9 Lunch 85188
#> 10 Snack 16995

name
data_food %>%
count(occasion_type, name = "number")
#> # A tibble: 10 × 2
#> occasion_type number
#> <chr> <int>
#> 1 Aperitif before dinner 3967
#> 2 Aperitif before lunch 2181
#> 3 Before breakfast 2165
#> 4 Breakfast 40195
#> 5 Dinner 73760
#> 6 In the afternoon (excluding snacks) 12906
#> 7 In the evening/night 8501
#> 8 In the morning 10443
#> 9 Lunch 85188
#> 10 Snack 16995

data_food
#> # A tibble: 6 × 3

data_food %>%
filter(occasion_type == "Lunch") %>%
count(food, name = "number") %>%
arrange(desc(number)) %>%
head()
#> # A tibble: 6 × 2
#> food number
#> <chr> <int>
#> 1 tap water 6022
#> 2 white bread 4599
#> 3 non-aromatised still water 2597
#> 4 olive oil 1668
#> 5 dow 1462
#> 6 white salt 1355

Manipulate the variables of a dataset
data_food
#> # A tibble: 6 × 5
#> occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 Aperitif before lunch Home water tap … 148.
#> 2 Lunch Home vegetable and fruit juice frui… 500
#> 3 Lunch Home animal fat n.s.… 3.22
#> 4 Lunch Home potatoes and other tubers pota… 107
#> 5 Lunch Home Meet based dish chic… 161
#> 6 Lunch Home plant-based fat n.s.… 12.8
occasion_type
occasion_location
food_type
food
amount

The {dplyr} functions to manipulate columns
select()
mutate()
rename()

Select columns with select()

your_dataframe %>%
select(variable_to_keep_1, variable_to_keep_2, ...)
data_food %>%
select(food, amount)
#> # A tibble: 6 × 2
#> food amount
#> <chr> <dbl>
#> 1 tap water 148.
#> 2 fruit juice 100% pure juice 500
#> 3 n.s. cooking fat 3.22
#> 4 potato fries 107
#> 5 chicken nugget 161
#> 6 n.s. fat 12.8

starts_with()
ends_with()
contains()
everything()

data_food %>%
select(-occasion_type, -occasion_location)
#> # A tibble: 256,301 × 3

#> food_type food amount
#> 1 water tap water 148.
#> 2 vegetable and fruit juice fruit juice 100… 500
#> 3 animal fat n.s. cooking fat 3.22
#> 4 potatoes and other tubers potato fries 107
#> 5 Meet based dish chicken nugget 161
#> 6 plant-based fat n.s. fat 12.8
#> 7 Sandwiches, pizzas, pies, pastries and savory cookies hamburger 106
#> 8 Sweets and chocolate chewing gum 1.4
#> 9 Soft drinks water with mint… 447.
#> 10 Yoghurts and white cheeses fruit yoghurt 125

data_food %>%
select(starts_with("occasion"))
#> # A tibble: 256,301 × 2

#> occasion_type occasion_location
#> <chr> <chr>
#> 1 Aperitif before lunch Home
#> 2 Lunch Home
#> 3 Lunch Home
#> 4 Lunch Home
#> 5 Lunch Home
#> 6 Lunch Home
#> 7 Lunch Home
#> 8 In the afternoon (excluding snacks) Home
#> 9 In the afternoon (excluding snacks) Home
#> 10 Dinner Home

data_food %>%
select(-starts_with("occasion"))
#> # A tibble: 256,301 × 3


data_food %>%
select(-ends_with("type"))
#> # A tibble: 256,301 × 3

#> occasion_location food amount
#> 1 Home tap water 148.
#> 2 Home fruit juice 100% pure juice 500
#> 3 Home n.s. cooking fat 3.22
#> 4 Home potato fries 107
#> 5 Home chicken nugget 161
#> 6 Home n.s. fat 12.8
#> 7 Home hamburger 106
#> 8 Home chewing gum 1.4
#> 9 Home water with mint sirup 447.
#> 10 Home fruit yoghurt 125

data_food %>%
select(-contains("occasion"))
#> # A tibble: 256,301 × 3


data_food %>%
select(food, everything())
#> # A tibble: 256,301 × 5

#> food occasion_type occasion_locati… food_type amount
#> 1 tap water Aperitif befo… Home water 148.
#> 2 fruit juice 100% pure juice Lunch Home vegetable… 500
#> 3 n.s. cooking fat Lunch Home animal fat 3.22
#> 4 potato fries Lunch Home potatoes … 107
#> 5 chicken nugget Lunch Home Meet base… 161
#> 6 n.s. fat Lunch Home plant-bas… 12.8
#> 7 hamburger Lunch Home Sandwiche… 106
#> 8 chewing gum In the aftern… Home Sweets an… 1.4
#> 9 water with mint sirup In the aftern… Home Soft drin… 447.
#> 10 fruit yoghurt Dinner Home Yoghurts … 125
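The same helpers can be tried on a small tibble to check which columns each one keeps. This is a sketch on a hypothetical toy_food tibble (same column names as data_food, made-up values), assuming {dplyr} is installed:

```r
library(dplyr)

# Hypothetical toy data: same column names as data_food, made-up values
toy_food <- tibble(
  occasion_type = c("Lunch", "Dinner"),
  occasion_location = c("Home", "Work"),
  food_type = c("water", "potatoes and other tubers"),
  food = c("tap water", "potato fries"),
  amount = c(148, 107)
)

# Columns whose name starts with "occasion"
occasion_cols <- names(select(toy_food, starts_with("occasion")))
# All columns except those whose name ends with "type"
no_type_cols <- names(select(toy_food, -ends_with("type")))
# food first, then every other column in its original order
food_first_cols <- names(select(toy_food, food, everything()))

occasion_cols
no_type_cols
food_first_cols
```

everything() does not drop anything: combined with a column name, it is a convenient way to reorder columns.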

Quiz
Which instruction keeps only the columns of data_food whose name starts with "occasion"?
data_food %>% filter(starts_with("occasion"))
data_food %>% select(starts_with("occasion"))
data_food %>% starts_with("occasion")

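Quiz
data_food is a tibble with 6 rows and 5 columns. What does the following code return?
data_food %>%
select(food) %>%
select(amount)
The food column of data_food
The food and amount columns of data_food
An error, because amount no longer exists after select(food)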

Transform or create a column with mutate()

mutate() to transform or create variables
your_dataframe %>%
mutate(new_variable_1 = operations(existing_variable_2),
new_variable_3 = operations(existing_variable_4),
...
)

Useful functions and operators inside mutate() (from {dplyr} and base R):
lag() lead()
cumsum() cumprod()
+ - * > < <= >=
ifelse() case_when()

tibble(
hour = 12:18,
food_intake = c(280, 25, 0, 0, 100, 50, 200)
) %>%
mutate(
lag_hour = lag(hour),
diff_intake = food_intake - lag(food_intake),
cum_intake = cumsum(food_intake)
)
#> # A tibble: 7 × 5
#> hour food_intake lag_hour diff_intake cum_intake
#> <int> <dbl> <int> <dbl> <dbl>
#> 1 12 280 NA NA 280
#> 2 13 25 12 -255 305
#> 3 14 0 13 -25 305
#> 4 15 0 14 0 305
#> 5 16 100 15 100 405
#> 6 17 50 16 -50 455
#> 7 18 200 17 150 655
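lead() and cumprod(), listed above but not used in this example, work the same way. A minimal sketch on a hypothetical tibble (the growth_factor values are made up):

```r
library(dplyr)

growth <- tibble(
  hour = 12:15,
  growth_factor = c(1.5, 2, 1, 0.5)  # hypothetical multiplicative factors
) %>%
  mutate(
    next_hour = lead(hour),              # value of the next row (NA on the last row)
    cum_growth = cumprod(growth_factor)  # running product of growth_factor
  )
growth
```

lead() is the mirror of lag(): the missing value appears on the last row instead of the first.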

data_food %>%
mutate(amount_kg = amount / 1000,
amount_kg = round(amount_kg, digits = 2)) %>%
select(amount_kg) %>%
head()
#> # A tibble: 6 × 1
#> amount_kg
#> <dbl>
#> 1 0.15
#> 2 0.5
#> 3 0
#> 4 0.11
#> 5 0.16
#> 6 0.01

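case_when()
Inside mutate(), case_when() is an alternative to nested ifelse() calls. Each case is written condition ~ result:
your_dataframe %>%
mutate(
variable = case_when(
condition_1 ~ value_1,
condition_2 ~ value_2,
...
))
mutate() stores the result in the column variable.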

case_when()
data_food %>%
mutate(
amount_chr = case_when(
amount > 400 ~ "a gigantic quantity",
amount > 100 ~ "a lot",
amount >= 0 ~ "a small amount"
)
) %>%
select(amount, amount_chr)
#> # A tibble: 7 × 2
#> amount amount_chr
#> <dbl> <chr>
#> 1 148. a lot
#> 2 500 a gigantic quantity
#> 3 3.22 a small amount
#> 4 107 a lot
#> 5 161 a lot
#> 6 12.8 a small amount
#> 7 106 a lot
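case_when() tests the conditions in order and keeps the first one that is TRUE: this is why 500 is labelled "a gigantic quantity" and not "a lot". A row matching no condition (for example a missing amount) gets NA, unless a final TRUE ~ ... catch-all is added. A sketch on a hypothetical vector:

```r
library(dplyr)

amounts <- c(500, 148, 3.22, NA)  # hypothetical values

amount_chr <- case_when(
  amounts > 400 ~ "a gigantic quantity",
  amounts > 100 ~ "a lot",           # only reached when amounts <= 400
  amounts >= 0 ~ "a small amount",
  TRUE ~ "unknown"                   # catch-all: here, only the NA amount
)
amount_chr
```

Conditions that evaluate to NA are treated as not matched, which is why the missing amount falls through to the catch-all.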

Rename variables with rename()
your_dataframe %>%
rename(new_name = old_name)
data_food
#> # A tibble: 6 × 2
#> occasion_location amount
#> <chr> <dbl>
#> 1 Home 148.
#> 2 Home 500
#> 3 Home 3.22
#> 4 Home 107
#> 5 Home 161
#> 6 Home 12.8
data_food %>%
rename(place = occasion_location)
#> # A tibble: 6 × 2
#> place amount
#> <chr> <dbl>
#> 1 Home 148.
#> 2 Home 500
#> 3 Home 3.22
#> 4 Home 107
#> 5 Home 161
#> 6 Home 12.8
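rename() keeps every column and only changes a name, whereas select() can also rename but keeps only the columns it lists. A sketch on a hypothetical two-column tibble:

```r
library(dplyr)

toy <- tibble(occasion_location = c("Home", "Work"), amount = c(148, 500))

renamed <- toy %>% rename(place = occasion_location)   # both columns kept
selected <- toy %>% select(place = occasion_location)  # only place kept

names(renamed)
names(selected)
```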

Quiz
#> # A tibble: 4 × 5
How do you rename the column amount to quantity in this tibble?
rename(amount = quantity)
mutate(quantity = amount)
select(quantity = amount)
rename(quantity = amount)

Extract the content of a column with pull()
data_food %>%
select(amount) %>%
class()
#> [1] "tbl_df" "tbl" "data.frame"
data_food %>%
pull(amount) %>%
class()
#> [1] "numeric"

Functions to summarise data
mean() median()
var() sd()
min() max()
n() counts the rows; it only works inside a {dplyr} verb such as summarise()

Summarise data with summarise()
your_dataframe %>%
summarise(
var1_summary = function_1(variable_1),
var2_summary = function_2(variable_2),
...
)

data_food
#> # A tibble: 6 × 3

data_food %>%
summarise(
mean_summary = mean(amount),
variance_summary = var(amount),
number_summary = n()
)
#> # A tibble: 1 × 3
#> mean_summary variance_summary number_summary
#> <dbl> <dbl> <int>
#> 1 NA NA 256301
amount contains missing values (NA), so mean() and var() return NA.

data_food %>%
summarise(
mean_summary = mean(amount, na.rm = TRUE),
variance_summary = var(amount, na.rm = TRUE),
number_summary = n()
)
#> # A tibble: 1 × 3
#> mean_summary variance_summary number_summary
#> <dbl> <dbl> <int>
#> 1 118. 17515. 256301

The adverbial complement group_by()
your_dataframe %>%
group_by(grouping_variable_1, grouping_variable_2, ...)

Chain group_by() and summarise()
group_by() summarise()

Chain group_by() and summarise()
data_food %>%
group_by(occasion_type) %>%
summarise(
mean_amount = mean(amount, na.rm = TRUE),
variance_amount = var(amount, na.rm = TRUE),
number = n(),
.groups = "drop"
)
#> # A tibble: 8 × 4
#> occasion_type mean_amount variance_amount number
#> <chr> <dbl> <dbl> <int>
#> 1 Aperitif before dinner 132. 20376. 3967
#> 2 Aperitif before lunch 121. 16260. 2181
#> 3 Before breakfast 159. 15585. 2165
#> 4 Breakfast 127. 22502. 40195
#> 5 Dinner 115. 18683. 73760
#> 6 In the afternoon (excluding snacks) 153. 18741. 12906
#> 7 In the evening/night 162. 18286. 8501
#> 8 In the morning 142. 16345. 10443
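.groups = "drop" removes the grouping from the result, so the following steps are no longer computed group by group.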
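The same chain on a small hypothetical tibble shows what each piece does: missing amounts are ignored by mean() when na.rm = TRUE but still counted by n(), the groups come out sorted, and the result is no longer grouped.

```r
library(dplyr)

toy <- tibble(
  occasion_type = c("Lunch", "Lunch", "Dinner", "Dinner", "Dinner"),
  amount = c(100, NA, 50, 150, 250)  # hypothetical values
)

by_occasion <- toy %>%
  group_by(occasion_type) %>%
  summarise(
    mean_amount = mean(amount, na.rm = TRUE),  # the NA is ignored
    number = n(),                               # counts every row, NA included
    .groups = "drop"                            # result is not grouped any more
  )
by_occasion
```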

Exercise
data_physical_activity
#> # A tibble: 50 × 3
#> region gender time_physical_activity_hours
#> 1 Aquitaine autre 2.41
#> 2 Brittany M 4.40
#> 3 Normandy M 5.78
#> 4 Aquitaine F 0.353
#> 5 Burgondy M 1.86
#> 6 Burgondy autre 3.33
#> 7 Normandy F 1.23
#> 9 Normandy M 5.04

Exercise
data_physical_activity %>%
filter(region == "Brittany") %>%
select(-region) %>%
mutate(
time_physical_activity_hours = round(time_physical_activity_hours)
) %>%
group_by(gender) %>%
summarise(mean_time_phys_act_hours = mean(time_physical_activity_hours),
.groups = "drop") %>%
arrange(mean_time_phys_act_hours)

Exercise
data_physical_activity %>%
group_by(gender) %>%
slice_sample(n = 10) %>%
mutate(time_physical_activity_minutes = time_physical_activity_hours * 60) %>%
slice_max(time_physical_activity_minutes, n = 3) %>%
summarise(
mean_time_h = mean(time_physical_activity_hours),
median_time_h = median(time_physical_activity_hours),
mean_time_min = mean(time_physical_activity_minutes),
median_time_min = median(time_physical_activity_minutes),
.groups = "drop"
)

Tidy data

Explanations
#> # A tibble: 8 × 2
#> age gender
#> <dbl> <chr>
#> 1 25 male
#> 2 45 male
#> 3 31 female
#> 4 10 male
#> 5 23 male
#> 6 43 male
#> 7 45 female
#> 8 12 male

The statistical individual
In tidy data, each statistical individual (observation) is a row and each variable is a column.

"Non-tidy" data versus "Tidy" data
#> # A tibble: 5 × 5
#> age_class water carbs lipids proteins
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 7-10 year 759. 390. 422. 275.
#> 2 11-17 year 942. 417. 435. 143.
#> 3 18-35 year 1026. 529. 537. 188.
#> 4 36-55 year 734. 635. 577. 151.
#> 5 56 year + 883. 590. 368. 258.

"Non-tidy" data versus "Tidy" data
#> # A tibble: 20 × 3
#> age_class nutrient avg_intake
#> <chr> <chr> <dbl>
#> 1 7-10 year water 759.
#> 2 7-10 year carbs 390.
#> 3 7-10 year lipids 422.
#> 4 7-10 year proteins 275.
#> 5 11-17 year water 942.
#> 6 11-17 year carbs 417.
#> 7 11-17 year lipids 435.
#> 8 11-17 year proteins 143.
#> 9 18-35 year water 1026.
#> 10 18-35 year carbs 529.
#> 11 18-35 year lipids 537.
#> 12 18-35 year proteins 188.
#> 13 36-55 year water 734.
#> 14 36-55 year carbs 635.
#> 15 36-55 year lipids 577.
#> 16 36-55 year proteins 151.
#> 17 56 year + water 883.
#> 18 56 year + carbs 590.
#> 19 56 year + lipids 368.
#> 20 56 year + proteins 258.

Quiz
Which of these three tables is in tidy format?
data_a
#> # A tibble: 4 × 2
#> id information
#> <int> <chr>
#> 1 1 34 years old man
#> 2 2 23 years old woman
#> 3 3 12 years old man
#> 4 4 13 years old woman
data_b
#> # A tibble: 4 × 3
#> id gender age
#> <int> <chr> <chr>
#> 1 1 man 34
#> 2 2 woman 23
#> 3 3 man 12
#> 4 4 woman 13
data_c
#> # A tibble: 4 × 4
#> id man woman age
#> <int> <dbl> <dbl> <chr>
#> 1 1 1 0 34
#> 2 2 0 1 23
#> 3 3 1 0 12
#> 4 4 0 1 13

Exercise

Exercise

Exercise

The one-million dollar question
Of the importance of data format

#> # A tibble: 10,002 × 3
#> NOIND food_type amount
#> <dbl> <chr> <dbl>
#> 1 110100101 Milk 250.
#> 2 110100701 Drinks 1750.
#> 3 110100701 Bread 72
#> 4 110100801 Milk 458.
#> 5 110100801 Bread 103.
#> 6 110101201 Drinks 654.
#> 7 110101201 Juice 255.
#> 8 110101201 Milk 8
#> 9 110101201 Croissants 55
#> 10 110101401 Drinks 1342.
#> # … with 9,992 more rows

#> # A tibble: 3,917 × 6
#> NOIND Milk Drinks Bread Juice Croissants
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 250. 0 0 0 0
#> 2 110100701 0 1750. 72 0 0
#> 3 110100801 458. 0 103. 0 0
#> 4 110101201 8 654. 0 255. 55
#> 5 110101401 0 1342. 127. 0 0
#> 6 110300301 15.6 1217. 0 0 0
#> 7 110300501 77.8 800 0 0 0
#> 8 110600101 93.0 1283. 177. 0 0
#> 9 110601301 46.9 587. 16 0 0
#> 10 110602001 0 1150. 220. 125. 0
#> # … with 3,907 more rows

Widen a dataset with pivot_wider()
Widen a dataset with {tidyr}

food_intake <- tibble(
id = c(110100101, 110100101, 110100101, 110100701, 110100801, 110100801),
food_type = c("Water", "Juice", "Milk", "Water", "Water", "Milk"),
amount = c(1632.8, 1420.7, 250.9, 3082.5, 1500.0, 458.1)
)
food_intake
#> # A tibble: 6 × 3
#> id food_type amount
#> 1 110100101 Water 1633.
#> 2 110100101 Juice 1421.
#> 3 110100101 Milk 251.
#> 4 110100701 Water 3082.
#> 5 110100801 Water 1500
#> 6 110100801 Milk 458.

pivot_wider()

food_intake %>%
pivot_wider(
# name of the column to widen
names_from = food_type,
# name of the column that contains the values
values_from = amount
)

#> # A tibble: 3 × 4
#> id Water Juice Milk
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 1633. 1421. 251.
#> 2 110100701 3082. NA NA
#> 3 110100801 1500 NA 458.

values_fill
By default, the id / food_type combinations absent from the data become NA. values_fill sets the value to use in their place:
food_intake %>%
pivot_wider(
names_from = food_type,
values_from = amount,
values_fill = list(amount = 0)
)
#> # A tibble: 3 × 4
#> id Water Juice Milk
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110100101 1633. 1421. 251.
#> 2 110100701 3082. 0 0
#> 3 110100801 1500 0 458.

names_from and values_from also accept tidyselect helpers such as contains():
food_intake %>%
pivot_wider(
names_from = contains("type"),
values_from = contains("amount")
)

Quiz
Which pivot_wider() arguments turn the first table into the second?
#> # A tibble: 1,841 × 3
#> id product_type amount
#> <dbl> <chr> <dbl>
#> 1 110300601 vitamins 3
#> 2 110600401 vitamins 3
#> 3 110601301 vitamins 4
#> 4 110601801 vitamins 21
#> 5 110604501 blend 1
#> 6 110604901 plants 4
#> # A tibble: 1,529 × 4
#> id vitamins blend plants
#> <dbl> <dbl> <dbl> <dbl>
#> 1 110300601 3 NA NA
#> 2 110600401 3 NA NA
#> 3 110601301 4 NA NA
#> 4 110601801 21 NA NA
#> 5 110604501 NA 1 NA
#> 6 110604901 1 NA 4
names_from = amount, values_from = product_type
names_from = product_type, values_from = amount
names_from = product_type, values_from = amount, values_fill = list(amount = 0)

Lengthen a dataset with pivot_longer()
Lengthen a dataset with {tidyr}

intake_vitamins <- tibble(
id = c(110100101, 110100701, 110100801, 110101201, 110101401, 110300301),
gender = c("Man", "Woman", "Man", "Man", "Woman", "Man"),
vitamin_c = c(506.2, 457.7, 192.9, 603.8, 161.3, 91.7),
vitamin_d = c(9.7, 19.6, 10.2, 20.5, 12.8, 10.0)
)
intake_vitamins
#> # A tibble: 6 × 4
#> id gender vitamin_c vitamin_d
#> <dbl> <chr> <dbl> <dbl>
#> 1 110100101 Man 506. 9.7
#> 2 110100701 Woman 458. 19.6
#> 3 110100801 Man 193. 10.2
#> 4 110101201 Man 604. 20.5
#> 5 110101401 Woman 161. 12.8
#> 6 110300301 Man 91.7 10

pivot_longer()

intake_vitamins %>%
pivot_longer(
# columns to gather
cols = c(vitamin_c, vitamin_d),
# name of the new column that will contain the former column names
names_to = "vitamin",
# name of the new column that will contain the former column values
values_to = "amount"
)

#> # A tibble: 12 × 4
#> id gender vitamin amount
#> <dbl> <chr> <chr> <dbl>
#> 1 110100101 Man vitamin_c 506.
#> 2 110100101 Man vitamin_d 9.7
#> 3 110100701 Woman vitamin_c 458.
#> 4 110100701 Woman vitamin_d 19.6
#> 5 110100801 Man vitamin_c 193.
#> 6 110100801 Man vitamin_d 10.2
#> 7 110101201 Man vitamin_c 604.
#> 8 110101201 Man vitamin_d 20.5
#> 9 110101401 Woman vitamin_c 161.
#> 10 110101401 Woman vitamin_d 12.8
#> 11 110300301 Man vitamin_c 91.7
#> 12 110300301 Man vitamin_d 10

cols also accepts tidyselect helpers, so these two calls give the same result:
intake_vitamins %>%
pivot_longer(
cols = starts_with("vitamin"),
names_to = "vitamin",
values_to = "amount"
)
intake_vitamins %>%
pivot_longer(
cols = -c(id, gender),
names_to = "vitamin",
values_to = "amount"
)
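pivot_longer() and pivot_wider() undo each other. A minimal sketch, assuming {tidyr} and {tibble} are installed, on the first two rows of intake_vitamins: lengthening and then widening gives back the original columns.

```r
library(tibble)
library(tidyr)

intake_vitamins <- tibble(
  id = c(110100101, 110100701),
  gender = c("Man", "Woman"),
  vitamin_c = c(506.2, 457.7),
  vitamin_d = c(9.7, 19.6)
)

# Wide -> long: one row per individual and vitamin
intake_long <- intake_vitamins %>%
  pivot_longer(cols = starts_with("vitamin"), names_to = "vitamin", values_to = "amount")

# Long -> wide: back to one column per vitamin
intake_wide <- intake_long %>%
  pivot_wider(names_from = vitamin, values_from = amount)

intake_long
intake_wide
```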

Quiz
Which pivot_longer() arguments turn the first table into the second?
#> # A tibble: 4,725 × 4
#> id video_game computer tv
#> <dbl> <dbl> <dbl> <dbl>
#> 1 120100401 0 0 1.64
#> 2 120100501 0.964 0.571 0.821
#> 3 120100601 0 0.571 1.14
#> 4 120100801 0.357 0.357 1
#> 5 120100901 1 0.286 2.57
#> 6 120101001 0.214 1.21 1.43
#> 7 120101201 0 0.786 1.14
#> 8 120101301 1.5 0.25 1.5
#> 9 120200401 0 2.73 1
#> 10 120300101 0 5.57 4
#> # A tibble: 14,175 × 3
#> id screen_type time
#> <dbl> <chr> <dbl>
#> 1 120100401 video_game 0
#> 2 120100401 computer 0
#> 3 120100401 tv 1.64
#> 4 120100501 video_game 0.964
#> 5 120100501 computer 0.571
#> 6 120100501 tv 0.821
#> 7 120100601 video_game 0
#> 8 120100601 computer 0.571
#> 9 120100601 tv 1.14
#> 10 120100801 video_game 0.357
cols = c(video_game, computer, tv), names_to = "time", values_to = "screen_type"
cols = id, names_to = "screen_type", values_to = "time"
cols = -id, names_to = "screen_type", values_to = "time"

Work on several columns
data_food
#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 26 Aperitif before lunch Home water tap water 148.
#> 2 43 Lunch Home vegetable and… fruit jui… 500
#> 3 56 Lunch Home animal fat n.s. cook… 3.22
#> 4 49 Lunch Home potatoes and … potato fr… 107
#> 5 53 Lunch Home Meet based di… chicken n… 161
#> 6 36 Lunch Home plant-based f… n.s. fat 12.8

mutate() variants
mutate_all()
mutate_at()
mutate_if()

mutate() variants
mutate_if()
your_dataframe %>%
mutate_if(
condition,
function_to_apply
)

mutate() variants
mutate_if()
data_food %>%
mutate_if(
is.numeric,
as.character
)
#> # A tibble: 6 × 6
#> age occasion_type occasion_location food_type food amount
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 26 Aperitif before lunch Home water tap water 147.5
The numeric columns age and amount are now character columns.

mutate() variants
mutate_if()
data_food %>%
mutate_if(
is.numeric,
list("chr" = as.character)
)
#> # A tibble: 6 × 8
#> age occasion_type occasion_locati… food_type food amount age_chr amount_chr
#> <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 26 Aperitif bef… Home water tap … 148. 26 147.5
#> 2 43 Lunch Home vegetabl… frui… 500 43 500
#> 3 56 Lunch Home animal f… n.s.… 3.22 56 3.22
#> 4 49 Lunch Home potatoes… pota… 107 49 107
#> 5 53 Lunch Home Meet bas… chic… 161 53 161
#> 6 36 Lunch Home plant-ba… n.s.… 12.8 36 12.84

mutate() variants
mutate_at()
your_dataframe %>%
mutate_at(
variables_to_transform,
functions_to_apply
)

mutate() variants
mutate_at()
data_food %>%
mutate_at(
c("occasion_type", "amount"),
as.factor
)
#> # A tibble: 6 × 6
#> <dbl> <fct> <chr> <chr> <chr> <fct>

Specify the names of the variables to transform as strings, between quotes " ":
data_food %>%
mutate_at(
c("occasion_type", "amount"),
as.factor
)
#> # A tibble: 6 × 6

Or specify the variables to transform with vars(), without quotes:
data_food %>%
mutate_at(
vars(occasion_type, amount),
as.factor
)
#> # A tibble: 6 × 6

Inside vars(), tidyselect helpers can also choose the variables to transform:
data_food %>%
mutate_at(
vars(ends_with("food")),
as.factor
)
#> # A tibble: 6 × 6
#> <dbl> <chr> <chr> <chr> <fct> <dbl>
#> 1 26 Aperitif before lunch Home water tap water 148.
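Since {dplyr} 1.0.0, the _if / _at / _all variants are superseded by across(), used inside mutate(). They still work, but the equivalent across() calls are worth recognising in newer code. A sketch on a hypothetical toy version of data_food (made-up values):

```r
library(dplyr)

# Hypothetical toy version of data_food (made-up values)
toy_food <- tibble(
  age = c(26, 43),
  occasion_type = c("Aperitif before lunch", "Lunch"),
  food = c("tap water", "fruit juice"),
  amount = c(147.5, 500)
)

# Same as mutate_if(is.numeric, as.character)
as_chr <- toy_food %>% mutate(across(where(is.numeric), as.character))

# Same as mutate_at(vars(occasion_type, amount), as.factor)
as_fct <- toy_food %>% mutate(across(c(occasion_type, amount), as.factor))

# Same as mutate_if(is.numeric, list("chr" = as.character)):
# the new columns are named with the .names pattern
with_chr <- toy_food %>%
  mutate(across(where(is.numeric), as.character, .names = "{.col}_chr"))

with_chr
```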

summarise() variants
summarise_all()
summarise_at()
summarise_if()

summarise_at()
data_food %>%
summarise_at(
vars(age, amount),
mean
)
#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 NA
amount contains NA values, so its mean is NA: add na.rm = TRUE to ignore them.

Specify an ad hoc function
data_food %>%
summarise_at(
vars(age, amount),
~ mean(.x, na.rm = TRUE)
)
#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 118.
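~ mean(.x, na.rm = TRUE) is a shorthand for function(.x) mean(.x, na.rm = TRUE), where .x stands for each selected column in turn.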

summarise_at()
data_food %>%
summarise_at(
vars(age, amount),
list(
"var" = ~ var(.x, na.rm = TRUE),
"median" = ~ median(.x, na.rm = TRUE)
)
)
#> # A tibble: 1 × 4
#> age_var amount_var age_median amount_median
#> <dbl> <dbl> <dbl> <dbl>
#> 1 169. 17515. 37 79.3

summarise_if()
data_food %>%
summarise_if(
is.numeric,
~ mean(.x, na.rm = TRUE)
)
#> # A tibble: 1 × 2
#> age amount
#> <dbl> <dbl>
#> 1 37.5 118.

summarise_if()
data_food %>%
summarise_if(
is.numeric,
list(
"mean" = ~ mean(.x, na.rm = TRUE),
"var" = ~ var(.x, na.rm = TRUE),
"max" = ~ max(.x, na.rm = TRUE))
)
#> # A tibble: 1 × 6
#> age_mean amount_mean age_var amount_var age_max amount_max
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 37.5 118. 169. 17515. 60 1768.
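The summarise_*() variants have the same kind of across() equivalent inside summarise(). A sketch on hypothetical values; note that with across() the new columns are ordered by variable (age_mean, age_max, ...) rather than by function as in the summarise_if() output above:

```r
library(dplyr)

toy_food <- tibble(
  age = c(26, 43, 56),
  amount = c(147.5, NA, 3.22)  # hypothetical values, one missing
)

food_summary <- toy_food %>%
  summarise(across(
    where(is.numeric),
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      max = ~ max(.x, na.rm = TRUE)
    )
  ))
food_summary
```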

Example
data_food %>%
mutate(
age_class = cut(age, breaks = 3, labels = c("young", "adult", "senior"))
) %>%
group_by(age_class) %>%
summarise_if(
is.numeric,
list("min" = ~ min(.x, na.rm = TRUE),
"max" = ~ max(.x, na.rm = TRUE),
"mean" = ~ mean(.x, na.rm = TRUE)
)
)

To go further
select_if()
select_at()
groups()
group_by_all()
group_by_at()
group_by_if()
group_split()
group_nest()

On the origin of joins
#> # A tibble: 5,855 × 4
#> NOMEN NOIND age gender
#> <dbl> <dbl> <chr> <chr>
#> 1 1101001 110100101 18-44 years Male
#> 2 1101007 110100701 45-64 years Female
#> 5 1101014 110101401 65-79 years Female
#> 6 1101016 110101601 45-64 years Female
#> 8 1102001 110200101 45-64 years Male
#> 9 1103003 110300301 45-64 years Male
#> 10 1103005 110300501 65-79 years Male
#> # A tibble: 4,725 × 2
#> NOIND activity_profile
#> <dbl> <chr>
#> 1 120100401 inactive and not sedentary
#> 2 120100501 inactive and not sedentary
#> 5 120100901 inactive and sedentary
#> 6 120101001 inactive and not sedentary
#> 8 120101301 inactive and sedentary
#> 9 120200401 inactive and sedentary
#> 10 120300101 active and sedentary
What's the join?

On the origin of joins
#> # A tibble: 4,725 × 2

#> age activity_profile
#> <chr> <chr>
#> 1 18-44 years inactive and sedentary
#> 2 45-64 years active and sedentary
#> 3 45-64 years active and not sedentary
#> 6 45-64 years <NA>

How a join works
item

Quiz
Which column should be used as the key to join data_age and data_phys_act?
data_age
#> # A tibble: 5 × 4
#> NOMEN NOIND tage_PS age
#> <dbl> <chr> <chr> <chr>
#> 1 1131047 064 45-64 ans 45-64 years
#> 2 1132001 059 45-64 ans 45-64 years
#> 3 1132006 035 18-44 ans 18-44 years
#> 4 1132012 049 18-44 ans 18-44 years
#> 5 1132014 022 65-79 ans 65-79 years
data_phys_act
#> # A tibble: 5 × 2
#> NOIND activity_profile
#> <chr> <chr>
#> 1 059 inactive and not sedentary
#> 2 098 inactive and sedentary
#> 3 022 inactive and sedentary
#> 4 049 active and sedentary
#> 5 054 active and not sedentary
age
activity_profile
NOMEN
NOIND

The different ways to combine two tables
item
The different kinds of join

INNER JOIN

INNER JOIN

LEFT JOIN

LEFT JOIN

FULL JOIN

FULL JOIN

ANTI JOIN

ANTI JOIN

Quiz

Combine tables from the INCA3 study
data_a <- tibble(
NOIND = c("087", "049", "054", "078", "064", "016"),
gender = c("Male", "Female", "Male", "Male", "Female", "Female")
)
data_b <- tibble(
NOIND = c("087", "078", "016", "013", "029", "044"),
reads_nutri_label = c("Never", "Never", "Never", "Never", "Sometimes", "Always")
)
data_a
#> # A tibble: 6 × 2
#> NOIND gender
#> <chr> <chr>
#> 1 087 Male
#> 2 049 Female
#> 3 054 Male
#> 4 078 Male
#> 5 064 Female
#> 6 016 Female
data_b
#> # A tibble: 6 × 2
#> NOIND reads_nutri_label
#> <chr> <chr>
#> 1 087 Never
#> 2 078 Never
#> 3 016 Never
#> 4 013 Never
#> 5 029 Sometimes
#> 6 044 Always
The two tables share the column NOIND, which identifies each individual.
Perform a join with {dplyr}

Specify the join key
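In {dplyr}, the join key (the column or columns used to match the rows of the two tables) is given with the by argument, e.g. by = "NOIND".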

INNER JOIN - inner_join()
data_a %>%
inner_join(data_b, by = "NOIND")
#> # A tibble: 3 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 078 Male Never
#> 3 016 Female Never

LEFT JOIN - left_join()
data_a %>%
left_join(data_b, by = "NOIND")
#> # A tibble: 6 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 049 Female <NA>
#> 3 054 Male <NA>
#> 4 078 Male Never
#> 5 064 Female <NA>
#> 6 016 Female Never

FULL JOIN - full_join()
data_a %>%
full_join(data_b, by = "NOIND")
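#> # A tibble: 9 × 3
#> NOIND gender reads_nutri_label
#> <chr> <chr> <chr>
#> 1 087 Male Never
#> 2 049 Female <NA>
#> 3 054 Male <NA>
#> 4 078 Male Never
#> 5 064 Female <NA>
#> 6 016 Female Never
#> 7 013 <NA> Never
#> 8 029 <NA> Sometimes
#> 9 044 <NA> Always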

ANTI JOIN - anti_join()
data_a %>%
anti_join(data_b, by = "NOIND")
#> # A tibble: 3 × 2
#> NOIND gender
#> <chr> <chr>
#> 1 049 Female
#> 2 054 Male
#> 3 064 Female
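Comparing the number of rows returned by each join is a quick way to check which one you need. This sketch rebuilds data_a and data_b from the INCA3 example; the last call shows that the order of the tables matters for anti_join():

```r
library(dplyr)

# Same tables as in the INCA3 example
data_a <- tibble(
  NOIND = c("087", "049", "054", "078", "064", "016"),
  gender = c("Male", "Female", "Male", "Male", "Female", "Female")
)
data_b <- tibble(
  NOIND = c("087", "078", "016", "013", "029", "044"),
  reads_nutri_label = c("Never", "Never", "Never", "Never", "Sometimes", "Always")
)

join_sizes <- c(
  inner = nrow(inner_join(data_a, data_b, by = "NOIND")),  # 3: NOIND in both tables
  left  = nrow(left_join(data_a, data_b, by = "NOIND")),   # 6: every row of data_a
  full  = nrow(full_join(data_a, data_b, by = "NOIND")),   # 9: rows of either table
  anti  = nrow(anti_join(data_a, data_b, by = "NOIND"))    # 3: data_a rows absent from data_b
)
join_sizes

# Swapping the tables: individuals of data_b that are absent from data_a
anti_join(data_b, data_a, by = "NOIND")$NOIND
```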

Quiz
Which instruction combines data_age and data_phys_act into data_age_act_phys?
data_age
#> # A tibble: 5 × 2
#> NOIND age
#> 1 064 45-64 years
#> 2 059 45-64 years
#> 3 035 18-44 years
#> 4 049 18-44 years
#> 5 022 65-79 years
data_phys_act
#> # A tibble: 5 × 2
#> NOIND activity_profile
#> 1 059 inactive and not sedentary
#> 2 098 inactive and sedentary
#> 3 022 inactive and sedentary
#> 4 049 active and sedentary
#> 5 054 active and not sedentary
data_age_act_phys
#> # A tibble: 5 × 3
#> NOIND activity_profile age
#> 1 059 inactive and not sedentary 45-64 years
#> 2 098 inactive and sedentary <NA>
#> 3 022 inactive and sedentary 65-79 years
#> 4 049 active and sedentary 18-44 years
#> 5 054 active and not sedentary <NA>
data_age %>% full_join(data_phys_act, by = "NOIND")
data_age %>% left_join(data_phys_act, by = "NOIND")
data_phys_act %>% left_join(data_age, by = "age")
data_phys_act %>% left_join(data_age, by = "NOIND")

These datasets are not in "tidy" format
#> # A tibble: 10 × 4
#> individual detail weight height
#> <int> <chr> <int> <int>
#> 1 1 60-M 96 166
#> 2 2 42-M 96 157
#> 3 3 32-I 96 161
#> 4 4 26-M 90 157
#> 5 5 56-F 86 170
#> 6 1 59-I 95 166
#> 7 2 38-M 85 171
#> 8 3 48-F 97 180
#> 9 4 24-M 88 155
#> 10 5 31-M 85 161
Clean your data with {tidyr}

These datasets are not in "tidy" format
#> # A tibble: 10 × 5
#> indiv year month day obs
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 2019 01 01 j
#> 2 2 2019 01 01 u
#> 3 3 2019 01 01 q
#> 4 4 2019 01 01 k
#> 5 5 2019 01 01 g
#> 6 1 2019 02 01 n
#> 7 2 2019 02 01 h
#> 8 3 2019 02 01 w
#> 9 4 2019 02 01 i
#> 10 5 2019 02 01 p

{tidyr}
separate()
unite()

separate()
separate() splits the column col into several new columns, whose names are given in into:
data %>%
separate(
col = column_ab,
into = c("a", "b"),
sep = "-" # make the separator explicit
)

separate()
Add remove = FALSE to keep the original column:
data %>%
separate(
col = column_ab,
into = c("a", "b"),
sep = "-",
remove = FALSE
)

separate()
data_indiv %>%
separate(
col = detail,
sep = "-",
into = c("age", "gender")
) %>%
mutate(age = as.numeric(age))
#> # A tibble: 10 × 5
#> individual age gender weight height
#> <int> <dbl> <chr> <int> <int>
#> 1 1 60 M 96 166
#> 2 2 42 M 96 157
#> 3 3 32 I 96 161
#> 4 4 26 M 90 157
#> 5 5 56 F 86 170
#> 6 1 59 I 95 166
#> 7 2 38 M 85 171
#> 8 3 48 F 97 180
#> 9 4 24 M 88 155
#> 10 5 31 M 85 161

unite()
unite() pastes several columns together into a single new column, whose name is given in col:
data %>%
unite(
col = "column_ab",
column_a, column_b,
sep = "/"
)

unite()
Add remove = FALSE to keep the original columns:
data %>%
unite(
col = "column_ab",
column_a, column_b,
sep = "/",
remove = FALSE
)

unite()
data_obs %>%
unite(
col = "date",
year, month, day,
sep = "/"
)
#> # A tibble: 10 × 3
#> indiv date obs
#> <int> <chr> <chr>
#> 1 1 2019/01/01 j
#> 2 2 2019/01/01 u
#> 3 3 2019/01/01 q
#> 4 4 2019/01/01 k
#> 5 5 2019/01/01 g
#> 6 1 2019/02/01 n
#> 7 2 2019/02/01 h
#> 8 3 2019/02/01 w
#> 9 4 2019/02/01 i
#> 10 5 2019/02/01 p
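separate() and unite() undo each other. In this sketch on a hypothetical three-row version of data_indiv, the convert = TRUE argument of separate() guesses the type of the new columns, which makes the extra mutate(age = as.numeric(age)) step unnecessary (age becomes an integer column):

```r
library(tibble)
library(tidyr)

data_indiv <- tibble(
  individual = 1:3,
  detail = c("60-M", "42-M", "56-F")  # hypothetical "age-gender" values
)

# Split "60-M" into age and gender; convert = TRUE guesses the column types
indiv_split <- data_indiv %>%
  separate(col = detail, into = c("age", "gender"), sep = "-", convert = TRUE)

# Paste the two columns back together
indiv_united <- indiv_split %>%
  unite(col = "detail", age, gender, sep = "-")

indiv_split
indiv_united
```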

Quiz
Which instructions, and in which order, compute the body mass index (imc) from this table?
#> # A tibble: 8 × 4
#> id height_weight unite_weight unite_height
#> <int> <chr> <chr> <chr>
#> 1 1 187_83 kg cm
#> 2 2 166_69 kg cm
#> 3 3 175_86 kg cm
#> 4 4 164_70 kg cm
#> 5 5 183_81 kg cm
#> 6 6 177_88 kg cm
#> 7 7 160_68 kg cm
#> 8 8 179_79 kg cm
separate(col = height_weight, into = c("height", "weight"), sep = "-")
mutate(imc = as.numeric(weight) / ((as.numeric(height) / 100) ^ 2))
separate(col = height_weight, into = c("height", "weight"), sep = "_")
unite(col = "height_weight", height, weight, sep = "_")


Some observations are missing
NA
#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 <NA> 2 96 187
#> 3 <NA> 3 95 184
#> 4 <NA> 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170
Handle missing data with {tidyr}

Some observations are missing
NA
#> # A tibble: 4 × 3
#> year month weight
#> 1 2017 01 86
#> 2 2018 02 95
#> 3 2019 01 90
#> 4 2019 02 92

Possible causes

Two strategies

{tidyr}
fill()
drop_na()
replace_na()
complete()

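fill()
fill() replaces each NA with the last non-missing value above it (.direction = "down", the default); with .direction = "up", it uses the next non-missing value below instead: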
data %>%
fill(column_a, column_b)
data %>%
fill(column_a, column_b, .direction = "up")

fill()
data_indiv %>%
fill(year)
#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2019 2 96 187
#> 3 2019 3 95 184
#> 4 2019 4 89 189
#> 5 2020 1 85 182
#> 6 2020 2 85 176
#> 7 2020 3 86 180
#> 8 2020 4 NA 170

fill()
data_indiv %>%
fill(year, .direction = "up")
#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2020 2 96 187
#> 3 2020 3 95 184
#> 4 2020 4 89 189
#> 5 2020 1 85 182
#> 6 <NA> 2 85 176
#> 7 <NA> 3 86 180
#> 8 <NA> 4 NA 170

drop_na()
drop_na() removes the rows that contain NA, in any column or only in the listed columns:
data %>%
drop_na()
data %>%
drop_na(column_a, column_b)

drop_na()
data_indiv %>%
drop_na(year)
#> # A tibble: 2 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 2020 1 85 182

replace_na()
In the list, column = "value" replaces the NA values of column with "value":
data %>%
replace_na(
replace = list(
column_a = "value_a",
column_b = "value_b"
)
)

replace_na()
data_indiv %>%
replace_na(
replace = list(year = "00")
)
#> # A tibble: 8 × 4
#> year individual weight height
#> <chr> <int> <int> <int>
#> 1 2019 1 NA 180
#> 2 00 2 96 187
#> 3 00 3 95 184
#> 4 00 4 89 189
#> 5 2020 1 85 182
#> 6 00 2 85 176
#> 7 00 3 86 180
#> 8 00 4 NA 170

complete()
complete() adds the combinations of column_a and column_b that are missing from the data, with NA in the other columns:
data %>%
complete(column_a, column_b)

complete()
data_weight %>%
complete(year, month)
#> # A tibble: 6 × 3
#> year month weight
#> <chr> <chr> <dbl>
#> 1 2017 01 86
#> 2 2017 02 NA
#> 3 2018 01 NA
#> 4 2018 02 95
#> 5 2019 01 90
#> 6 2019 02 92
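These functions can be combined. A sketch rebuilding data_weight: complete() also has a fill argument to put a value other than NA in the new rows, and drop_na() removes the added rows again:

```r
library(tibble)
library(tidyr)

data_weight <- tibble(
  year = c("2017", "2018", "2019", "2019"),
  month = c("01", "02", "01", "02"),
  weight = c(86, 95, 90, 92)
)

# All year x month combinations, NA where no weight was recorded
weight_grid <- complete(data_weight, year, month)

# Same grid, but the new rows get 0 instead of NA
weight_grid_zero <- complete(data_weight, year, month, fill = list(weight = 0))

# Dropping the rows with a missing weight gives back the 4 observed rows
weight_observed <- drop_na(weight_grid, weight)

weight_grid
```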

Noteworthy
drop_na() and fill() accept tidyselect helpers to choose the columns:
data %>%
fill(
contains("encoded")
)
data %>%
drop_na(
starts_with("comment")
)

Quiz
Which instruction removes the rows of data that contain missing values?
data %>% filter_na()
data %>% fill()
data %>% complete()
data %>% drop_na()

Manipulate dates and times with {lubridate}
library(tidyverse)
library(lubridate)

About the ISO 8601 format

Now
now() returns the current date and time, today() the current date:
now()
#> [1] "2022-01-13 11:27:15 UTC"
today()
#> [1] "2022-01-13"
today() returns a Date, while now() returns a date-time (POSIXct).

POSIXt
today() %>% class()
#> [1] "Date"
now() %>% class()
#> [1] "POSIXct" "POSIXt"

Quiz

Import dates

Transformation from string to Date
Consider the vector the_dates:
the_dates <- c("01-01-09", "010209", "01-03-09", "01-01-2009", "01-02-2009",
"01/03/2009")
The functions dmy(), ydm(), mdy(), etc. are named after the order of the components in the string: for instance, mdy() expects month - day - year.
dmy(the_dates)
#> [1] "2009-01-01" "2009-02-01" "2009-03-01" "2009-01-01" "2009-02-01"
#> [6] "2009-03-01"
mdy(the_dates)
#> [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-01" "2009-01-02"
#> [6] "2009-01-03"
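The same string can give two different dates depending on the function, which is why the order of the components must be known before parsing. Whatever the input format, the result is a Date printed in ISO 8601 format (year-month-day). A sketch with one hypothetical string:

```r
library(lubridate)

x <- "01-02-2009"

as_dmy <- dmy(x)  # day-month-year: 1 February 2009
as_mdy <- mdy(x)  # month-day-year: 2 January 2009

format(as_dmy, "%Y-%m-%d")  # ISO 8601
format(as_mdy, "%Y-%m-%d")
```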

Quiz
"14-02-1986"

Transformation from string to date-time (POSIXt)
Functions such as ymd_hms() and dmy_hm() also parse the time:
ymd_hms()
dmy_hm()

Questions
"1986/02/15 20h05"
ymd_hm("1986/02/15 20h05")
#> [1] "1986-02-15 20:05:00 UTC"
"the 11th of november 1918 at 11:00AM"
dmy_hm("the 11th of november 1918 at 11:00AM")
#> [1] "1918-11-11 11:00:00 UTC"
The tz argument sets the time zone; OlsonNames() lists the valid time zone names:
dmy_hm("the 11th of november 1918 at 11:00AM", tz = "Europe/Paris")
#> [1] "1918-11-11 11:00:00 WET"

Questions
"le 11 novembre 1918 à 11 heures 00"
dmy_hm("le 11 novembre 1918 a 11 heures 00")
#> Warning: All formats failed to parse. No formats found.
#> [1] NA

About the "locale"
Sys.getlocale("LC_TIME") # American English
#> [1] "en_US.UTF-8"
Sys.setlocale("LC_TIME", "fr_FR.UTF-8") # French
#> [1] "fr_FR.UTF-8"
dmy_hm("le 11 novembre 1918 a 11 heures 00")
#> [1] "1918-11-11 11:00:00 UTC"

Extract information from a date with {lubridate}
present_moment <- now()
present_moment
#> [1] "2022-01-13 11:27:15 UTC"
year(present_moment)
#> [1] 2022
month(present_moment)
#> [1] 1
day(present_moment)
#> [1] 13
hour(present_moment)
#> [1] 11
minute(present_moment)
#> [1] 27
second(present_moment)
#> [1] 15.8971
wday(present_moment)
#> [1] 5
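The slide covers only some of the accessors, and a few others work the same way. They can also be used for assignment. The sketch uses a fixed copy of the moment above rather than now(). The week_start = 1 argument is an illustrative choice that makes Monday day 1, whereas the default counts from Sunday, which is why wday() above returns 5 for a Thursday:

```r
library(lubridate)

# Fixed value instead of now(), so the results are reproducible
moment <- ymd_hms("2022-01-13 11:27:15", tz = "UTC")

day_of_year <- yday(moment)                  # 13
quarter_nb  <- quarter(moment)               # 1
weekday_mon <- wday(moment, week_start = 1)  # 4: Thursday, with Monday = 1

# The same accessors can modify a date-time
year(moment) <- 2023
moment                                       # "2023-01-13 11:27:15 UTC"
```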

Exercise
tribble(
~name, ~date_of_birth,
"Sébastien", "26 juillet 83",
"Diane", "1er janvier 1985",
"Vincent", "11/02/1986",
"Colin", "22111988",
"Margot", "17 septembre 1991",
"Cervan", "22-octobre-91"
) %>%
mutate(date_of_birth = .....(date_of_birth)
) %>%
filter(.....(date_of_birth) == 9)

Exercise
ymd("1986/11/02") %>%
wday(label = TRUE, abbr = FALSE)
#> [1] Sunday
#> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Notion of period
years(2) # a period of 2 years
#> [1] "2y 0m 0d 0H 0M 0S"
months(2) # a period of 2 months
#> [1] "2m 0d 0H 0M 0S"
days(2) # a period of 2 days
#> [1] "2d 0H 0M 0S"
hours(2) # a period of 2 hours
#> [1] "2H 0M 0S"
minutes(2) # a period of 2 minutes
#> [1] "2M 0S"
seconds(2) # a period of 2 seconds
#> [1] "2S"
years(2) + months(2) + hours(2)
#> [1] "2y 2m 0d 2H 0M 0S"
today() + days(3)
#> [1] "2022-01-16"
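A period follows the calendar, which has two practical consequences. First, adding a month can produce a date that does not exist. Second, a period behaves differently from a duration (ddays(), dhours(), ...), which is an exact number of seconds. A sketch with illustrative dates, assuming the Europe/Paris switch to summer time on 2022-03-27:

```r
library(lubridate)

start <- ymd("2022-01-31")

# A period of one month lands on 31 February, which does not exist
start + months(1)       # NA
# %m+% rolls back to the last valid day of the month instead
start %m+% months(1)    # "2022-02-28"

# Period vs duration across the switch to summer time in Paris
before <- ymd_hms("2022-03-26 12:00:00", tz = "Europe/Paris")
before + days(1)    # period, same clock time: "2022-03-27 12:00:00 CEST"
before + ddays(1)   # duration, 86400 s later: "2022-03-27 13:00:00 CEST"
```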

Quiz
tomorrow(tomorrow())
today() + days(2)
today() + hours(48)
today() + day(2)

Practice

{stringr} package

Let's work on an example
phonebook <- tibble(

id = 1:5,
firstname = c("Steven", "ERIN", "Marie-Louise", "Layla", "Mitchell"),
lastname = c("DIXON", "FLORES ", "guillaumin ", "BRYANT", " Berry"),
coord = c(
" 6804 Preston Rd (563)-300-2113", "flores@mail.com (617)-990-5931 ",
"9464 Preston Rd, Dallas, TX 75225", "8046 Forest Ln, Humble, TX 77338",
"(089).225.6911 berry@msn.com"
)
)

Concatenate
str_c()
sep separates the arguments; collapse joins the elements of a vector into a single string.
str_c("one", "two", sep = " ")
#> [1] "one two"
str_c(c("one", "two"), collapse = " ")
#> [1] "one two"
Create a lastname_firstname column:
phonebook %>%
mutate(lastname_firstname = str_c(lastname, firstname, sep = "_")) %>%
select(-coord)
#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> 1 1 Steven "DIXON" "DIXON_Steven"
#> 2 2 ERIN "FLORES " "FLORES _ERIN"
#> 3 3 Marie-Louise "guillaumin " "guillaumin _Marie-Louise"
#> 4 4 Layla "BRYANT" "BRYANT_Layla"
#> 5 5 Mitchell " Berry" " Berry_Mitchell"
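A few more str_c() behaviours are worth knowing. sep works element by element and recycles a length-1 input, collapse then merges the result into one string, and a missing value propagates:

```r
library(stringr)

# sep goes between the arguments, element by element
pairs <- str_c("a", c("x", "y"), sep = "-")    # "a-x" "a-y"

# collapse joins a vector into a single string
joined <- str_c(c("x", "y"), collapse = "+")   # "x+y"

# A missing value propagates: the result is NA
with_na <- str_c("a", NA)                      # NA
```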

Trim (remove leading and trailing spaces)
str_trim()
str_trim(" Hello ")
#> [1] "Hello"
phonebook %>%
mutate_all(str_trim) %>%
mutate(lastname_firstname = str_c(lastname, firstname, sep = "_")) %>%
select(-coord)
#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> <chr> <chr> <chr> <chr>
#> 1 1 Steven DIXON DIXON_Steven
#> 2 2 ERIN FLORES FLORES_ERIN
#> 3 3 Marie-Louise guillaumin guillaumin_Marie-Louise
#> 4 4 Layla BRYANT BRYANT_Layla
#> 5 5 Mitchell Berry Berry_Mitchell
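As the <chr> type of id in the output shows, mutate_all(str_trim) also turns the integer id column into text. A possible alternative with dplyr's across() trims only the character columns. The sketch below uses a hypothetical two-row extract of the phonebook and also shows str_squish(), which removes repeated inner spaces as well:

```r
library(dplyr)
library(stringr)

# Hypothetical mini version of the phonebook
mini <- tibble(
  id = 1:2,
  lastname = c("FLORES ", " Berry")
)

# Only the character columns are trimmed; id stays an integer
mini_trimmed <- mini %>% mutate(across(where(is.character), str_trim))

# str_squish() trims both ends AND collapses repeated inner spaces
squished <- str_squish("  9464   Preston Rd ")   # "9464 Preston Rd"
```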

Change case (upper/lower case)
str_to_upper() str_to_lower() str_to_title() str_to_sentence()
str_to_upper("hello fred")
#> [1] "HELLO FRED"
str_to_title("hello fred")
#> [1] "Hello Fred"
phonebook %>%
mutate_all(str_trim) %>%
mutate(
firstname = str_to_title(firstname),
lastname = str_to_upper(lastname),
lastname_firstname = str_c(lastname, firstname, sep = "_")
) %>%
select(-coord)
#> # A tibble: 5 × 4
#> id firstname lastname lastname_firstname
#> <chr> <chr> <chr> <chr>
#> 1 1 Steven DIXON DIXON_Steven
#> 2 2 Erin FLORES FLORES_Erin
#> 3 3 Marie-Louise GUILLAUMIN GUILLAUMIN_Marie-Louise
#> 4 4 Layla BRYANT BRYANT_Layla
#> 5 5 Mitchell BERRY BERRY_Mitchell
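The two other case functions listed above have no example on the slide. Here is a short sketch:

```r
library(stringr)

lower    <- str_to_lower("HELLO FRED")     # "hello fred"
sentence <- str_to_sentence("hello fred")  # "Hello fred": only the first letter
```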

Question
c(" HELLO", " everyone ") %>%
str_...() %>%
str_c(... = " ") %>%
str_to_...()
#> [1] "Hello everyone"

Detect a pattern
str_detect()
c("William", "Carl", "Jean-Paul", "Paul") %>%
str_detect("Paul")
#> [1] FALSE FALSE TRUE TRUE
Combined with filter():
phonebook %>%
filter(str_detect(coord, "Dallas")) %>%
select(-firstname, -lastname)
#> # A tibble: 1 × 2
#> id coord
#> <int> <chr>
#> 1 3 9464 Preston Rd, Dallas, TX 75225
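Two str_detect() options are often useful. negate = TRUE inverts the result, and regex(..., ignore_case = TRUE) makes the match case-insensitive, since matching is case-sensitive by default. A sketch reusing the first names above:

```r
library(stringr)

first_names <- c("William", "Carl", "Jean-Paul", "Paul")

# negate = TRUE: TRUE for the elements that do NOT match
not_paul <- str_detect(first_names, "Paul", negate = TRUE)  # TRUE TRUE FALSE FALSE

# Case-sensitive by default...
lower_paul <- str_detect(first_names, "paul")               # all FALSE
# ...unless the pattern is wrapped in regex(ignore_case = TRUE)
any_case <- str_detect(first_names, regex("paul", ignore_case = TRUE))  # FALSE FALSE TRUE TRUE
```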

Replace/delete
str_replace_all() str_remove_all()
c("William", "Carl", "Jean-Paul", "Paul") %>%
str_replace_all(pattern = "Paul", replacement = "Jack")
#> [1] "William" "Carl" "Jean-Jack" "Jack"
phonebook %>%
mutate(
coord_new =
str_replace_all(coord, pattern = fixed("msn.com"), replacement = "hotmail.com") # fixed(): the "." is matched literally
) %>%
select(starts_with("coord"))
#> # A tibble: 5 × 2
#> coord coord_new
#> <chr> <chr>
#> 1 " 6804 Preston Rd (563)-300-2113" " 6804 Preston Rd (563)-300-2113"
#> 2 "flores@mail.com (617)-990-5931 " "flores@mail.com (617)-990-5931 "
#> 3 "9464 Preston Rd, Dallas, TX 75225" "9464 Preston Rd, Dallas, TX 75225"
#> 4 "8046 Forest Ln, Humble, TX 77338" "8046 Forest Ln, Humble, TX 77338"
#> 5 "(089).225.6911 berry@msn.com" "(089).225.6911 berry@hotmail.com"
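A sketch of the differences between these functions, using one of the phone numbers above. str_replace() changes only the first match, str_remove_all() is a replacement by nothing, and an unescaped "." in a pattern means "any character" unless it is wrapped in fixed():

```r
library(stringr)

phone <- "(563)-300-2113"

first_only <- str_replace(phone, "-", " ")          # "(563) 300-2113"
every_one  <- str_replace_all(phone, "-", " ")      # "(563) 300 2113"
digits     <- str_remove_all(phone, "[:punct:]")    # "5633002113"

# Regex "." matches any character; fixed(".") matches a literal dot
all_chars <- str_replace_all("msn.com", ".", "_")          # "_______"
dot_only  <- str_replace_all("msn.com", fixed("."), "_")   # "msn_com"
```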

Use regular expressions



Use Regular expressions
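^ : start of the string
$ : end of the string
. : any character
[:digit:] : a digit
[:upper:] : an uppercase letter
[:punct:] : a punctuation character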
"01.53.40.30.20" %>% str_remove_all(pattern = "[:punct:]")
#> [1] "0153403020"
phonebook %>%
filter(str_detect(firstname, "^M")) %>%
select(-coord)
#> # A tibble: 2 × 3
#> id firstname lastname
#> <int> <chr> <chr>
#> 1 3 Marie-Louise "guillaumin "
#> 2 5 Mitchell " Berry"
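The anchors and character classes listed above can be combined freely. Here is a short sketch on a few hypothetical first names:

```r
library(stringr)

first_names <- c("Mitchell", "Emma", "Tom")

starts_m <- str_detect(first_names, "^M")          # TRUE FALSE FALSE
ends_m   <- str_detect(first_names, "m$")          # FALSE FALSE TRUE
upper_1  <- str_detect(first_names, "^[:upper:]")  # TRUE TRUE TRUE
has_dig  <- str_detect(c("R2", "R"), "[:digit:]")  # TRUE FALSE
```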

Extract
str_extract()
# succession of letters at the end of the sentence
"R is very powerful" %>% str_extract("[:alpha:]+$")
#> [1] "powerful"
# succession of numbers
c("93300 Aubervilliers", "Paris 75017") %>% str_extract("[:digit:]+")
#> [1] "93300" "75017"
phonebook %>%
mutate(email = coord %>% str_extract("[:alnum:]+@[:alnum:]+\\.[:alnum:]+")) %>%
select(id, email) %>%
filter(!is.na(email))
#> # A tibble: 2 × 2
#> id email
#> <int> <chr>
#> 1 2 flores@mail.com
#> 2 5 berry@msn.com
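str_extract() returns only the first match in each string. When a string contains several matches, str_extract_all() returns all of them as a list, with one element per input string. Here is a sketch on a made-up sentence:

```r
library(stringr)

txt <- "93300 Aubervilliers and 75017 Paris"

first_code <- str_extract(txt, "[:digit:]+")      # "93300"
all_codes  <- str_extract_all(txt, "[:digit:]+")  # list(c("93300", "75017"))
```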

Quiz
phonebook %>%
mutate(
... = str_extract(coord, "[:alnum:]+@[:alnum:]+\\.[:alnum:]+"),
... = str_extract(coord, "([:digit:]|[:punct:]){10,14}+"),
address = str_...(coord),
address = case_when(
is.na(email) & is.na(phone) ~ address,
is.na(email) ~ ...(address, fixed(phone)),
is.na(phone) ~ str_remove_all(address, ...),
TRUE ~ address %>%
str_remove_all(...) %>%
str_remove_all(fixed(phone))
),
phone = str_remove_all(phone, ...)
) %>%
select(id, email, phone, address)

Expected result (the phonebook input first, then the expected output)
#> # A tibble: 5 × 4
#> id firstname lastname coord
#> 1 1 Steven "DIXON" " 6804 Preston Rd (563)-300-2113"
#> 2 2 ERIN "FLORES " "flores@mail.com (617)-990-5931 "
#> 3 3 Marie-Louise "guillaumin " "9464 Preston Rd, Dallas, TX 75225"
#> 4 4 Layla "BRYANT" "8046 Forest Ln, Humble, TX 77338"
#> 5 5 Mitchell " Berry" "(089).225.6911 berry@msn.com"
#> # A tibble: 5 × 4
#> id email phone address
#> 1 1 <NA> 5633002113 "6804 Preston Rd "
#> 2 2 flores@mail.com 6179905931 " "
#> 3 3 <NA> <NA> "9464 Preston Rd, Dallas, TX 75225"
#> 4 4 <NA> <NA> "8046 Forest Ln, Humble, TX 77338"
#> 5 5 berry@msn.com 0892256911 " "

Practical

Create sentences
glue()
i <- 1
glue::glue("the value of i is {i} it's little")
#> the value of i is 1 it's little
i <- 1
stringr::str_c("the value of i is ", i, " it's little")
#> [1] "the value of i is 1 it's little"

Create sentences
glue()
x <- 1:4
y <- c("little", "not much", "not bad", "a lot")
glue::glue("the value of x is {x} it's {y}")
#> the value of x is 1 it's little
#> the value of x is 2 it's not much
#> the value of x is 3 it's not bad
#> the value of x is 4 it's a lot

Create sentences
glue()
firstname <- "Teddy"
weight <- 131
height <- 2.04
glue::glue("the BMI of {firstname} is {BMI}",
BMI = round(weight / height^2, digits = 1)
)
#> the BMI of Teddy is 31.5
stringr::str_c("the BMI of ", firstname, " is ", round(weight / height^2, digits = 1))
#> [1] "the BMI of Teddy is 31.5"
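Instead of a named argument such as BMI, the expression can also be written directly between the braces, because glue evaluates any R code it finds there. Here is a sketch with the same values:

```r
library(glue)

firstname <- "Teddy"
weight <- 131
height <- 2.04

# The computation is evaluated inside the braces
bmi_text <- glue("the BMI of {firstname} is {round(weight / height^2, digits = 1)}")
bmi_text   # the BMI of Teddy is 31.5
```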

Combine in a data frame
people <- tibble::tribble(
~firstname, ~weight, ~height,
"Teddy", 131, 2.04,
"Tom", 0.1, 0.5,
"Carla", 75, 1.75
)
people %>%
mutate(bmi = round(weight / height^2, digits = 2),
text = glue::glue("the BMI of {firstname} is {bmi}")
)
#> # A tibble: 3 × 5
#> firstname weight height bmi text
#> <chr> <dbl> <dbl> <dbl> <glue>
#> 1 Teddy 131 2.04 31.5 the BMI of Teddy is 31.48
#> 2 Tom 0.1 0.5 0.4 the BMI of Tom is 0.4
#> 3 Carla 75 1.75 24.5 the BMI of Carla is 24.49
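Outside of mutate(), glue::glue_data() gives a similar result. It looks up the names used in the braces among the columns of a data frame. The sketch uses a hypothetical two-row copy of the people table:

```r
library(glue)

# Hypothetical copy of two rows of the people table
people_small <- data.frame(
  firstname = c("Teddy", "Carla"),
  weight = c(131, 75),
  height = c(2.04, 1.75)
)

# One sentence per row, names taken from the columns
texts <- glue_data(people_small, "the BMI of {firstname} is {round(weight / height^2, 2)}")
texts
# the BMI of Teddy is 31.48
# the BMI of Carla is 24.49
```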

Assess training quality
So? What did you think about it?
Satisfaction

Training Review
Resources
Training Review - Level 1
