Professional Documents
Culture Documents
Curso 2 Data Types III Dplyr
Curso 2 Data Types III Dplyr
Index
• Introduction to R • Data input and Output
• Rstudio • Graphs
• Getting Started - R Console • dplyr
• Help
• R-workspace
• Packages
• Data types and Structures
• Vectors
• Missing and special values
• Matrices and Arrays
• Factors
• Lists
• Data frames
• Indexing
• Conditional indexing
Introducción al lenguaje de programación R
What is tidyverse?
http://www.seec.uct.ac.za/r-tidyverse
Introducción al lenguaje de programación R
What is dplyr()?
• The dplyr is a powerful R-package to manipulate, clean
and summarize unstructured data.
• Makes data exploration and data manipulation easy and
fast in R.
install.packages("dplyr")
library(dplyr)
Introducción al lenguaje de programación R
What is dplyr()?
• The dplyr is a powerful R-package to manipulate, clean
and summarize unstructured data.
• Makes data exploration and data manipulation easy and
fast in R.
install.packages("tidyverse").
library(dplyr)
Introducción al lenguaje de programación R
What is dplyr()?
• dplyr is a grammar of data manipulation,
• provides a consistent set of verbs that help you solve the
most common data manipulation challenges:
filter()
• Select rows in a dataframe (df).
• Dataset starwars.
• Column species
> unique(starwars$species)
[1] "Human" "Droid" "Wookiee" "Rodian" "Hutt"
[6] "Yoda's species" "Trandoshan" "Mon Calamari" "Ewok" "Sullustan"
[11] "Neimodian" "Gungan" NA "Toydarian" "Dug"
[16] "Zabrak" "Twi'lek" "Vulptereen" "Xexto" "Toong"
[21] "Cerean" "Nautolan" "Tholothian" "Iktotchi" "Quermian"
[26] "Kel Dor" "Chagrian" "Geonosian" "Mirialan" "Clawdite"
[31] "Besalisk" "Kaminoan" "Aleena" "Skakoan" "Muun"
[36] "Togruta" "Kaleesh" "Pau’an”
filter()
#- Filtrado por cadenas
droids %>% filter(homeworld == “Naboo")
filter() ejercicios
• #- androides con una altura entre 96 y 200
filter()
#- filas con valores en un rango
droids %>% filter(height >= 96 , height < 200)
droids %>% filter(height >= 96 & height < 200)
select()
• Select columns in a dataframe (df).
select()
• Select helper functions
Helpers Description
starts_with() Starts with a prefix
ends_with() Ends with a prefix
contains() Contains a literal string
matches() Matches a regular expression
num_range() Numerical range like x01, x02, x03.
one_of() Variables in character vector.
everything() All variables.
Introducción al lenguaje de programación R
select()
select()
arrange()
• Reorder rows in a dataframe (df).
rename()
• Rename columns in a dataframe (df).
mutate()
• Create new columns in a dataframe (df).
sumarise()
• Allows to colapse/summarise rows in a dataframe (df).
sumarise()
sumarise()
starwars %>%
group_by(species) %>%
summarise(n = n(),
mass = mean(mass, na.rm = TRUE)
)
1 Aleena 1 15
2 Besalisk 1 102
3 Cerean 1 82
4 Chagrian 1 NaN
5 Clawdite 1 55
6 Droid 5 69.8
7 Dug 1 40
8 Ewok 1 20 …..
Introducción al lenguaje de programación R
sumarise()
starwars %>%
group_by(species) %>%
summarise(n = n(),
mass = mean(mass, na.rm = TRUE)
) %>%
filter(n > 2, mass > 50)
# A tibble: 3 x 3
species n mass
<chr> <int> <dbl>
1 Droid 5 69.8
2 Gungan 3 74
3 Human 35 82.8
Introducción al lenguaje de programación R
sumarize()
Todo junto
sumarize()
• Allows to calculate operations for different groups in a dataframe
(droids, humans...)
dplyr::n_distinct(x):
counts unique values in a vector;
similar a length(unique(x))
Summary
• R is an interactive statistical language
• Extremely flexible and powerful
• Data manipulation and coding
• Can be used as a calculator, simulator and sampler
• FREE!
Introducción al lenguaje de programación R 28
Useful Packages
Pre-modeling stage
Data visualization:
ggplot2, googleVis
Data Transformation:
plyr, dplyr, data.table
Gracias…