Tidy Verse

R For Data Science Cheat Sheet
Tidyverse for Beginners
Learn More R for Data Science Interactively at www.datacamp.com

Tidyverse

The tidyverse is a powerful collection of R packages for transforming and visualizing data. All packages of the tidyverse share an underlying philosophy and common APIs.

The core packages are:
• ggplot2 implements the grammar of graphics. You can use it to visualize your data.
• dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
• tidyr helps you to create tidy data, that is, data where each variable is in a column, each observation is a row and each value is a cell.
• readr is a fast and friendly way to read rectangular data.
• purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
• tibble is a modern re-imagining of the data frame.
• stringr provides a cohesive set of functions designed to make working with strings as easy as possible.
• forcats provides a suite of useful tools that solve common problems with factors.

You can install the complete tidyverse with:
> install.packages("tidyverse")

Then, load the core tidyverse and make it available in your current R session by running:
> library(tidyverse)

Note: there are many other tidyverse packages with more specialised usage. They are not loaded automatically with library(tidyverse), so you’ll need to load each one with its own call to library().

Useful Functions
> tidyverse_conflicts()    Conflicts between tidyverse and other packages
> tidyverse_deps()         List all tidyverse dependencies
> tidyverse_logo()         Get tidyverse logo, using ASCII or unicode characters
> tidyverse_packages()     List all tidyverse packages
> tidyverse_update()       Update tidyverse packages

Loading in the data
> library(datasets)        Load the datasets package
> library(gapminder)       Load the gapminder package
> attach(iris)             Attach iris data to the R search path

dplyr

Filter
filter() allows you to select a subset of rows in a data frame.

Select iris data of species "virginica":
> iris %>%
    filter(Species=="virginica")

Select iris data of species "virginica" and sepal length greater than 6:
> iris %>%
    filter(Species=="virginica", Sepal.Length > 6)

Arrange
arrange() sorts the observations in a dataset in ascending or descending order based on one of its variables.

Sort in ascending order of sepal length:
> iris %>%
    arrange(Sepal.Length)

Sort in descending order of sepal length:
> iris %>%
    arrange(desc(Sepal.Length))

Combine multiple dplyr verbs in a row with the pipe operator %>%.

Filter for species "virginica", then arrange in descending order of sepal length:
> iris %>%
    filter(Species=="virginica") %>%
    arrange(desc(Sepal.Length))

Mutate
mutate() allows you to update or create new columns of a data frame.

Change Sepal.Length to be in millimeters:
> iris %>%
    mutate(Sepal.Length=Sepal.Length*10)

Create a new column called SLMm:
> iris %>%
    mutate(SLMm=Sepal.Length*10)

Combine the verbs filter(), arrange(), and mutate():
> iris %>%
    filter(Species=="virginica") %>%
    mutate(SLMm=Sepal.Length*10) %>%
    arrange(desc(SLMm))

Summarize
summarize() allows you to turn many observations into a single data point.

Summarize to find the median sepal length:
> iris %>%
    summarize(medianSL=median(Sepal.Length))

Filter for virginica, then summarize the median sepal length:
> iris %>%
    filter(Species=="virginica") %>%
    summarize(medianSL=median(Sepal.Length))

You can also summarize multiple variables at once:
> iris %>%
    filter(Species=="virginica") %>%
    summarize(medianSL=median(Sepal.Length),
              maxSL=max(Sepal.Length))

group_by() allows you to summarize within groups instead of summarizing the entire dataset.

Find median and max sepal length of each species:
> iris %>%
    group_by(Species) %>%
    summarize(medianSL=median(Sepal.Length),
              maxSL=max(Sepal.Length))

Find median and max petal length of each species with sepal length > 6:
> iris %>%
    filter(Sepal.Length>6) %>%
    group_by(Species) %>%
    summarize(medianPL=median(Petal.Length),
              maxPL=max(Petal.Length))

ggplot2

Scatter plot
Scatter plots allow you to compare two variables within your data. To do this with ggplot2, you use geom_point().

Keep only the flowers with sepal length greater than 5:
> iris_small <- iris %>%
    filter(Sepal.Length > 5)

Compare petal width and length:
> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width)) +
    geom_point()

Additional Aesthetics
• Color
> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width,
                         color=Species)) +
    geom_point()

• Size
> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width,
                         color=Species,
                         size=Sepal.Length)) +
    geom_point()

Faceting
> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width)) +
    geom_point() +
    facet_wrap(~Species)

Line Plots
> by_year <- gapminder %>%
    group_by(year) %>%
    summarize(medianGdpPerCap=median(gdpPercap))
> ggplot(by_year, aes(x=year,
                      y=medianGdpPerCap)) +
    geom_line() +
    expand_limits(y=0)

Bar Plots
> by_species <- iris %>%
    filter(Sepal.Length>6) %>%
    group_by(Species) %>%
    summarize(medianPL=median(Petal.Length))
> ggplot(by_species, aes(x=Species,
                         y=medianPL)) +
    geom_col()

Histograms
> ggplot(iris_small, aes(x=Petal.Length)) +
    geom_histogram()

Box Plots
> ggplot(iris_small, aes(x=Species,
                         y=Sepal.Width)) +
    geom_boxplot()
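The dplyr and ggplot2 sections come with worked examples, but the other core packages in the tidyverse list do not. Below is a minimal sketch that applies one commonly used function from each of tibble, tidyr, readr, purrr, stringr and forcats to the iris data. It assumes the core tidyverse is installed and loaded with library(tidyverse). The object names (iris_tbl, iris_long, csv_path, medians and so on) are hypothetical and chosen only for illustration.

```r
library(tidyverse)

# tibble: convert the iris data frame into a tibble (a modern data frame)
iris_tbl <- as_tibble(iris)

# tidyr: reshape to tidy "long" form, one row per flower per measurement
iris_long <- iris_tbl %>%
  pivot_longer(cols = -Species, names_to = "measure", values_to = "value")

# readr: write the first 3 rows to a temporary CSV file and read them back;
# col_types = cols() lets readr guess the types without printing a message
csv_path <- tempfile(fileext = ".csv")
write_csv(head(iris_tbl, 3), csv_path)
iris_head <- read_csv(csv_path, col_types = cols())

# purrr: apply median() to every numeric column, returning a named double vector
medians <- iris_tbl %>%
  select(-Species) %>%
  map_dbl(median)

# stringr: which species names start with "v"?
v_species <- str_detect(levels(iris$Species), "^v")

# forcats: move "virginica" to the front of the factor levels
species_relevel <- fct_relevel(iris$Species, "virginica")
```

The long table has 600 rows (150 flowers times 4 measurements). Its columns are Species, measure and value. After the CSV round trip, Species comes back as a character column, because readr does not create factors.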
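The pipe operator %>% used throughout the dplyr section passes the value on its left as the first argument of the function on its right. So x %>% f(y) is the same as f(x, y). This sketch writes the cheat sheet's filter-then-arrange example three ways to show that they are equivalent. It assumes dplyr is installed. The names piped, nested, step1 and stepwise are hypothetical.

```r
library(dplyr)

# Piped version, read top to bottom: filter, then sort
piped <- iris %>%
  filter(Species == "virginica") %>%
  arrange(desc(Sepal.Length))

# Nested version: the same calls written inside-out without the pipe
nested <- arrange(filter(iris, Species == "virginica"), desc(Sepal.Length))

# Step-by-step version that stores the intermediate result
step1 <- filter(iris, Species == "virginica")
stepwise <- arrange(step1, desc(Sepal.Length))
```

All three produce the same 50-row data frame, sorted so that the longest virginica sepal comes first. The piped form avoids both the deep nesting and the throwaway intermediate objects.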