R For Data Science Cheat Sheet

Tidyverse for Beginners

Learn More R for Data Science Interactively at www.datacamp.com


Tidyverse

The tidyverse is a powerful collection of R packages for transforming and
visualizing data. All packages of the tidyverse share an underlying
philosophy and common APIs.

The core packages are:

• ggplot2, which implements the grammar of graphics. You can use it to
  visualize your data.

• dplyr is a grammar of data manipulation. You can use it to solve the most
  common data manipulation challenges.

• tidyr helps you to create tidy data, or data where each variable is in a
  column, each observation is a row, and each value is a cell.

• readr is a fast and friendly way to read rectangular data.

• purrr enhances R’s functional programming (FP) toolkit by providing a
  complete and consistent set of tools for working with functions and
  vectors.

• tibble is a modern re-imagining of the data frame.

• stringr provides a cohesive set of functions designed to make working with
  strings as easy as possible.

• forcats provides a suite of useful tools that solve common problems with
  factors.

You can install the complete tidyverse with:

> install.packages("tidyverse")

Then, load the core tidyverse and make it available in your current R
session by running:

> library(tidyverse)

Note: there are many other tidyverse packages with more specialised usage.
They are not loaded automatically with library(tidyverse), so you’ll need to
load each one with its own call to library().

Useful Functions

> tidyverse_conflicts()    Conflicts between tidyverse and other packages
> tidyverse_deps()         List all tidyverse dependencies
> tidyverse_logo()         Get tidyverse logo, using ASCII or unicode characters
> tidyverse_packages()     List all tidyverse packages
> tidyverse_update()       Update tidyverse packages

Loading in the data

> library(datasets)        Load the datasets package
> library(gapminder)       Load the gapminder package
> attach(iris)             Attach iris data to the R search path


dplyr

Filter

filter() allows you to select a subset of rows in a data frame.

# Select iris data of species "virginica"
> iris %>%
    filter(Species=="virginica")

# Select iris data of species "virginica" and sepal length greater than 6
> iris %>%
    filter(Species=="virginica",
           Sepal.Length > 6)

Arrange

arrange() sorts the observations in a dataset in ascending or descending
order based on one of its variables.

# Sort in ascending order of sepal length
> iris %>%
    arrange(Sepal.Length)

# Sort in descending order of sepal length
> iris %>%
    arrange(desc(Sepal.Length))

Combine multiple dplyr verbs in a row with the pipe operator %>%:

# Filter for species "virginica", then arrange in descending order of sepal length
> iris %>%
    filter(Species=="virginica") %>%
    arrange(desc(Sepal.Length))

Mutate

mutate() allows you to update or create new columns of a data frame.

# Change Sepal.Length to be in millimeters
> iris %>%
    mutate(Sepal.Length=Sepal.Length*10)

# Create a new column called SLMm
> iris %>%
    mutate(SLMm=Sepal.Length*10)

Combine the verbs filter(), arrange(), and mutate():

> iris %>%
    filter(Species=="virginica") %>%
    mutate(SLMm=Sepal.Length*10) %>%
    arrange(desc(SLMm))

Summarize

summarize() allows you to turn many observations into a single data point.

# Summarize to find the median sepal length
> iris %>%
    summarize(medianSL=median(Sepal.Length))

# Filter for virginica, then summarize the median sepal length
> iris %>%
    filter(Species=="virginica") %>%
    summarize(medianSL=median(Sepal.Length))

You can also summarize multiple variables at once:

> iris %>%
    filter(Species=="virginica") %>%
    summarize(medianSL=median(Sepal.Length),
              maxSL=max(Sepal.Length))

group_by() allows you to summarize within groups instead of summarizing the
entire dataset:

# Find median and max sepal length of each species
> iris %>%
    group_by(Species) %>%
    summarize(medianSL=median(Sepal.Length),
              maxSL=max(Sepal.Length))

# Find median and max petal length of each species with sepal length > 6
> iris %>%
    filter(Sepal.Length>6) %>%
    group_by(Species) %>%
    summarize(medianPL=median(Petal.Length),
              maxPL=max(Petal.Length))


ggplot2

Scatter plot

Scatter plots allow you to compare two variables within your data. To do
this with ggplot2, you use geom_point().

> iris_small <- iris %>%
    filter(Sepal.Length > 5)

# Compare petal width and length
> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width)) +
    geom_point()

Additional Aesthetics

• Color

> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width,
                         color=Species)) +
    geom_point()

• Size

> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width,
                         color=Species,
                         size=Sepal.Length)) +
    geom_point()

Faceting

> ggplot(iris_small, aes(x=Petal.Length,
                         y=Petal.Width)) +
    geom_point() +
    facet_wrap(~Species)

Line Plots

> by_year <- gapminder %>%
    group_by(year) %>%
    summarize(medianGdpPerCap=median(gdpPercap))

> ggplot(by_year, aes(x=year,
                      y=medianGdpPerCap)) +
    geom_line() +
    expand_limits(y=0)

Bar Plots

> by_species <- iris %>%
    filter(Sepal.Length>6) %>%
    group_by(Species) %>%
    summarize(medianPL=median(Petal.Length))

> ggplot(by_species, aes(x=Species,
                         y=medianPL)) +
    geom_col()

Histograms

> ggplot(iris_small, aes(x=Petal.Length)) +
    geom_histogram()

Box Plots

> ggplot(iris_small, aes(x=Species,
                         y=Sepal.Width)) +
    geom_boxplot()
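
The sheet describes tidyr as a tool for creating tidy data but shows no tidyr call. As a minimal sketch of that idea, and not part of the original sheet, the example below reshapes iris so that each row holds one measurement of one flower, then reuses group_by() and summarize() on the result. The names iris_long, measurement, cm, medians and median_cm are illustrative assumptions, and the code assumes tidyr 1.0.0 or later (for pivot_longer()) and dplyr 1.0.0 or later (for the .groups argument).

```r
library(dplyr)
library(tidyr)

# Gather the four measurement columns into name/value pairs.
# "measurement" and "cm" are illustrative column names, not from the sheet.
iris_long <- iris %>%
  pivot_longer(cols = -Species,
               names_to = "measurement",
               values_to = "cm")

# With one measurement per row, a single grouped summary covers
# every measurement type for every species.
medians <- iris_long %>%
  group_by(Species, measurement) %>%
  summarize(median_cm = median(cm), .groups = "drop")

print(medians)
```

The long layout is a design choice rather than a requirement: it lets one group_by() replace four separate summarize() arguments, and it is the shape facet_wrap() would need if you wanted to facet by measurement type.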