
Compiled by
Dr. Vanita Joshi
How to install tidyverse and dplyr?

Run the following commands in RStudio:

install.packages("tidyverse")
install.packages("dplyr")

Load the packages to use their functions:

library(tidyverse)
library(dplyr)

(dplyr is part of the tidyverse, so installing and loading tidyverse also makes dplyr available.)
What is dplyr?

One of the most powerful features of the R programming language is its extensibility. One of those extensions, or packages as R calls them, is dplyr. The 'd' stands for data frames, and 'plyr' is the name of an earlier R package, pronounced like "pliers".

dplyr is a powerful R package for manipulating, cleaning, and summarizing data stored in data frames. In short, it makes data exploration and data manipulation easy and fast in R.
Key Features of the dplyr Package
select()

The select() function is used to choose columns.

Ex. The function accepts two or more arguments: the data frame, followed by the column(s) to select.
select(mtcars, cyl, wt)

Ex. To select all columns except drat, vs, am, gear, and carb:
cars <- select(mtcars, -drat, -vs, -am, -gear, -carb)
print(cars)
Other Parameters

Select with parameters

Parameter     Use                                       Example

:             Selects a range of columns                select(mtcars, mpg:hp)
starts_with   Selects columns that start with a string  select(mtcars, starts_with('c'))
contains      Selects columns that contain a string     select(mtcars, contains('y'))
one_of        Selects columns that are from a group     select(mtcars, one_of('mpg', 'carb', 'am'))
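
The helpers in the table can be tried side by side on mtcars. The sketch below is only illustrative: the variable names (range_cols, c_cols, y_cols, group_cols) are made up for this example, and it uses any_of() in place of one_of(), on the assumption that a recent dplyr is installed, where one_of() is superseded by any_of() and all_of().

```r
library(dplyr)

# Range of adjacent columns, from mpg through hp
range_cols <- select(mtcars, mpg:hp)

# Columns whose names start with "c"
c_cols <- select(mtcars, starts_with("c"))

# Columns whose names contain "y"
y_cols <- select(mtcars, contains("y"))

# Columns named in a character vector; any_of() silently skips
# names that are not present in the data frame
group_cols <- select(mtcars, any_of(c("mpg", "carb", "am")))

# Inspect which columns each call kept
names(range_cols)
names(c_cols)
names(y_cols)
names(group_cols)
```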
filter()

The filter() function keeps rows based on their values; it does not affect the columns.
Ex. To keep only the cars that weigh more than 4,000 lbs (wt is recorded in units of 1,000 lbs), use the filter() function as follows:
filter(mtcars, wt > 4)
Ex. To keep only the 8-cylinder cars that have more than four carburetors, separate the conditions with a comma in the filter function. (The & and | operators can also be used to combine multiple conditions.)
filter(mtcars, cyl == 8, carb > 4)
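
As a small sketch of the note about & and |, the example below compares the comma form with the explicit operators. The variable names (with_comma, with_and, with_or) are hypothetical and chosen only for this illustration.

```r
library(dplyr)

# Comma-separated conditions must all be TRUE (logical AND)
with_comma <- filter(mtcars, cyl == 8, carb > 4)

# The same filter written with the & operator
with_and <- filter(mtcars, cyl == 8 & carb > 4)

# The | operator keeps rows where at least one condition is TRUE
with_or <- filter(mtcars, cyl == 8 | carb > 4)

# Compare how many rows each version keeps
nrow(with_comma)
nrow(with_and)
nrow(with_or)
```

The comma and & forms should return the same rows, while the | form keeps a larger set.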
arrange()

The arrange() function is used to sort data. It takes two or more arguments: the data frame, followed by the column(s) by which to sort.
Ex. To sort the table by cylinders and then by miles per gallon:
arrange(mtcars, cyl, mpg)

Pipe Operator (%>%)

The pipe operator (%>%) lets a chain of function calls be written and read from left to right, instead of nesting the calls inside one another. It pipes, or transfers, the output of one function into the first argument of the next function. It is used to combine multiple functions.
Ex. To combine the select function with the arrange function. This invokes select first, then arrange.
mtcars %>% select(cyl, mpg) %>% arrange(cyl, mpg)
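
To make the left-to-right reading concrete, the sketch below writes the same operation once as nested calls and once with the pipe, then adds a descending sort with dplyr's desc() helper. The variable names (nested, piped, by_mpg_desc) are assumptions made for this illustration only.

```r
library(dplyr)

# Nested form: read from the inside out (select first, then arrange)
nested <- arrange(select(mtcars, cyl, mpg), cyl, mpg)

# Piped form: read from left to right, same two steps
piped <- mtcars %>%
  select(cyl, mpg) %>%
  arrange(cyl, mpg)

# Check that both forms give the same result
all.equal(nested, piped)

# desc() sorts a column in descending order
by_mpg_desc <- mtcars %>% arrange(desc(mpg))
head(by_mpg_desc)
```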
mutate()

The mutate() function is used to create new variables, typically computed from existing ones. It can also be applied to fairly large data sets.
Syntax: mutate(data, new_var = <expression using existing_var>)

Ex. To find the ratio between miles per gallon and cylinders:
mutate(mtcars, mtcars_new = mpg / cyl)
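
A single mutate() call can create several variables at once, and a later variable may refer to one created earlier in the same call. The sketch below illustrates this under the assumption that mtcars is used as above; the column names (mpg_per_cyl, wt_lbs, mpg_per_cyl_rounded) are hypothetical.

```r
library(dplyr)

cars_new <- mtcars %>%
  mutate(
    mpg_per_cyl = mpg / cyl,                     # ratio from the example above
    wt_lbs = wt * 1000,                          # wt is stored in units of 1,000 lbs
    mpg_per_cyl_rounded = round(mpg_per_cyl, 2)  # uses a column created just above
  )

# The original columns are kept; the new ones are appended at the end
head(cars_new)
```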
summarise()

The summarise() function is used to compute summary statistics for one or more columns. Any base R summary function can be used with it. (summarize() is a synonym.)
Ex. Average weight of cars in mtcars:
summarise(mtcars, mean = mean(wt))
If the data contain missing values, na.rm = TRUE should be used.
(Since the mtcars dataset has no missing values (NA), let's use the starwars dataset from dplyr.)
library(dplyr)
head(starwars)
summarize(starwars, mean(height, na.rm = TRUE))

Output:
  `mean(height, na.rm = TRUE)`
                         <dbl>
1                         174.
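
The effect of na.rm can be seen by computing the same statistic with and without it. The sketch below is illustrative only; the result names (height_summary, mean_with_na) and column names are assumptions, and the printed values will depend on the version of the starwars data that ships with the installed dplyr.

```r
library(dplyr)

# Several statistics in one call; na.rm = TRUE drops missing heights
height_summary <- summarise(
  starwars,
  mean_height = mean(height, na.rm = TRUE),
  median_height = median(height, na.rm = TRUE),
  n_missing = sum(is.na(height))   # how many heights are NA
)
height_summary

# Without na.rm = TRUE, any missing value makes the mean NA
mean_with_na <- summarise(starwars, mean_height = mean(height))
mean_with_na
```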
group_by() / split-apply-combine

Many data analysis tasks can be approached using the "split-apply-combine" paradigm: split the data into groups, apply some analysis to each group, and then combine the results.
dplyr makes this very easy through the group_by() function, which splits the data into groups. Once the data is grouped in this way, summarize() can be used to collapse each group into a single-row summary.
Ex. Counting the number of cars for each number of cylinders:
mtcars %>% group_by(cyl) %>% summarise(n())
Ex. Average height for each species in starwars:
summarize(group_by(starwars, species), avg = mean(height, na.rm = TRUE))
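
The split-apply-combine steps described above can be sketched in one pipeline that names its summary columns. The names used here (cyl_summary, n_cars, avg_mpg) are hypothetical and chosen only for this illustration.

```r
library(dplyr)

cyl_summary <- mtcars %>%
  group_by(cyl) %>%        # split: one group per number of cylinders
  summarise(
    n_cars = n(),          # apply: count the rows in each group
    avg_mpg = mean(mpg)    # apply: average miles per gallon in each group
  )                        # combine: one summary row per group

cyl_summary
```

Naming the summary columns avoids headers such as `n()` in the result and makes them easier to use in later steps.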
Class Exercise:

Use the starwars data set (from the dplyr package) to answer the following:
1. Display all names starting with the letter 'L'.
2. Display name, height, and mass.
3. Display all fields except eye color, birth year, and homeworld.
4. Extract all characters with brown hair color and mass greater than 100.
5. Display all characters in descending order of height.
6. Display the average height and mass.
7. Add a new variable for height in feet.
8. Display the median mass for males and females.
9. Display the total number of characters with brown hair.