
BMR Assignment

Tidyr:
tidyr is an R package that helps the user tidy up data, arranging it in a consistent layout that
is easier to work with. Four tidyr functions for tidying messy data are covered here:
gather(), separate(), unite() and spread().
Gather collects a set of column names and places them into a single “key” column. It also
collects the cells of those columns and places them into a single value column.
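Syntax :
gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)
Example: We use gather() to collect the values of several columns into a single value column,
with a key column recording which original column each value came from.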
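The original example for gather() is not included in the text, so the following is only a minimal sketch. The data frame `wide` and its columns (`name`, `goals_2021`, `goals_2022`) are made up for illustration and assume R 4.0 or later with tidyr installed.

```r
library(tidyr)

# Hypothetical wide table: one row per player, one column per season
wide <- data.frame(
  name       = c("Asha", "Ben"),
  goals_2021 = c(4, 7),
  goals_2022 = c(6, 3)
)

# Collect the two season columns into a key column ("season")
# and a single value column ("goals")
long <- gather(wide, key = "season", value = "goals", goals_2021, goals_2022)
print(long)
```

Each player now has one row per season instead of one column per season.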

Separate splits a single character column into multiple columns, using either a regular
expression or numeric character positions.
Syntax :
separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE, convert = FALSE)
Example: We use separate() to split one combined column into several new columns, for example
splitting a column that holds both the vaccine and the gender into a vaccine column and a gender column.
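As a rough illustration of separate(), the sketch below splits one combined column into two. The data frame `vaccines`, its values and the underscore separator are assumptions invented for this example, not data from the assignment.

```r
library(tidyr)

# Hypothetical data: vaccine and gender are stored together in one column
vaccines <- data.frame(
  group = c("VaxA_Male", "VaxA_Female", "VaxB_Male"),
  doses = c(120, 150, 90)
)

# Split "group" at the underscore into two new columns
separated <- separate(vaccines, col = group,
                      into = c("vaccine", "gender"), sep = "_")
print(separated)
```

Because remove = TRUE by default, the original `group` column is dropped from the result.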

Unite combines multiple columns into one by pasting their strings together.
Syntax :
unite(data, col, ..., sep = "_", remove = TRUE)
Example: To unite data that is spread over several columns, such as the position of a player
and the kit number, into a single column.
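A small sketch of the position and kit number case follows. The data frame `d1` here is a made-up stand-in with assumed columns `name`, `Position` and `kit_number`; the new column name `position_kit` and the "-" separator are also just choices for the example.

```r
library(tidyr)

# Hypothetical player table
d1 <- data.frame(
  name       = c("Asha", "Ben"),
  Position   = c("Forward", "Keeper"),
  kit_number = c(7, 1)
)

# Paste Position and kit_number together into one column
united <- unite(d1, col = "position_kit", Position, kit_number, sep = "-")
print(united)
```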

Spread turns a pair of key:value columns into a set of tidy columns. This function is superseded
by pivot_wider().
Syntax :
spread(data, key, value, fill = NA, convert = FALSE)
Example: spread() is the opposite of gather(): the entries of the key column become the new
column names, and the entries of the value column fill those columns.
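To make the relationship concrete, here is a minimal sketch that widens a long table with spread() and, for comparison, with pivot_wider(). The data frame `long` and its column names are hypothetical.

```r
library(tidyr)

# Hypothetical long table: one row per player and season
long <- data.frame(
  name   = c("Asha", "Ben", "Asha", "Ben"),
  season = c("goals_2021", "goals_2021", "goals_2022", "goals_2022"),
  goals  = c(4, 7, 6, 3)
)

# Entries of "season" become column names, filled from "goals"
wide <- spread(long, key = season, value = goals)
print(wide)

# The same reshaping with the newer pivot_wider()
wide2 <- pivot_wider(long, names_from = season, values_from = goals)
print(wide2)
```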
Dplyr:
dplyr is a powerful R package for manipulating, cleaning and summarising tabular data held in
data frames. In short, it makes data exploration and data manipulation easy and fast in R. Its
functions can be combined according to the needs of the user. Some of the dplyr
functions are:
Filter function is used to subset a data frame, retaining all rows that satisfy the given conditions.
Syntax: filter(.data, ..., .preserve = FALSE)
Example: filter(d1, kit_number == 7)
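The contents of `d1` are not shown in the text, so the sketch below builds a hypothetical `d1` with assumed columns `name`, `Position` and `kit_number` to show how filter() behaves.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben", "Chen", "Dev"),
  Position   = c("Forward", "Keeper", "Forward", "Defender"),
  kit_number = c(7, 1, 9, 4)
)

# Keep only the rows where kit_number is exactly 7
sevens <- filter(d1, kit_number == 7)
print(sevens)

# Several conditions separated by commas are combined with AND
forwards_high <- filter(d1, Position == "Forward", kit_number > 5)
print(forwards_high)
```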

Group By takes an existing table and converts it into a grouped table. Grouping does not reorder
the rows; it records the groups so that later operations, such as summarise(), are performed per group.
Syntax: group_by(.data, ..., .add = FALSE, .drop = group_by_drop_default(.data))
Example: group_by(d1, Position)
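The following sketch, using a hypothetical `d1` with assumed columns, shows that group_by() only attaches grouping information and leaves the row order as it was.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben", "Chen", "Dev"),
  Position   = c("Forward", "Keeper", "Forward", "Defender"),
  kit_number = c(7, 1, 9, 4)
)

grouped <- group_by(d1, Position)

# Inspect the grouping metadata
print(group_vars(grouped))  # name of the grouping column
print(n_groups(grouped))    # number of distinct positions

# The rows are still in their original order
print(identical(grouped$name, d1$name))
```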

Select function is used to select columns from a data frame or a table.
Syntax: select(.data, ...)
The select() function can also be used to drop a column and display only the remaining
data.
For that purpose, add a – (minus) symbol in front of the column to be dropped. The major helper
functions used inside select() are:
starts_with() – selects columns whose names start with the specified
prefix.
ends_with() – selects columns whose names end with the specified
suffix.
contains() – selects columns whose names contain a literal string.
matches() – selects columns whose names match a regular expression.
num_range() – selects columns whose names follow a numerical range like
x01, x02, x03.
one_of() – selects columns whose names are given in a character vector (superseded by any_of() and all_of()).
everything() – selects all columns.

Example: select(d1, name)
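A short sketch of select() with a few of these helpers follows. The data frame `d1` and its columns, including the extra `kit_size` column, are hypothetical and exist only to give the helpers something to match.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben"),
  Position   = c("Forward", "Keeper"),
  kit_number = c(7, 1),
  kit_size   = c("M", "L")
)

only_name   <- select(d1, name)                      # keep one column
no_position <- select(d1, -Position)                 # drop one column
kit_cols    <- select(d1, starts_with("kit"))        # names beginning with "kit"
number_cols <- select(d1, ends_with("number"))       # names ending with "number"
regex_cols  <- select(d1, matches("^kit_"))          # names matching a regular expression
reordered   <- select(d1, kit_number, everything())  # move kit_number to the front

print(names(kit_cols))
print(names(reordered))
```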
Mutate adds new variables and preserves existing ones. A new column can be added to the
data frame to hold the result of a given expression or condition.

Syntax: mutate(.data, ...)

Example: mutate(d1, result = if_else(kit_number > 5, "good", "bad"))
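The mutate() example can be run as the sketch below, assuming a hypothetical `d1` that has a numeric `kit_number` column; the "good"/"bad" labels follow the example in the text.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben", "Chen", "Dev"),
  kit_number = c(7, 1, 9, 4)
)

# Add a "result" column; the existing columns are kept unchanged
rated <- mutate(d1, result = if_else(kit_number > 5, "good", "bad"))
print(rated)
```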

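Join: Combining two data frames is done with a join, which adds columns from one table to
another by matching rows on shared key columns. In base R, merge() combines two data frames
and by default matches on the columns the two have in common. The join types are: natural
(inner) join, left outer join, right outer join, full outer join, cross join, semi join and anti join.
dplyr provides a function for each: inner_join(), left_join(), right_join(), full_join(),
cross_join(), semi_join() and anti_join().

Syntax: merge(data1, data2)

Example: merge(d1, df2)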
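The sketch below compares merge() with some dplyr join functions. Both data frames are hypothetical: `d1` and `df2` are assumed to share a `name` column, which is used as the key.

```r
library(dplyr)

# Hypothetical tables sharing the key column "name"
d1  <- data.frame(name = c("Asha", "Ben", "Chen"), kit_number = c(7, 1, 9))
df2 <- data.frame(name = c("Asha", "Ben", "Dev"),  goals = c(4, 0, 2))

# Base R: by default keeps only rows whose key appears in both tables
merged <- merge(d1, df2)

# dplyr equivalents for some of the join types
inner <- inner_join(d1, df2, by = "name")  # keys present in both tables
left  <- left_join(d1, df2, by = "name")   # every row of d1, NA where no match
full  <- full_join(d1, df2, by = "name")   # every row of either table
anti  <- anti_join(d1, df2, by = "name")   # rows of d1 with no match in df2

print(left)
print(anti)
```

Here Chen has no row in `df2`, so the left join keeps Chen with a missing `goals` value and the anti join returns only Chen.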

Summarise function creates a new data frame. It is used to summarise or aggregate data. It
will have one (or more) rows for each combination of grouping variables; if there are no
grouping variables, the output will have a single row summarising all observations in the
input.

Syntax: summarise(.data, ..., .groups = NULL)

Example: summarise(d1, kitmean=mean(kit_number))
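As a sketch of both cases described above, the code below summarises a hypothetical `d1` once without grouping and once per `Position`. The column names and the extra `players` count are assumptions made for the example.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben", "Chen", "Dev"),
  Position   = c("Forward", "Keeper", "Forward", "Defender"),
  kit_number = c(7, 1, 9, 4)
)

# No grouping: a single row summarising all observations
overall <- summarise(d1, kitmean = mean(kit_number))
print(overall)

# Grouped input: one row per Position
by_position <- summarise(group_by(d1, Position),
                         kitmean = mean(kit_number),
                         players = n())
print(by_position)
```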

Arrange function orders the rows of a data frame by the values of the selected columns. By default
the ordering is ascending. To arrange the rows in descending order, wrap the column in desc() inside
the function, for example arrange(d1, desc(kit_number)).

Syntax: arrange(.data, ..., .by_group = FALSE)

Example: arrange(d1,kit_number)
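Both orderings can be tried with the sketch below, which again uses a hypothetical `d1` with an assumed numeric `kit_number` column.

```r
library(dplyr)

# Hypothetical player table standing in for d1
d1 <- data.frame(
  name       = c("Asha", "Ben", "Chen", "Dev"),
  kit_number = c(7, 1, 9, 4)
)

# Ascending order (the default)
ascending <- arrange(d1, kit_number)
print(ascending)

# Descending order, using desc() around the column
descending <- arrange(d1, desc(kit_number))
print(descending)
```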
