BA 340 : DATA ANALYTICS

Pipes/Apply in R

Course By : Nabil Chaabane


Example
Given a vector m <- c(1, 4, 3, 0, 9)

1) Compute the log of m
2) Compute the diff of the output
3) Compute the exponential of the output
4) Round the result
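
The four steps can be sketched in base R as follows. The intermediate names step1, step2, step3, result and nested are illustrative and not part of the slides, and rounding to one decimal place is an assumption taken from the round(1) call used later.

```r
m <- c(1, 4, 3, 0, 9)

# One intermediate variable per step
step1  <- log(m)           # log(0) is -Inf, so one entry is infinite
step2  <- diff(step1)      # differences between consecutive entries
step3  <- exp(step2)       # back-transform the differences
result <- round(step3, 1)  # keep one decimal place

# The same computation written as a single nested call
nested <- round(exp(diff(log(m))), 1)
```

Both versions give the same values; the nested form has to be read from the inside out.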
Piping

We will use the %>% operator, which is part of the magrittr package.

With the help of the piping operator %>%, we can restructure the
nested call round(exp(diff(log(m))), 1) as follows

m %>% log() %>% diff() %>% exp() %>% round(1)
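
A runnable version of the pipeline above, assuming the magrittr package is installed. The names piped and nested are illustrative. Writing one step per line is a layout choice, not a requirement.

```r
library(magrittr)  # provides the %>% operator

m <- c(1, 4, 3, 0, 9)

# Each step receives the output of the previous one as its first argument
piped <- m %>%
  log() %>%
  diff() %>%
  exp() %>%
  round(1)

# The equivalent nested call, for comparison
nested <- round(exp(diff(log(m))), 1)
identical(piped, nested)
```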
Piping
Four reasons why you should be using pipes in R:

● You'll structure the sequence of your data operations from left to right,
as opposed to from the inside out.

● You'll avoid nested function calls.

● You'll minimize the need for local variables and function definitions.

● You'll make it easy to add steps anywhere in the sequence of operations.
Piping
Let us look at the mpg dataset (from the ggplot2 package) and compute the average of displ for the Audi A4 cars, using filter() and select() from dplyr.

mean(select(filter(mpg,model=="a4"),displ)$displ)

Using piping, we get

mpg %>% filter(., model=="a4") %>% select(., displ) %>% .$displ %>% mean(.)

Or

mpg %>% filter(model=="a4") %>% select(displ) %>% .$displ %>% mean()


Multiple placeholders

We can use the output of one statement as an input for the next statement in several places

x %>% {cos(.) * sin(.)}

Make sure to wrap the expression in curly braces when using multiple placeholders.
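
A small sketch of the dot placeholder used twice, assuming magrittr is installed. The input vector and the name res are illustrative, and the product of cos and sin is only one possible expression.

```r
library(magrittr)

x <- c(0, pi / 4, pi / 2)

# Inside { }, every dot refers to the left-hand side value
res <- x %>% {cos(.) * sin(.)}

# Same computation without the pipe
all.equal(res, cos(x) * sin(x))
```

Without the braces, %>% binds more tightly than the arithmetic operator, so only cos(.) would be piped and the second dot would not refer to x.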


Unary Functions

Unary functions are functions that take one argument.

Using the piping operator, we can define f as follows
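
The definition of f is not reproduced in this text. As a sketch, magrittr treats a pipeline that starts with the dot placeholder as the definition of a function of one argument; the particular steps chosen here are an assumption borrowed from the earlier example.

```r
library(magrittr)

# A pipeline that starts with "." defines a unary function
f <- . %>% log() %>% diff() %>% exp() %>% round(1)

# f can now be called like any other function
f(c(1, 4, 9))
```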


Compound Assignment

In case you want to overwrite the value of the left-hand side, use the compound
assignment operator %<>%

x <- x %>% sqrt()

Becomes

x %<>% sqrt()
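
A minimal runnable sketch of the compound assignment, assuming magrittr is installed; the starting values of x are illustrative.

```r
library(magrittr)

x <- c(1, 4, 9)

# Pipe x into sqrt() and store the result back in x
x %<>% sqrt()

x  # x now holds the square roots: 1 2 3
```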
Tee Operator

The tee operator %T>% returns the left-hand side value rather than the result of
the right-hand side operation.

The tee operator can come in handy in situations where you have included functions that
are used for their side effect, such as plotting with plot() or printing to a file.

set.seed(123)
rnorm(200) %>% matrix(ncol = 2) %T>% plot() %>% colSums()
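
A smaller sketch of the same idea without a plot, assuming magrittr is installed. The vector and the name total are illustrative. cat() is used only for its side effect and returns NULL, so a plain %>% would pass NULL on to sum().

```r
library(magrittr)

total <- c(1, 4, 9) %T>%
  {cat("number of values:", length(.), "\n")} %>%  # side effect only
  sum()                                            # receives the original vector

total  # 14
```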
Exposing Data Variables

For functions that don’t have a data argument, such as the cor() function, it's still handy if you
can expose the variables in the data.

That's where the %$% operator comes in handy.

iris %>% subset(Sepal.Length > mean(Sepal.Length)) %$% cor(Sepal.Length, Sepal.Width)
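
As a further sketch, the exposition operator can be compared with the usual $ notation. The example assumes magrittr is installed and uses the built-in mtcars data; the names r_exposed and r_direct are illustrative.

```r
library(magrittr)

# %$% makes the columns of mtcars visible by name on the right-hand side
r_exposed <- mtcars %$% cor(mpg, wt)

# Equivalent call without the operator
r_direct <- cor(mtcars$mpg, mtcars$wt)

all.equal(r_exposed, r_direct)
```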


Apply

Given the following matrix


m1 <- matrix(1:10,nrow=5, ncol=6)

Compute the sum of each column.

We solve the problem using a for loop:
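
The loop itself is not reproduced in this text. A minimal sketch of one way to write it, where col_sums is an illustrative name:

```r
m1 <- matrix(1:10, nrow = 5, ncol = 6)  # 1:10 is recycled to fill 30 cells

# Pre-allocate the result, then fill it one column at a time
col_sums <- numeric(ncol(m1))
for (j in seq_len(ncol(m1))) {
  col_sums[j] <- sum(m1[, j])
}

col_sums  # 15 40 15 40 15 40
```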


Apply

● The apply function is a substitute for the loop.

● apply takes a data frame or matrix as input and returns a vector, list or array.

● The function apply(X, MARGIN, FUN) takes three arguments:

- X: an array or matrix
- MARGIN: a value of 1, 2 or c(1, 2) that defines where the function is applied:
-MARGIN=1: the manipulation is performed on rows
-MARGIN=2: the manipulation is performed on columns
-MARGIN=c(1,2): the manipulation is performed on rows and columns
- FUN: the function to apply. Built-in functions like mean, median, sum, min, max and
even user-defined functions can be applied
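
The three MARGIN settings can be sketched on a small matrix. The matrix m and the result names are illustrative and not taken from the slides.

```r
m <- matrix(1:6, nrow = 2)  # rows are (1, 3, 5) and (2, 4, 6)

row_sums <- apply(m, 1, sum)                     # one value per row: 9 12
col_max  <- apply(m, 2, max)                     # one value per column: 2 4 6
scaled   <- apply(m, c(1, 2), function(v) v * 10) # applied cell by cell

dim(scaled)  # same shape as m: 2 3
```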
Apply

Given the following matrix


m1 <- matrix(1:10,nrow=5, ncol=6)

Compute the sum of each column.

We solve the problem using apply:
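
The call is not reproduced in this text. A minimal sketch, where a_m1 is an illustrative name:

```r
m1 <- matrix(1:10, nrow = 5, ncol = 6)

# MARGIN = 2 applies sum to each column
a_m1 <- apply(m1, 2, sum)

a_m1  # 15 40 15 40 15 40
```

For this particular task, base R also provides colSums(m1), which returns the same values.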


lapply

● The lapply() function is useful for performing operations on list objects.

● It returns a list of the same length as the input.

● Each element of the output is the result of applying FUN to the corresponding element of
the input.

● lapply(X, FUN)

● Arguments:
-X: A vector or a list-like object
-FUN: Function applied to each element of X
lapply

● The difference between lapply() and apply() lies in the type of output returned.

● The output of lapply() is always a list.

● lapply() can also be used on other objects, such as vectors and data frames.
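
A short sketch of lapply() on a character vector; the vector words and the name lower are illustrative.

```r
words <- c("Pipes", "Apply", "R")

# tolower is applied to each element; the result is a list of length 3
lower <- lapply(words, tolower)

str(lower)

# unlist() flattens the list back into a character vector if needed
unlist(lower)
```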
sapply

● The sapply() function does the same job as lapply() but simplifies the result
to a vector (or a matrix) when possible.

● sapply(X, FUN)

● Arguments:
-X: A vector or a list-like object
-FUN: Function applied to each element of X
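
The difference in return type can be sketched by running both functions on the same list; the list and the result names are illustrative.

```r
values <- list(a = 1:3, b = 4:6)

as_list   <- lapply(values, sum)  # list with elements a and b
as_vector <- sapply(values, sum)  # named vector: a = 6, b = 15

is.list(as_list)
is.vector(as_vector, mode = "numeric")
```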
tapply

● tapply() computes a measure (mean, median, min, max, etc.) or any other function
for each level of a factor variable.

● It is a very useful function that lets you split a vector into subsets and then
apply a function to each subset.

● tapply(X, INDEX, FUN = NULL)

● Arguments:
-X: An object, usually a vector
-INDEX: A factor (or list of factors) of the same length as X
-FUN: Function applied to each subset of X
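
A minimal sketch of tapply() with made-up data; scores, group and group_means are illustrative names.

```r
scores <- c(10, 20, 30, 40)
group  <- factor(c("a", "b", "a", "b"))

# mean is computed separately for the scores in group "a" and group "b"
group_means <- tapply(scores, group, mean)

group_means  # a = 20, b = 30
```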
Code Structure

● The main idea is to develop reusable, structured code that is easy to maintain and to
extend.
Exercise

● Create a function modulo that returns x modulo 10 for an object x (use the operator %%)

● Create a 10x10 matrix M with values going from 2 to 200 with a step of 2. Use the function
seq to create such a matrix.

● Compute a new matrix whose entries are those of M modulo 10.

● Construct a list whose entries are c(1:9), c(1:12), c(1:15).

● Compute the sum of each entry.
