Ba 340: Data Analytics: Pipes/Apply in R

BA 340 : DATA ANALYTICS
Pipes/Apply in R
Course By : Nabil Chaabane

Example
Given a vector m <- c(1, 4, 3, 0, 9)
1) Compute the log of m
2) Compute the diff of the output
3) Compute the exponential of the output
4) Round the result
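Without pipes, the four steps can be written with intermediate variables or as one nested call. The following is a minimal base R sketch; the names step1 to step4 and nested are illustrative, and rounding to one decimal place is an assumption.

```r
m <- c(1, 4, 3, 0, 9)

# Step by step, with one intermediate variable per operation
step1 <- log(m)           # natural log; log(0) is -Inf
step2 <- diff(step1)      # differences between consecutive elements
step3 <- exp(step2)       # exponential of each difference
step4 <- round(step3, 1)  # round to one decimal place

# The same computation as a single nested call, read from the inside out
nested <- round(exp(diff(log(m))), 1)

identical(step4, nested)
```

The nested form needs no temporary variables, but the order of operations has to be read from the innermost call outward.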
Piping
We will use the %>% operator, which is provided by the magrittr package.
With the pipe operator %>%, the nested statement round(exp(diff(log(m))), 1)
can be restructured as follows
m %>% log() %>% diff() %>% exp() %>% round(1)
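To check that the piped statement really is the nested call in a different order, here is a small sketch assuming the magrittr package is installed (the names piped and nested are illustrative).

```r
library(magrittr)  # provides %>%

m <- c(1, 4, 3, 0, 9)

piped  <- m %>% log() %>% diff() %>% exp() %>% round(1)
nested <- round(exp(diff(log(m))), 1)

# x %>% f(y) is evaluated as f(x, y), so both forms give the same result
identical(piped, nested)
```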
Piping
Four reasons why you should be using pipes in R:
● You'll structure the sequence of your data operations from left to right,
as opposed to from the inside out.
● You'll avoid nested function calls.
● You'll minimize the need for local variables and function definitions.
● You'll make it easy to add steps anywhere in the sequence of
operations.
Piping
Let us look at the mpg dataset (from the ggplot2 package) and compute the average of displ for the Audi A4 cars, using filter() and select() from dplyr.
mean(select(filter(mpg, model == "a4"), displ)$displ)
Using piping, we get
mpg %>% filter(., model == "a4") %>% select(., displ) %>% .$displ %>% mean(.)
Or, since the left-hand side is passed as the first argument by default,
mpg %>% filter(model == "a4") %>% select(displ) %>% .$displ %>% mean()
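A runnable version of this comparison, assuming the dplyr and ggplot2 packages are installed. The variant using pull() is not from the slides; it is shown only as one possible alternative to select() followed by .$displ.

```r
library(dplyr)    # filter(), select(), pull(); also re-exports %>%
library(ggplot2)  # needed here only for the mpg dataset

# Nested form, read from the inside out
nested <- mean(select(filter(mpg, model == "a4"), displ)$displ)

# Piped form, read from left to right
piped <- mpg %>%
  filter(model == "a4") %>%
  select(displ) %>%
  .$displ %>%
  mean()

# Alternative (assumption, not in the slides): pull() extracts a column as a vector
with_pull <- mpg %>%
  filter(model == "a4") %>%
  pull(displ) %>%
  mean()

all.equal(nested, piped)
all.equal(nested, with_pull)
```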

Multiple placeholders
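We can use the output of one statement as an input for the next statement in several places
x %>% {cos(.) * sin(.)}
Make sure to wrap the expression in curly braces when using multiple placeholders, so that the whole expression is treated as the right-hand side of the pipe.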
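A short sketch of the braces form, assuming magrittr is installed; the input vector x is an illustrative choice.

```r
library(magrittr)

x <- c(0, pi / 4, pi / 2)

# Inside { }, every . refers to the left-hand side of the pipe
piped  <- x %>% {cos(.) * sin(.)}
direct <- cos(x) * sin(x)

identical(piped, direct)
```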

Unary Functions
Unary functions are functions that take one argument.
Using the piping operator, we can define a unary function f by starting the pipeline with the dot placeholder (.) in place of a value.
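The original code for this slide is not available, so the following is only a sketch of the idea, assuming magrittr is installed. It reuses the log/diff/exp/round steps from the first example as the body of f.

```r
library(magrittr)

# A pipeline that starts with . is not evaluated; it becomes a function
f <- . %>% log() %>% diff() %>% exp() %>% round(1)

m <- c(1, 4, 3, 0, 9)

# Calling f(m) runs the stored pipeline on m
identical(f(m), round(exp(diff(log(m))), 1))
```

Defining the pipeline once as f makes it reusable on any other vector without repeating the steps.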

Compound Assignment
When you want to overwrite the left-hand side with the result of the pipeline, use the compound
assignment operator %<>%
x <- x %>% sqrt()
becomes
x %<>% sqrt()
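A minimal sketch, assuming magrittr is installed; the starting values of x are illustrative.

```r
library(magrittr)

x <- c(1, 4, 9)

# Pipe x into sqrt() and store the result back in x
x %<>% sqrt()
x
```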
Tee Operator
The tee operator %T>% returns the left-hand side value rather than the result of
the right-hand side operation.
It comes in handy when the pipeline includes functions that
are used for their side effects, such as plotting with plot() or printing to a file.
> set.seed(123)
> rnorm(200) %>% matrix(ncol = 2) %T>% plot %>% colSums
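The same pattern with a printing side effect instead of a plot, as a sketch assuming magrittr is installed. The helper report_dim is hypothetical and exists only for illustration.

```r
library(magrittr)

# Hypothetical helper used only for its side effect (cat() returns NULL)
report_dim <- function(x) cat("dimensions:", dim(x), "\n")

set.seed(123)
sums <- rnorm(200) %>%
  matrix(ncol = 2) %T>%
  report_dim() %>%   # prints, but the matrix itself is passed on
  colSums()

length(sums)  # one sum per column
```

With a plain %>% in place of %T>%, colSums() would receive the NULL returned by report_dim() and fail.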
Exposing Data Variables
For functions that don't have a data argument, such as cor(), it is still handy to
expose the variables in the data.
That's where the exposition operator %$% comes in handy.
iris %>% subset(Sepal.Length > mean(Sepal.Length)) %$% cor(Sepal.Length, Sepal.Width)
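For comparison, %$% plays a role similar to base R's with(). A sketch assuming magrittr is installed; the names piped, big and base_r are illustrative.

```r
library(magrittr)

piped <- iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

# Base R equivalent: with() evaluates the expression inside the data frame
big    <- subset(iris, Sepal.Length > mean(Sepal.Length))
base_r <- with(big, cor(Sepal.Length, Sepal.Width))

all.equal(piped, base_r)
```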

Apply
Given the following matrix
m1 <- matrix(1:10, nrow = 5, ncol = 6)
Compute the sum of each column. (The values 1:10 are recycled to fill all 30 cells.)
We first solve the problem using a for loop.
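The loop itself is missing from the slide text, so here is one possible version in base R; the name col_sums is illustrative.

```r
m1 <- matrix(1:10, nrow = 5, ncol = 6)

# Pre-allocate the result, then fill it one column at a time
col_sums <- numeric(ncol(m1))
for (j in seq_len(ncol(m1))) {
  col_sums[j] <- sum(m1[, j])
}
col_sums
```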

Apply
● The apply function is a substitute for the loop.
● apply takes a data frame or matrix as input and returns a vector, list or array.
● The function apply(X, MARGIN, FUN) takes three arguments
- X: an array or matrix
- MARGIN: a value of 1, 2 or c(1, 2) that defines where to apply the function:
-MARGIN=1: the manipulation is performed on rows
-MARGIN=2: the manipulation is performed on columns
-MARGIN=c(1,2): the manipulation is performed on rows and columns, i.e. on each cell
- FUN: the function to apply. Built-in functions like mean, median, sum, min, max and
even user-defined functions can be applied
Apply
Given the following matrix
m1 <- matrix(1:10, nrow = 5, ncol = 6)
Compute the sum of each column.
We now solve the problem using apply with MARGIN=2.
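The apply call is missing from the slide text; a minimal base R sketch follows. The row-wise call is added only to contrast the two MARGIN values.

```r
m1 <- matrix(1:10, nrow = 5, ncol = 6)

col_sums <- apply(m1, 2, sum)  # MARGIN = 2: one result per column
row_sums <- apply(m1, 1, sum)  # MARGIN = 1: one result per row

col_sums
row_sums
```

For this particular task base R also offers colSums() and rowSums(), which give the same values.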

lapply
● lapply() is useful for performing operations on list objects.
● It returns a list of the same length as the input object.
● Each element of the output is the result of applying FUN to the corresponding element of
the input
● lapply(X, FUN)
● Arguments:
-X: A vector or a list
-FUN: Function applied to each element of X
lapply
● The difference between lapply() and apply() lies in the return value.
● The output of lapply() is always a list.
● lapply() can also be used on other objects, such as data frames and lists.
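A small base R sketch of both uses; the list scores and its element names are made up for illustration.

```r
# Example list; the element names are illustrative
scores <- list(a = 1:3, b = c(2.5, 7.5), c = 10)

# One mean per list element, returned as a list of the same length
means <- lapply(scores, mean)
str(means)

# A data frame is a list of columns, so FUN is applied to each column
col_classes <- lapply(iris, class)
col_classes$Species
```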
sapply
● sapply() does the same job as lapply() but simplifies the result to a vector (or matrix) when possible.
● sapply(X, FUN)
● Arguments:
-X: A vector or a list
-FUN: Function applied to each element of X
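To make the difference in return type concrete, here is a base R sketch using an illustrative list named scores.

```r
scores <- list(a = 1:3, b = c(2.5, 7.5), c = 10)

as_list   <- lapply(scores, mean)  # list with three elements
as_vector <- sapply(scores, mean)  # named numeric vector

is.list(as_list)
is.numeric(as_vector)
as_vector
```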
tapply
● tapply() computes a measure (mean, median, min, max, etc.) or any other function
for each level of a factor variable.
● It is a very useful function that lets you split a vector into subsets and then
apply a function to each subset.
● tapply(X, INDEX, FUN = NULL)
● Arguments:
-X: An object, usually a vector
-INDEX: A factor (or list of factors) of the same length as X
-FUN: Function applied to each subset of X
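As an illustration on the built-in iris data (the choice of columns is an assumption, not taken from the slides), this base R sketch computes one median per species.

```r
# Median sepal width within each species
medians <- tapply(iris$Sepal.Width, iris$Species, median)
medians

# The same value for one group, computed by hand
setosa_median <- median(iris$Sepal.Width[iris$Species == "setosa"])
```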
Code Structure
● The main idea is to develop reusable, structured code that is easy to maintain and to
extend.
Exercise
● Create a function modulo that returns x modulo 10 for an object x (use the operator %%, i.e. x %% 10)
● Create a 10x10 matrix M with values going from 2 to 200 with a step of 2. Use the function
seq to generate the values.
● Compute a new matrix whose entries are those of M modulo 10.
● Construct a list whose entries are c(1:9), c(1:12), c(1:15).
● Compute the sum of each element of the list.
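One possible solution sketch in base R, not the official answer. The names M_mod, L and sums are illustrative, and using apply with MARGIN=c(1,2) and sapply is one choice among several.

```r
# 1) modulo: remainder of x after division by 10
modulo <- function(x) x %% 10

# 2) 10x10 matrix holding 2, 4, ..., 200 (filled column by column)
M <- matrix(seq(2, 200, by = 2), nrow = 10, ncol = 10)

# 3) apply over rows and columns, i.e. cell by cell
M_mod <- apply(M, c(1, 2), modulo)

# 4) list of three integer vectors
L <- list(c(1:9), c(1:12), c(1:15))

# 5) one sum per list element, simplified to a vector
sums <- sapply(L, sum)
sums
```

Because %% is vectorised, modulo(M) would give the same matrix directly; apply is used here to practise the MARGIN argument.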
