
Course Title : Introduction to R in Business Applications
Ram Mohan Dhara
IMTG/ PGDM/ Term-V / 2019-2021
Session 3 : Intermediate R / Apply functions
• Split your screen: one for log-in and the other for hands-on practice.
• Your Profs are rationally bounded! Q&A session only in the last 15 mins…
• Stay alert! You might have multiple quizzes in a session …and that’s evaluative!
• Please share feedback on today’s session. Help your Prof to make the sessions better.
• After every 2-3 sessions, there is an assignment to be submitted by the due date …and that’s evaluative!
• You can’t present; your video and audio are in mute mode. You are not supposed to mute/remove your Prof in session.
Session objectives
After completing this session, you will be able to write programs in R using:
• Apply functions: a very efficient way of programming in R
Case Example – US crime (x77crime dataset)
The data is available in the R environment (source: U.S. Department of Commerce, Bureau of the Census (1977), Statistical Abstract of the United States). The variables are:

1. State: the 50 states of the US
2. Population: population estimate as of July 1, 1975
3. Income: per capita income (1974)
4. Illiteracy: illiteracy (1970, percent of population)
5. Life Exp: life expectancy in years (1969–71)
6. HS Grad: percent high-school graduates (1970)
7. Frost: mean number of days with minimum temperature below freezing point in capital or large city
8. Area: land area in square miles
9. Crime: rate per 100,000 population (1976)
Case Example - Iris flower
• The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the
British statistician and biologist Ronald Fisher in his 1936 paper on linear discriminant
analysis.
• The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris
virginica and Iris versicolor).
• Four features were measured from each sample: the length and the width of the sepals
and petals, in centimetres.
• Based on the combination of these four features, Fisher developed a linear discriminant
model to distinguish the species from each other.
• This data set became a typical test case for many statistical classification techniques in
machine learning.
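The structure described above can be checked directly, since iris ships with base R as a data frame. The following is a minimal sketch; the variable names `n_rows_cols` and `species_counts` are illustrative only.

```r
# iris is a built-in data frame: 4 numeric measurements plus a Species factor.
str(iris)

# Number of rows (flowers) and columns (4 features + Species).
n_rows_cols <- dim(iris)
print(n_rows_cols)

# Number of samples per species.
species_counts <- table(iris$Species)
print(species_counts)
```

Each species should show the same count, matching the 50 samples per species stated above.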
[Figure: Parts of a flower]
[Figure: Three species of Iris: Iris Setosa, Iris Versicolor, Iris Virginica]


Apply family in R
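• The apply family consists of functions that apply another function over the elements of a vector, list, matrix or data frame, replacing explicit loops. Below are the most common forms of apply functions.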
• apply()
• lapply()
• sapply()
• tapply()
• mapply()
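Of the five functions listed, mapply() is the multivariate form: it calls a function with the first elements of each argument, then the second elements, and so on. A minimal sketch follows; the vectors `sepal_length` and `sepal_width` and the result names are made-up illustrations, not part of any dataset used in this session.

```r
# Two parallel vectors (hypothetical values, for illustration only).
sepal_length <- c(5.1, 7.0, 6.3)
sepal_width  <- c(3.5, 3.2, 3.3)

# mapply() pairs the vectors element by element: f(5.1, 3.5), f(7.0, 3.2), ...
sepal_area <- mapply(function(l, w) l * w, sepal_length, sepal_width)
print(sepal_area)

# When the results differ in length, mapply() returns a list instead of a vector:
# rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
print(reps)
```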
apply()
• The apply() function is used to apply a function to the rows or columns of matrices, arrays and data frames.
• When the function returns a single value, apply() assembles the results into a vector and returns that vector; when it returns several values per row or column, the result is a matrix.
• A data frame is converted to a matrix first, so if you want to apply a function on a data frame, make sure that the data frame is homogeneous (i.e. either all numeric values or all character strings).

REF : https://www.learnbyexample.org/
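A minimal sketch of apply() on rows and columns. It assumes the built-in matrix state.x77, which appears to carry the same Census variables as the x77crime case example (with a Murder column where the slide lists Crime); if your x77crime object differs, substitute it. The result names are illustrative.

```r
# state.x77 is a built-in numeric matrix: 50 states x 8 variables.
# (Assumption: it stands in here for the x77crime dataset.)

# MARGIN = 2 applies the function to each column: one mean per variable.
col_means <- apply(state.x77, 2, mean)
print(col_means)

# MARGIN = 1 applies the function to each row: one maximum per state.
row_max <- apply(state.x77, 1, max)
print(head(row_max))

# If the function returns two values, the result is a 2 x 8 matrix.
col_range <- apply(state.x77, 2, range)
print(col_range)
```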
lapply()

• The lapply() function is used to apply a function to each element of a list or vector.
• It collects the returned values into a list, and then returns that list.

REF : https://www.learnbyexample.org/
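A minimal sketch of lapply() on the iris data. A data frame is a list of columns, so lapply() visits one column at a time; the names `measurements`, `col_means`, `by_species` and `species_summary` are illustrative.

```r
# Keep only the four numeric measurement columns.
measurements <- iris[, 1:4]

# One mean per column, returned as a named list.
col_means <- lapply(measurements, mean)
print(col_means)

# lapply() also works on an ordinary list, e.g. sepal lengths split by species.
by_species <- split(iris$Sepal.Length, iris$Species)
species_summary <- lapply(by_species, summary)
print(species_summary)
```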
sapply()

• sapply() and lapply() work basically the same.
• The only difference is that lapply() always returns a list, whereas sapply() simplifies the result to a vector or matrix when possible.

REF : https://www.learnbyexample.org/
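The difference in return type can be seen by running both functions on the same input. This is a minimal sketch on the iris measurement columns; the result names are illustrative.

```r
# Same computation, two return types.
means_list <- lapply(iris[, 1:4], mean)  # list of 4 values
means_vec  <- sapply(iris[, 1:4], mean)  # named numeric vector of 4 values
print(class(means_list))
print(means_vec)

# When the function returns two values per column, sapply() builds a matrix.
ranges <- sapply(iris[, 1:4], range)     # 2 rows (min, max) x 4 columns
print(ranges)
```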
tapply()

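• The tapply() function splits a vector into groups defined by a factor and applies a function to each group.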

REF : https://www.learnbyexample.org/
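tapply() takes a vector, a grouping factor and a function, and returns one result per group. A minimal sketch on the iris data follows; the result name `mean_petal` is illustrative.

```r
# Mean petal length within each species:
# the vector is split by iris$Species, then mean() is applied to each group.
mean_petal <- tapply(iris$Petal.Length, iris$Species, mean)
print(mean_petal)
```

The result is named by the levels of the grouping factor, so individual groups can be picked out with, for example, `mean_petal["setosa"]`.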
dplyr package in R
1. select() - used to select columns of a data frame
2. filter() - used to filter a subset of rows from a data frame; filtering can be done using multiple conditions
3. arrange() - used to sort the rows of a data frame by one or more columns, say by date
4. rename() - used to rename the variables
5. mutate() - used to add new variables to the dataset
6. sample_n() - used to select random rows from a data frame
7. count() - used to count the number of rows at each level of a factor; similar to the table() function
8. group_by() - used to group data by one or more variables
9. summarise() - used to summarise the variables; especially powerful for EDA when combined with group_by()
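Several of these verbs are typically chained in a single pipeline. The following is a minimal sketch on the iris data, assuming the dplyr package is installed; the column names `Length.Ratio`, `SepalLen`, `n` and `mean_ratio` and the threshold of 5 cm are illustrative choices.

```r
library(dplyr)  # assumes the dplyr package is installed

iris_summary <- iris %>%
  select(Species, Sepal.Length, Petal.Length) %>%      # keep three columns
  filter(Sepal.Length > 5) %>%                         # keep rows meeting a condition
  mutate(Length.Ratio = Petal.Length / Sepal.Length) %>%  # add a new variable
  rename(SepalLen = Sepal.Length) %>%                  # new_name = old_name
  group_by(Species) %>%                                # one group per species
  summarise(n = n(), mean_ratio = mean(Length.Ratio)) %>%  # one row per group
  arrange(desc(mean_ratio))                            # sort, largest ratio first
print(iris_summary)

# count() gives the number of rows per level, similar to table(iris$Species).
species_n <- count(iris, Species)
print(species_n)
```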
Summary : what we have learnt
• How to write programs more efficiently in R using -
1. apply()
2. lapply()
3. sapply()
4. tapply()
5. mapply()
• How to manage a data frame using functions of
dplyr package
This concludes the session : Introduction to R

Next session : Graphics with GGplot
