
Self-Learning R Essentials

Rajesh Jakhotia
22 Jun 2014

About K2 Analytics
At K2 Analytics, we believe that skill development is very important for the
growth of an individual, which in turn leads to the growth of society and industry
and ultimately the nation as a whole. For this, knowledge and skill development
training must be made easily and economically accessible to every individual.
Our Vision: To be the preferred partner for training and skill development
Our Mission: To provide skill development training to individuals,
make them skilled and industry ready, and create a pool of skilled resources
readily available to the industry

We have chosen Business Intelligence and Analytics as our focus area. With
this endeavour we make this Self-Learning R Essentials accessible to all
those who wish to learn R. We hope it is of help to you. For any feedback or
suggestions, or if you are looking for a job in analytics, feel free to write to
us at

Welcome to R

Introduction to R
Understanding R data structures
Importing Data

Managing Data
R Programming Structures
Basic Charting and Plotting

Performing Logistic Regression

Introduction to Rattle

Introduction to R
What is R?
Why R?
Installing R
Understanding the R interface
R environment variables and startup files
How to get help in R
R Console & R Editor

What is R?
Free software environment for statistical computing and graphics
Compiles and runs on a wide variety of UNIX platforms, Windows
and Mac OS
Official website:
R's source code is freely available under the GNU General Public License
Originally created by Ross Ihaka and Robert Gentleman (hence the
name R) at the University of Auckland, New Zealand

Why R?
Free and exceptionally good statistical tool
Provides cutting-edge statistical techniques comparable to those in many
expensive paid software packages
Has decent data handling and data manipulation capabilities
Provides connectors to social media sites, and streaming data can also be
obtained easily
R can work with Big Data

Easily installed and usable with training

Over time, the popularity of R in the analytics fraternity has been growing
(as a case in point, R was used extensively by the analytics
team working on Obama's presidential election campaign in 2012)

Installing R
Go to website:
Click the link based on OS Environment
Click base

Download R installer
Double click on the installer
Select Run

Follow the instruction steps

R Interface

R Console is where you execute the code

R Editor is where you write and save code

Basic commands to know your

help() ## type this on your R Console

Note: R Syntax and variable names are case sensitive


Environment variables contd()

Getting environment variables, e.g.
## We request the learner to keep typing the commands on the R Console and attempt to interpret the
## output. It does not matter if you do not understand everything in the first run.

Sys.getenv(c("R_HOME", "R_PROFILE", "R_PROFILE_USER"))

for more details:


Environment variables contd()

R_HOME: The R home directory is the top-level directory of
the R installation being run.
R_PROFILE: The path of the site-wide startup profile file of R code.
The default is R_HOME/etc/
R_PROFILE_USER: The path of the file containing user-specific
profile customization. If this is unset, a file called .Rprofile is
searched for in the current directory or in the user's home directory
(in that order).

R_DATA: The path from where R loads the last saved image from
the current directory, if there is one. The extension of the file is .RData


Customizing R Startup
At startup, R searches for file
Default location is R_HOME/etc/
R_HOME is the path where you installed R. In my case it is

C:/Program Files/R/R-3.0.3
The factory installation does not come with this file.
You have to create one in Notepad and save it at the above path

Note: this file should be used only for setting environment variables



Creating file in Notepad

For example, let us create the environment file and store the following line in it:
R_PROFILE_USER = C:/R/Startup.Rprofile


Customizing R Startup contd

After the Renviron file, R looks for the .Rprofile file.
If R_PROFILE is not set, R looks for the file in the
default location, R_HOME/etc
Besides any other functions you create in the Rprofile file,
you can define two important functions:
.First() This will get executed at startup
.Last() This will get executed when we end the R session

Note: when the site file and profile file are loaded, only the base
package is loaded. If you need to refer to any other packages,
they must be loaded explicitly.


Customizing R Startup contd

Let us have the following function written in file
.First <- function() {

If R is running, close and restart it
Go to R Editor
Click File > Save or Save
Note the default folder
Note: the above is just an example. There is a lot you
can do as part of customizing your R startup
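The body of the .First function is not shown above, so the following is only a minimal sketch of what a startup profile could contain. The option value and the messages are hypothetical choices made for illustration.

```r
# Hypothetical contents of a startup profile file.
# .First() runs when the R session starts, .Last() when it ends.
.First <- function() {
  options(digits = 4)   # illustrative session default (assumption)
  cat("Welcome! Session started at", format(Sys.time()), "\n")
}

.Last <- function() {
  cat("Goodbye! Session ended at", format(Sys.time()), "\n")
}
```

When placed in the profile file, R calls these functions automatically; they can also be called by hand, e.g. .First(), to check what they do.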


Customizing R Startup contd

Let us have the following function written in the startup.Rprofile file
A <- function() {

The startup.Rprofile file is useful because you can define in it all the
functions that you wish to use frequently


Running R Code
Interactive Mode
You run R by typing the code at the R command prompt

Script Mode
You run code written in a script file saved with the .R extension
Syntax: source("myprog.R")
Let us create the file and save it in the working directory path.
To get the working directory path use the getwd() command
Assume we have the following statement in the myprog.R file
cat("Welcome to R\n")
## \n is the escape sequence for a new line

Batch Mode:
R CMD BATCH c:\Training\myprog.R c:\Training\myprog.Rout
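Script mode can be tried end to end with the short sketch below. It assumes a temporary folder is acceptable as the script location; the path built with tempdir() is illustrative and stands in for your working directory.

```r
# Create a one-line script file in a temporary folder (illustrative location)
script_path <- file.path(tempdir(), "myprog.R")
writeLines('cat("Welcome to R\\n")', script_path)

# Run the script in the current session; note the lowercase s in source()
source(script_path)
```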


R Tip
Set up your R interface with the R Console
and R Editor placed side by side
Write all your code in the R Editor
Select the code, or keep the cursor on the line you wish to execute,
in the R Editor
Click the icon to execute the code in the R Console


Understanding R Data Structure

Variables in R
Data Frames
Using c, cbind, rbind, attach and detach functions in R

Variables in R
Variable names in R are case sensitive (A and a are two different
variables in R)
A name can be alphanumeric and can contain _ or . as part of the name
It cannot contain operators (+ - / * < > % =) or special characters like
A variable name cannot start with a number

You may be able to create a variable with the same name as
a built-in symbol. In that case you may not be able to use
that built-in symbol (so it is better to avoid such names)
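The naming rules can be checked with a few assignments. This is a small sketch; all variable names in it are made up for illustration.

```r
A <- 10        # capital A
a <- 20        # small a is a different variable
my.var <- 1    # a dot is allowed inside a name
my_var <- 2    # an underscore is allowed inside a name
var2 <- 3      # digits are allowed, but not as the first character

print(A)
print(a)

# make.names() shows how R would repair a name that starts with a number
make.names("2var")   # "X2var"
```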


Scalar Variables
A scalar variable holds a single value. Scalars in R are vectors
of length 1

Note: In R you can use = or <- as assignment operator
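A short sketch of scalar variables and the two assignment operators follows; the names x, y and z are illustrative.

```r
x <- 5       # assignment using <-
y = 7        # assignment using =
z <- x + y   # arithmetic on scalars

print(z)       # 12
length(x)      # 1, because a scalar is a vector of length 1
is.vector(x)   # TRUE
```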


Vector Variables
A vector variable is a sequence of values, e.g. numbers

Note that small x and capital X are two different vectors. R is case sensitive
c is the concatenate function
You can easily do mathematical operations on two vectors of the same size, just
as you would on two scalar variables
All vector elements must be of the same mode, e.g. integer, numeric,
character or logical
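The points above can be tried with the sketch below. The values are illustrative; the last two lines show what happens when values of different modes are put into one vector.

```r
x <- c(1, 2, 3)       # small x
X <- c(10, 20, 30)    # capital X is a different vector

x + X                 # element-wise addition: 11 22 33
x * X                 # element-wise multiplication: 10 40 90

# Mixing modes forces every element to a single common mode
mixed <- c(1, "a", TRUE)
mode(mixed)           # "character"
```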


A matrix variable is a two-way table structure with rows and columns

Note the subtle difference in how the values have been populated in matrix m & M

Also note that to create the matrix we have used the R function named matrix,
passing certain arguments and values
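The original example for m and M is not shown here. A plausible sketch, assuming the difference between the two matrices is the byrow argument, is:

```r
# Assumption: m is filled column by column (the default), M row by row
m <- matrix(1:6, nrow = 2, ncol = 3)
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(m)   # first row is 1 3 5
print(M)   # first row is 1 2 3
```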


How to get help?

Let us get help on matrix
The help syntax is help() or ?

The help content opens in the web browser


In a vector all values must be of one mode
If you wish to store values of different modes, you
should use a list. Sample syntax:
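A minimal sketch of list creation, with made-up values and name tags:

```r
# A list can hold values of different modes
emp <- list("Rajesh", 35, TRUE)

# The same list with name tags (the tag names are illustrative)
emp_named <- list(name = "Rajesh", age = 35, manager = TRUE)

str(emp_named)   # shows the mode of each element
```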


Lists contd
Vectors in R are similar to arrays in C: elements cannot be deleted in place.
If you need to add or delete elements, use a list
Adding to Lists

Deleting from Lists

List having List and Vector as its elements
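The three operations named above can be sketched as follows. The list contents are hypothetical.

```r
lst <- list(name = "Rajesh", age = 35)

# Adding to a list: assign to a new name tag
lst$city <- "Mumbai"

# Deleting from a list: set the element to NULL
lst[["age"]] <- NULL
names(lst)   # "name" "city"

# A list having a list and a vector as its elements
nested <- list(info = lst, scores = c(80, 90, 85))
str(nested)
```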


Accessing List Elements

Note the difference of [[ ]] and [ ] in below two examples

[ ] returns a sublist; [[ ]] returns a value

List element can also be accessed using name tags as shown below
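A small sketch of the three ways of accessing list elements, using an illustrative list:

```r
lst <- list(name = "Rajesh", age = 35)

sub <- lst[1]      # [ ] returns a sublist
class(sub)         # "list"

val <- lst[[1]]    # [[ ]] returns the value itself
class(val)         # "character"

lst$age            # access by name tag
lst[["age"]]       # the same element, name tag inside [[ ]]
```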


List unlist
E.g. 1

E.g. 2

E.g. 3

unlist() returns a vector

E.g. 1: The list did not have name tags, so the vector created by unlist()
does not have names either

E.g. 2: Name tags exist. The mode of the vector is character (LCD rule)
E.g. 3: Note the suffixes 1, 2, 3 and 4 given to the VectorElement tags of the list
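The original three examples are not shown here. The sketch below reconstructs their behaviour with made-up values; only the tag name VectorElement is taken from the note above.

```r
# E.g. 1: no name tags, so the resulting vector has no names
v1 <- unlist(list(1, 2, 3))
names(v1)    # NULL

# E.g. 2: name tags are kept; mixing text and numbers gives a character vector
v2 <- unlist(list(name = "Rajesh", age = 35))
mode(v2)     # "character"

# E.g. 3: a vector element of length 4 gets the suffixes 1 to 4 on its tag
v3 <- unlist(list(a = 1, VectorElement = c(10, 20, 30, 40)))
names(v3)
```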


Summarizing List
Lists are a kind of vector that can store values of different modes
We can add / delete values from a list
List values can be given name tags
List elements can be accessed by [[ ]], [ ], or name tags
[[ ]] returns the value; [ ] returns a sublist
unlist() returns a vector; the mode of the vector created by unlist()
depends on the Least Common Denominator (LCD)
Finally, if we want the length of a list we can use length()


Data Frames
A data frame is used for storing data tables.
Put simply, what is called a Table in SQL parlance and a Dataset in
SAS is called a Data Frame in R terminology
The columns are vectors
A small example of creating a data frame:
The first line of the data table, showing the
column names, is called the header.
Each horizontal line representing a record
is called a row
Each data member of a row is called a cell
The cell data is accessed by specifying its
row and column coordinates in [ ]
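A minimal sketch of creating a data frame and reading a cell. The column names and values are hypothetical.

```r
# Each column is a vector; all columns have the same length
df <- data.frame(
  Cust_ID = c(101, 102, 103),
  Name    = c("Raj", "Anita", "Vikram"),
  Age     = c(35, 28, 41),
  stringsAsFactors = FALSE
)

print(df)        # the header row shows the column names

df[2, "Name"]    # cell in row 2, column Name: "Anita"
df[3, 3]         # cell in row 3, column 3: 41
```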


rm - Remove
rm is used to remove objects that are no longer needed
Note: R does in-memory processing, so it is
advisable to keep removing objects that are not required.
To get the objects currently in R memory, use the function ls()
To remove a column from a data frame, set it to NULL
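These three points can be sketched as follows; the object and column names are illustrative.

```r
tmp_obj <- 1:10
"tmp_obj" %in% ls()   # TRUE: the object is in memory

rm(tmp_obj)           # remove it
"tmp_obj" %in% ls()   # FALSE

# Removing a column from a data frame by setting it to NULL
df <- data.frame(Cust_ID = 1:3, Name = c("Raj", "Anita", "Vikram"), Age = c(35, 28, 41))
df$Name <- NULL
names(df)             # "Cust_ID" "Age"
```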


cbind, c, and rbind

Adding a column to a data frame using cbind

Creating new variables from existing fields

Reordering the columns

Adding row using rbind

Note: c() adds them head to tail; cbind() combines them into matrix
form; rbind() adds them row-wise
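The operations listed on this slide can be sketched with a small, made-up sales table:

```r
df <- data.frame(ID = 1:3, Sales = c(100, 200, 300))

# Adding a column using cbind
df <- cbind(df, Cost = c(60, 150, 210))

# Creating a new variable from existing fields
df$Profit <- df$Sales - df$Cost

# Reordering the columns
df <- df[, c("ID", "Profit", "Sales", "Cost")]

# Adding a row using rbind
df <- rbind(df, data.frame(ID = 4L, Profit = 50, Sales = 400, Cost = 350))
print(df)

# c(), cbind() and rbind() on plain vectors
c(1:2, 3:4)       # head to tail: 1 2 3 4
cbind(1:2, 3:4)   # two columns
rbind(1:2, 3:4)   # two rows
```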


attach() and detach()

attach() takes a data frame or list as its argument
It lets you refer to the columns of the data frame or list without
having to prefix them with the object name
If there are objects in the Global Environment with the same names
as columns in the data frame / list, then those columns will have
to be accessed with the $ symbol
Using rm() we have removed the global variable, then detached
and attached the data frame
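The original console session is not shown here. The sketch below walks through a similar sequence with a hypothetical data frame named cust and a global variable named Balance.

```r
cust <- data.frame(Cust_ID = c("C1", "C2"), Balance = c(500, 1200),
                   stringsAsFactors = FALSE)

Balance <- 0             # global variable with the same name as a column
attach(cust)             # R reports that Balance is masked by the global variable
masked_value <- Balance  # the global value 0, not the column
cust$Balance             # the column must be accessed with $

rm(Balance)              # remove the global variable
avg <- mean(Balance)     # now the attached column is found without the prefix
detach(cust)
print(avg)               # 850
```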


attach() and detach()

Note: In R, lists and data frames can only be attached at position
2 or above, and what is attached is a copy of the
original object. You can alter the attached values via assign(),
but the original list or data frame is unchanged.
To make a change to the original list or data frame, the column has
to be referred to along with $
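The copy behaviour described in the note can be checked with the sketch below. The data frame is hypothetical, and pos = 2 assumes it was the most recently attached object.

```r
cust <- data.frame(Balance = c(500, 1200))
attach(cust)                              # attached at position 2 of the search path

# Alter the attached copy only
assign("Balance", Balance * 2, pos = 2)
attached_copy <- Balance                  # 1000 2400
original_after_assign <- cust$Balance     # still 500 1200

# To change the original data frame, refer to the column with $
cust$Balance <- cust$Balance * 2
detach(cust)
```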


We are creating a vector named data, and it is of type character

Using the factor function we are converting the type from character
to factor
Note the Levels (please note, I have executed this command by
narrowing the R Console so that the integer values for some of the
levels can be displayed)

Factors provide an efficient way of storing data in R. If you have a large data frame with a
categorical variable, factor converts the categorical values into levels, and each level
corresponds to an integer. For a factor column, this integer value is stored in the
data frame rather than the actual value.
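A small sketch of the conversion, using made-up region names as the categorical values:

```r
data <- c("East", "West", "East", "North", "West", "East")
class(data)          # "character"

fdata <- factor(data)
class(fdata)         # "factor"
levels(fdata)        # "East" "North" "West"

# The integer code stored for each value
as.integer(fdata)    # 1 3 1 2 3 1
```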

I hope this clarifies things further


From the previous example and this one,
you can see that the levels are in
ascending order

Assigning Labels to Factor Levels
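The original example for this slide is not shown. One way to assign labels, sketched here with hypothetical numeric codes, is the labels argument of factor():

```r
codes <- c(1, 2, 1, 3)   # hypothetical coded values

# levels lists the codes; labels gives the text shown for each code
region <- factor(codes, levels = c(1, 2, 3), labels = c("East", "West", "North"))

print(region)
levels(region)       # "East" "West" "North"
as.integer(region)   # 1 2 1 3
```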


Importing Data
Reading tabular datafiles
Reading CSV files
Importing data from Excel
Importing data from SAS
Accessing Database
Saving in Rdata
Loading Rdata Objects
Writing to files

read.table function reads data from a txt / csv file and returns a Data Frame
file = <the file path>
sep = argument to specify the separator
header = TRUE; if the first row of the data contains column names
stringsAsFactors = FALSE; this option will prevent character
variables from being converted to factors
as.is = argument can be used to suppress factor conversion for
certain specific columns; TRUE will ensure suppression of factor conversion
There are many other arguments; run the ?read.table command to get full help on all the arguments


read.table e.g.
Sample Data file
Data Import Syntax

Note: The columns have been named V1, V2, ..., V8
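The sample data file is not shown here. The sketch below uses a small made-up file with three columns to show the default column naming when no header is read.

```r
# Write a small comma-separated file without a header row (hypothetical data)
data_file <- tempfile(fileext = ".csv")
writeLines(c("101,Raj,35", "102,Anita,28"), data_file)

df <- read.table(data_file, sep = ",")
names(df)   # "V1" "V2" "V3"
```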


read.table contd

Note that the columns have the proper names, as given in the first row of
the data file
If the file is tab delimited, the sep argument becomes
sep = "\t"
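Reading a tab-delimited file with a header row can be sketched as follows; the file contents are made up for illustration.

```r
# Write a small tab-delimited file whose first row holds the column names
tab_file <- tempfile(fileext = ".txt")
writeLines(c("Cust_ID\tName\tAge", "101\tRaj\t35", "102\tAnita\t28"), tab_file)

df <- read.table(tab_file, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
names(df)   # "Cust_ID" "Name" "Age"
str(df)
```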


read.fwf is used to read a Fixed Width Format file
~ is to be replaced by the full folder path

Note that the headers, if present, should be separated by some
separator; the default separator is tab \t
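A minimal sketch of read.fwf with a header line. The file contents and the column widths 3, 5 and 2 are assumptions made for this example.

```r
# Header fields are tab separated; data rows are fixed width (3 + 5 + 2 characters)
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("ID\tName\tAge",
             "001Raj  35",
             "002Anita28"), fwf_file)

fwf <- read.fwf(fwf_file, widths = c(3, 5, 2), header = TRUE,
                colClasses = c("character", "character", "integer"),
                strip.white = TRUE)   # trims the padding around Raj
print(fwf)
```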

Let us check the data types / class of
the two variables
Cust_ID is an identifier field. If we do not wish to have automatic factor
conversion for character variables, use the option stringsAsFactors = FALSE
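The effect of stringsAsFactors can be seen in the sketch below. The file contents are hypothetical. Note that the default changed to FALSE from R 4.0.0 onwards, so the option is set explicitly in both calls.

```r
cust_file <- tempfile(fileext = ".csv")
writeLines(c("Cust_ID,Balance", "C001,500", "C002,1200"), cust_file)

with_factors <- read.table(cust_file, sep = ",", header = TRUE, stringsAsFactors = TRUE)
no_factors   <- read.table(cust_file, sep = ",", header = TRUE, stringsAsFactors = FALSE)

class(with_factors$Cust_ID)   # "factor"
class(no_factors$Cust_ID)     # "character"
class(no_factors$Balance)     # "integer"
```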


Importing data from MS Excel

Importing using RODBC Library

Importing using XLConnect Library


Importing data from SAS dataset

To import data from a SAS dataset you require the sas7bdat library

Load the library and call the read function to import from SAS dataset


Reading data from Database

Database Access using RODBC package
Open Database Connection

with trusted connection

with login / password

Get the contents of a database table in Data Frame

Close Connection


Saving Objects
Let us start the R session afresh and try the below

Option to use if the column headers are in the first row of the file


Loading Saved Objects

Loading a saved image will load all the objects which were in memory
at time of saving the image

Or you may choose to load only specific saved objects
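The original commands for saving and loading are not shown here. A minimal sketch, assuming a temporary file location and two made-up objects, is:

```r
x <- 1:5
y <- "hello"
rdata_file <- file.path(tempdir(), "my_objects.RData")   # illustrative path

# Save only specific objects; save.image(file = ...) would save the whole workspace
save(x, y, file = rdata_file)

rm(x, y)                      # simulate a fresh session

# load() restores every object stored in the file and returns their names
loaded <- load(rdata_file)
print(loaded)
print(x)
```

Since load() restores everything stored in a file, loading only specific objects amounts to loading a file in which only those objects were saved.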


Writing / Exporting data to a file


Note: the above command writes the row names (here the row numbers are the row names)
as an additional column in the output file. To avoid this, use the option row.names = FALSE
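The export command is not shown above. A sketch using write.table, with a made-up data frame and a temporary output file, is:

```r
df <- data.frame(ID = 1:2, Name = c("Raj", "Anita"))
out_file <- tempfile(fileext = ".csv")   # illustrative output path

# row.names = FALSE keeps the row numbers out of the file
write.table(df, out_file, sep = ",", row.names = FALSE, quote = FALSE)

readLines(out_file)   # "ID,Name" "1,Raj" "2,Anita"
```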


Writing / Exporting. contd

Writing output of summary statistics or other things to a file
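One way to do this, sketched here as an assumption about the original example, is to divert console output to a file with sink():

```r
out_file <- tempfile(fileext = ".txt")   # illustrative output path

sink(out_file)                       # start sending console output to the file
print(summary(c(10, 20, 30, 40)))
sink()                               # send output back to the console

readLines(out_file)                  # the summary as it was written to the file
```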


Thank you
End of Part 1