
Fundamentals of R

What is R?
* R is an open-source programming language
for statistical computing, graphical
display, and data analysis.
* R was created by Robert Gentleman and Ross
Ihaka at the University of Auckland.
What R can do
1. Data manipulation and effective data handling.
2. Statistical computing as well as simulations.
3. Graphical display.
4. Different types of data analysis.
5. Store input and output as a scalar, a vector, or a
matrix.
6. Simple as well as complicated calculations.
7. Graphical displays can be shown on the
screen or saved to a file such as a
PostScript, PDF, or JPEG file.
8. R has its own very effective
programming language.
9. We can obtain it for free from the website
www.r-project.org.
Switching to R software?
Advantages of switching to R software:
* R is a statistical computing environment.
* R is free software.
* R is open source.
* R can save the commands that are executed in a script file.
* R is available for all major platforms, such
as Windows, macOS, Unix, and Linux.
* R has both built-in packages and contributed packages,
and users can also write their own
packages.
* R is an interpreted language; it does not need a compiler.
* Contributed packages are available for free from CRAN (the
Comprehensive R Archive Network), which is linked from the R project website.
The R programming environment comes packed
with features. Let's look at some of
them:
• Open-source: R is an open-source programming
language. It is completely free for anybody to use.
• Variety of packages: There are more than 15,000
packages for R on online repositories like CRAN,
Bioconductor, and GitHub.
• Powerful graphics: R’s graphical capabilities are
amazing. It can produce publication-quality graphs and
plots of any kind with its base package. With added
packages like ggplot2 and plotly the possibilities are
endless.
• No need for a compiler: The R language is
interpreted. It does not need a compiler to convert the
code into a program.
• Cross-platform support: R is cross-platform,
that is, it runs on Windows, macOS, and Linux,
and the same script usually works on each of them.
Why Learn R?
R is a very popular programming language and
analysis tool. It is used by more than 2 million people
around the globe. All of them have their own reasons
to learn and use R:
• Useful as an analysis tool for big data.
• Most popular for academic research.
• Can visualize the results of data analysis.
• Useful for building machine learning models.
• It has a plethora of features that make it the preferred
tool over many others for professionals and hobbyists
alike.
• Open-source and, therefore, everybody can contribute
to it in their own way.
• It has more than 15,000 packages that expand its
powers.
• R can be useful for data analysis, data visualization,
software development, web development, making
attractive analysis reports and more.
Installation of R
R runs on all major operating systems. We can also
integrate it with many different programming languages
and software frameworks like C, C++, FORTRAN, Java,
Python, or Hadoop.
Steps to Install R, RStudio, and R Packages
Follow these three basic steps, in order, to
run R and RStudio on your system.
• Install R
• Then, install RStudio
• Lastly, install R Packages
The process of installing R depends on the
type of your operating system. R can be
installed in the following ways:
• For Mac
a) Download the latest version of the .pkg file
from the CRAN R Project website.
b) Open the downloaded .pkg file and install
R.
• For Linux
On Ubuntu, or another system with apt-get installed,
execute the following in a terminal:
sudo apt-get install r-base
• For Windows
Step 1: Go to the website – CRAN R Project Windows
Step 2: Click on the "Download R 3.6.0 for Windows" link
(the version number shown will be that of the current release).
Step 3: Clicking on the link will download the R
installer. Double-click on the installer to launch it.
Step 4: Select the language of your choice in the
installer and click OK.
Step 5: Clicking on Next will lead you to the important
information of the license. Click on
Step 6: Then, it will provide you with an interface for
selecting the destination for your R installation.
Step 7: After clicking on Next, you will be directed to
select components for installation. It is ideal to keep
all the components.
Step 8: In the next step, you will be asked if you
want to customize the startup options. Choose
No (accept defaults), which is the recommended option.
Step 9: In the next step, you will be asked to select
the Start Menu folder. By default, a Start Menu folder
is created; you can change that with the checkbox
on this screen.
Step 10: In the next step, you will be asked for
additional details. You can create a desktop
shortcut or a Quick Launch Startup by checking the
boxes.
Step 11: Clicking on Next will prompt the installation
to start. After this, the installation is complete and R
is installed.
RStudio
RStudio is one of the most popular
open-source IDEs (integrated development
environments) for R. It supports statistical
modeling as well as graphical work in
R.
We are going to perform the following
operations:
• Downloading/Importing Data in R.
• Data Transformation and other Miscellaneous
Data Operations.
• Performing Statistical Modeling on the Data.
• Creating Graphical Plots of Data.
RStudio has four panels:
• Console – This is the actual R window.
You can enter R commands here and
execute them by pressing Enter.
• Source – This is where we can edit
scripts. It is where you should do most
of your work. Ctrl+Enter sends the selected
code to the console.
• Plots/Help – Here plots and help pages
will be shown.
• Workspace – Shows which objects you
currently have.
Installation of RStudio
In order to install the IDE RStudio, we will take
the following steps:
Step 1: Go to the website – RStudio download
Step 2: Here, you will find the various editions of RStudio.
Click on the Download button.
For Windows
Step 1: Clicking on Download will scroll the page
down to the installers for the various platforms.
Click on Windows 7+ (64 Bit)
Step 2: After the download is finished, click
on the installer and click Next.
Step 3: Clicking on Next will direct you to
the installation location.
Step 4: Clicking on Next will direct you to
choose the Start Menu Folder.
Step 5: Clicking on Install will start the
installation process.
Step 6: Finally, click on the Finish button.
Step 7: In the final step, test if R and RStudio are
properly installed by opening RStudio and typing
the following lines:
Line <- "HELLO WELCOME TO R PROGRAM"
Line
Installation of R Packages
Install the R Packages (Optional)
In RStudio, if we require a particular library,
then we can go through the following
instructions:
• First, run RStudio.
• After clicking on the packages tab, click on
install.
• In the Install Packages dialog, write the
name of the package you want to install in the
Packages field and then click install. This will
install the package you named or give
you a list of matching packages based on
the text you typed.
How to Install R Packages for Windows
1) In Windows, the R GUI has a Packages menu with an install option,
which is very easy to use.
After selecting a local mirror site, a list of available binary
packages is shown. Select the ones you
need, then click the OK button to download and install them into
R.
2) If we have downloaded the package from the internet (as a .zip file),
we use the install package(s) option for local files in the Packages
menu. It lets us select the files we need, and the
packages are unzipped and installed into R.
3) Installing by the Name of Package
Any package can be installed with the install.packages("name")
command, where the package name is given in quotes.
Install & Use Packages in R
Programming
Packages in R
* A package is a collection of R functions, data,
and compiled code in a well-defined format.
* Packages are stored in a directory
called the library.
* R comes with a standard set of packages.
* We can list the packages that are currently
attached to our session by using the
search() command.
* To see which packages are installed on our system, we use
the installed.packages() command.
How to Use Packages in R
• We need to load a package in R after installing
it to make it usable.
• To load a package, we can use
the command library(name).
NOTE: In R, you can unload a package by
using the detach() command, as follows:
detach(package:name)
Example:
# ggplot2 is used as an example of a contributed package;
# base packages such as stats ship with R and need no installation.
install.packages("ggplot2")
library(ggplot2)
detach(package:ggplot2)
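The install, load, and unload steps can be sketched as one small script. This is only an illustration: the package name `tools` is chosen because it ships with R (so nothing is downloaded), and `pkg`, `is_attached`, and `still_attached` are hypothetical variable names.

```r
pkg <- "tools"  # illustrative choice; this package ships with R

# Install only when the package is not already available.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package; character.only = TRUE lets us pass the name as a string.
library(pkg, character.only = TRUE)
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)

# Detach the package again when it is no longer needed.
detach(paste0("package:", pkg), character.only = TRUE)
still_attached <- paste0("package:", pkg) %in% search()
print(still_attached)
```

Checking with requireNamespace() first avoids reinstalling a package on every run.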
8 Useful R Packages
* R has over 10,000 packages in the CRAN repository and is one of the most popular
languages for statistical modeling.
* The most common and useful libraries for everyday tasks are:

1. dplyr
We use the dplyr package for data wrangling and data analysis.
It provides functions for working with data frames in R, and data
scientists use it for common operations such as filtering,
selecting, and summarizing data.
2. ggplot2
This is the most popular visualization package that is created for R.
ggplot2 is popular as it facilitates declarative creation of graphics.
With this package, you can create aesthetically pleasing and elegant
plots and graphs.
3. tidyr
tidyr is a package that we use for tidying data. We consider
data to be tidy when each variable is a column and each
row is an observation.
4. shiny
shiny is a package for building interactive web applications in which you can
embed visualizations like graphs, plots, and charts. These interfaces are written
directly in R and include widgets such as a customizable slider with built-in support for animation.
5. caret
The caret package is short for Classification and Regression Training. Using
this package, you can model complex regression and classification problems.
One important extension of caret is caretEnsemble, which is used for
combining different models.
6. e1071
This package has wide use for implementing clustering, Fourier Transform,
Naive Bayes, SVM and other types of miscellaneous functions.
7. plotly
This package builds on the plotly.js JavaScript library, which is mainly for building
interactive, high-quality graphs. We can then embed these graphs in web
applications quite easily.
8. tidyquant
tidyquant is a package that we use for carrying out financial operations on the
data. Data Scientists use this package for carrying out quantitative financial
analysis.
Updating R, RStudio and
packages
Updating R and RStudio
1. Updating R
On Windows, the updateR() function from the installr
package can be used:
install.packages("installr")
library(installr)
updateR()
After running updateR(), R will detect
if there is a new version available. If there
is, it will download and install it.
2. Updating RStudio
Updating RStudio is easy: go to Help >
Check for Updates to install a newer
version.
3. Updating R Packages
Updating Out-of-Date Packages
install.packages() is used to install
any package from CRAN.
For example:

a) Typing install.packages("ggplot2") in
the console will install the latest
version of ggplot2.
b) We can also update all out-of-date
packages with the help of
the update.packages() function.
Basic Interaction with R Console
1) The most common form of interaction with R is through
the command line in the console.
The user enters a command in the console.
Example:
print("HELLO")
After the Enter key is pressed, the R interpreter executes
the command and returns the answer to the user.
2) It is also possible to store a sequence of commands in a
file with the .R extension (an R script) and then ask R to
execute all the commands in that file.
R Command Prompt
1) Once we have the R environment set up, it is easy to start the R
command prompt by typing R at the system command prompt ($ R).
2) This will launch the R interpreter, and you will get a prompt > where you
can start typing your program.
Example:
print("Welcome to R program")
R Script File
Usually, we write our programs in script files and then
execute those scripts at the command prompt with the
help of the R script runner called Rscript. So let's start by
writing the following code in a text file called test.R:
# My first program in R Programming
print("Welcome to R program")
Save the above code in the file test.R and execute it at the
command prompt as given below.
$ Rscript test.R
Using R as a Calculator
• Users type expressions to the R interpreter.
• R responds by computing and printing the
answers.
Example:
1+2
1/2
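A slightly longer calculator session, given here only as an illustrative sketch (the names `r1` to `r4` are hypothetical), shows operator precedence and a built-in function:

```r
r1 <- 1 + 2 * 3     # multiplication is done before addition
r2 <- (1 + 2) * 3   # parentheses change the order of evaluation
r3 <- 2 ^ 3         # exponentiation
r4 <- sqrt(16)      # built-in square-root function

print(c(r1, r2, r3, r4))
```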
Datasets
A large number of datasets ship with R and its
packages and are free to analyze (run data() to
list them). Furthermore, there
are other open datasets available on
Kaggle which you can use directly for
analysis and prediction.
Comments
Comments are helper text in your R
program. They are ignored by the
interpreter while it executes your
program.
A single-line comment is written using # at the
beginning of the statement.
Example:
# My first program in R Programming
What are Data types in R?
• For correct processing, a programming language must know what
can and cannot be done to a particular value. For example, addition
cannot be performed on the words ‘hello’ and ‘world’.
• Similarly, you cannot change the numbers 1 and -34.5 from lower to
uppercase.
• For this reason, R has data types.
• Different kinds of values are assigned different data types that help
differentiate them.
• These types have certain characteristics and rules associated with
them that define their properties.
• R provides the class() and typeof() functions to find out the
class and type of any variable.
R has five commonly used basic data types, which are:
• Numeric
• Integers
• Complex
• Logical
• Characters
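As a quick illustration of how class() and typeof() report these types, the following sketch applies both to one sample value of each kind. The names `values` and `types` are hypothetical.

```r
# One sample value for each basic data type.
values <- list(numeric = 34.5, integer = 5L, complex = 1 + 2i,
               logical = TRUE, character = "hello")

# Collect the class and the internal type of each value in a small table.
types <- data.frame(
  class  = sapply(values, class),
  typeof = sapply(values, typeof)
)
print(types)
```

Note that a numeric value has class "numeric" but internal type "double"; for the other four types, class() and typeof() agree.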
1.Numeric Data Type
The numeric data type is for numeric values. Decimal values are
numeric by default in R, and so are whole numbers typed without
the L suffix. We can assign a decimal value to any variable.
Examples of numeric values would be 1, 34.5, 3.145, -24, -45.003,
etc.
Example:
num <- 1
class(num)
typeof(num)
2. Integers Data Type
The integer data type is used for integer values, that is, numeric
values with no fractional part. The
integer data type is commonly used for discrete values. We can
store as well as convert a value into an integer type using the
as.integer() function.
Example:
int <- as.integer(16)
class(int)
typeof(int)
int2 <- as.integer(num)
int2
class(int2)
typeof(int2)
NOTE: We can also use the capital ‘L’ notation to denote that a
particular value is of the integer data type.
Example:
int3 <- 5L
class(int3)
typeof(int3)
3. Complex Data Type
The complex data type is to store numbers with an imaginary
component. Examples of complex values would be 1+2i, 3i, 4-5i,
-12+6i, etc.
Example:
comp <- 22-6i
class(comp)
typeof(comp)
4. Logical Data Type
The logical data type stores logical or boolean values of TRUE or
FALSE.
Example:
logi <- FALSE
class(logi)
typeof(logi)
5. Character Data Type
The character data type stores character values or strings. Strings in
R can contain letters, numbers, and symbols. The easiest
way to denote that a value is of character type in R is to wrap the
value inside single or double quotes.
Example:
char <- "dataflair1234"
class(char)
typeof(char)
NOTE: We can also use the as.character() function to store a value
as a character or to convert a value to the character data type.
Example:
char2 <- as.character("hello")
char3 <- as.character(comp)
char2
char3
class(char2)
typeof(char2)
class(char3)
typeof(char3)
Converting Data Types in R
Conversion into Numeric
We can use the as.numeric function to convert the
values of other data types into numeric values.
The conversion follows a few rules, which are:
• To convert an integer value into a numeric, we can use
the as.numeric function.
• We can convert a complex value into numeric by using
the function. This discards the imaginary part of the
number (with a warning).
• Logical values can be converted into numeric as well by
the function. The TRUE value is converted to 1, and
FALSE is converted to 0.
• Character values can similarly be converted into
numeric values, but if the string contains anything that
cannot be read as a number (such as letters or other
symbols), the numeric value becomes NA.
Examples:
int <- 16L
num2 <- as.numeric(int)
num2
comp=16+2i
num3 <- as.numeric(comp)
num3
logi <- TRUE
num4 <- as.numeric(logi)
num4
char <- "1234"
num5 <- as.numeric(char)
num5
num5 <- as.numeric("1234")
num5
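The edge cases in the rules above can be checked directly. This is a minimal sketch; suppressWarnings() is used only to keep the output quiet, and the names `ok`, `bad`, and `re_only` are hypothetical.

```r
# Well-formed numeric text converts cleanly.
ok <- as.numeric("12.5")

# Text that is not a number becomes NA (R also emits a warning).
bad <- suppressWarnings(as.numeric("12abc"))

# The imaginary part of a complex value is discarded (with a warning).
re_only <- suppressWarnings(as.numeric(16 + 2i))

print(c(ok, bad, re_only))
print(is.na(bad))
```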
Conversion into Integer
The as.integer function can convert the values of other
data types into integer values according to the
following rules:
• Numeric values can be converted into an integer using
the function. This drops the fractional part of the
number (14.7 becomes 14; the value is truncated, not rounded).
• Complex values can also be converted into integers.
The function discards the imaginary part of the
number (with a warning).
• The conversion from logical values to integers is
similar to the conversion of logical values to numerics.
TRUE is converted to 1, and FALSE is converted to 0.
• Character values can be converted into integers as
well by using the as.integer function. This conversion
follows the same rules as the character to numeric
conversion.
Example:
num=16.0
int4 <- as.integer(num)
int4
int5 <- as.integer(14.7)
int5
comp=28+6i
int6 <- as.integer(comp)
int6
logi=FALSE
int7 <- as.integer(logi)
int7
int8 <- as.integer("1234")
int8
Conversion into Complex
• Using the as.complex function, we can convert other
values into the complex data types. The conversion
takes place according to the following rules:
• Numeric values can be converted into complex by
using the as.complex function or by adding an
imaginary part to it.
• Integer value can also be converted into complex
values similarly.
• Logical values become 0+0i for FALSE and 1+0i for
TRUE when converted into complex values using the
as.complex function. We can also convert a logical
value into a complex value by adding an imaginary part
to it.
• The conversion from a character to a complex is the
same as the conversion from character to numeric or
an integer with 0i added to the converted value if it is
not NA.
Example:
num=28.9
comp2 <- as.complex(num)
comp2
int <- 36L
comp3 <- as.complex(int)
comp3
logi<-TRUE
comp4 <- as.complex(logi)
comp4
comp5 <- as.complex("1234")
comp5
Conversion into Logical
• Conversion into the logical data type can be done by using
the as.logical function, following the given rules:
• Numeric, integer, and complex values can be converted into logical
values; the function returns FALSE if the value is zero and
TRUE if it is anything else.
• Character values are converted only when they spell a logical value:
"TRUE", "true", "True" and "T" give TRUE, and "FALSE", "false", "False"
and "F" give FALSE. Any other string, such as "HELLO", returns NA.
Example:
logi2 <- as.logical(28.5)
logi2
logi3 <- as.logical(90L)
logi3
logi4 <- as.logical(36+6i)
logi4
logi5 <- as.logical("HELLO")
logi5
logi6 <- as.logical(0)
logi6
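To see how as.logical() treats character input, the sketch below converts a handful of strings at once. The vector name `inputs` and the chosen strings are illustrative only.

```r
inputs <- c("TRUE", "true", "T", "FALSE", "false", "F", "HELLO", "yes", "1")

# as.logical() is vectorised, so the whole vector is converted in one call.
result <- as.logical(inputs)
names(result) <- inputs
print(result)
```

Only strings that spell a logical value are converted; everything else, including "yes" and "1", becomes NA.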
Conversion into character
We can convert a value of any data type into
character data type using
the as.character function. The function converts
the original value into a character string.
Example:
char2 <- as.character(98.89)
char2
char3 <- as.character(789L)
char3
char4 <- as.character(58+6i)
char4
char5 <- as.character(FALSE)
char5
VARIABLE
A variable provides us with named storage that our programs can
manipulate. A variable in R can store an atomic vector, a group of
atomic vectors, or a combination of many R objects.
A valid variable name consists of letters, numbers, and the dot or
underscore characters. The variable name starts with a letter, or with
a dot that is not followed by a number.
1) var_name2. : Valid: Has letters, numbers, dot and underscore.
2) .var_name, var.name : Valid: Can start with a dot (.), but the
dot (.) should not be followed by a number.
3) var_name% : Invalid: Has the character '%'. Only dot (.) and
underscore are allowed.
4) 2var_name : Invalid: Starts with a number.
5) .2var_name : Invalid: The starting dot is followed by a number, making
it invalid.
6) _var_name : Invalid: Starts with _, which is not valid.
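One way to check these rules in practice is base R's make.names(), which rewrites a string into a syntactically valid name; a name is valid exactly when make.names() leaves it unchanged. The helper `is_valid_name` is a hypothetical name introduced for this sketch.

```r
# A name is syntactically valid when make.names() does not need to alter it.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("var_name2.", ".var_name", "var.name",
                "var_name%", "2var_name", ".2var_name", "_var_name")
validity <- is_valid_name(candidates)
names(validity) <- candidates
print(validity)
```

The same check also rejects reserved words such as if, which make.names() changes as well.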
Variable Assignment
The variables can be assigned values using leftward, rightward
and equal to operator. The values of the variables can be
printed using print() or cat() function. The cat() function
combines multiple items into a continuous print output.
Example:
# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")
Finding Variables
To know all the variables currently available in
the workspace we use the ls() function.
Also the ls() function can use patterns to match
the variable names.
Example:
1) print(ls())
2) # List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))
3) # Also list hidden variables, whose names start with a dot.
print(ls(all.names = TRUE))
Deleting Variables
1) Variables can be deleted by using
the rm() function. Below we delete the
variable var.3. Printing the variable
afterwards throws an error.
Example:
rm(var.3)
print(var.3)
2) All the variables can be deleted by using
the rm() and ls() function together.
Example:
rm(list = ls())
print(ls())
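Instead of triggering an error by printing a deleted variable, exists() can be used to test whether a name is still defined. A small sketch, with `tmp_value` as a hypothetical variable:

```r
tmp_value <- 42
before <- exists("tmp_value")   # the variable is defined

rm(tmp_value)
after <- exists("tmp_value")    # the variable has been removed

print(c(before, after))
```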
OPERATORS
An operator is a symbol that tells the interpreter to
perform specific mathematical or logical
manipulations. The R language is rich in built-in
operators and provides the following types of operators.
Types of Operators
Following are the types of operators in R programming

• Arithmetic Operators
• Relational Operators
• Logical Operators
• Assignment Operators
• Miscellaneous Operators
Arithmetic Operators
1) Addition: operator +
2) Subtraction: operator -
3) Multiplication: operator *
4) Division: operator /
5) Exponentiation: operator ^
6) Remainder after division: operator %%
7) Integer part of the quotient (integer division): operator %/%
Example:
A<-c(13,56,10,25)
B<-c(2,3,4,6)
print(A+B)
print(A-B)
print(A*B)
print(A/B)
print(A%%B)
print(A%/%B)
print(A^B)
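The two division operators fit together: multiplying the integer quotient by the divisor and adding the remainder gives back the original value. A short sketch using the same vectors as the example (the names `quotient`, `remainder`, and `rebuilt` are illustrative):

```r
A <- c(13, 56, 10, 25)
B <- c(2, 3, 4, 6)

quotient  <- A %/% B   # integer part of each division
remainder <- A %% B    # what is left over

print(quotient)
print(remainder)

# Rebuild A from its quotient and remainder, element by element.
rebuilt <- B * quotient + remainder
print(all(rebuilt == A))
```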
Relational Operators
1) Greater than : operator >
2) Less than : operator <
3) Equal to : operator ==
4) Greater than or equal to : operator >=
5) Less than or equal to : operator <=
6) Not equal : operator !=
Example:
A<-c(5,6,9,10,23)
B<-c(4,8,9,12,35)
print(A>B)
print(A<B)
print(A>=B)
print(A<=B)
print(A==B)
print(A!=B)
Logical Operators
1) Logical AND: operator & (gives TRUE if both
elements are TRUE)
2) Logical OR: operator | (gives TRUE if at least one of the
elements is TRUE)
3) Logical NOT: operator ! (gives the opposite logical value)

Statement I   Statement II   AND     OR
TRUE          TRUE           TRUE    TRUE
TRUE          FALSE          FALSE   TRUE
FALSE         TRUE           FALSE   TRUE
FALSE         FALSE          FALSE   FALSE

NOTE: The logical operators && and || work on single values and
return a single TRUE or FALSE. Older versions of R silently used only
the first element of longer vectors; since R 4.3.0, passing a vector
of length greater than one is an error.
Example:
1) v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v&t)
2) v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v|t)
3) v <- c(3,0,TRUE,2+2i)
print(!v)
4) v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
# && needs single values, so compare the first elements explicitly.
print(v[1] && t[1])
5) v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
# || needs single values, so compare the first elements explicitly.
print(v[1] || t[1])
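The difference between the element-wise operators and the single-value operators can be sketched as follows. The vectors `x` and `y` are illustrative, and any() and all() are shown as the usual way to reduce an element-wise result to one value.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

elementwise_and <- x & y    # compares position by position
elementwise_or  <- x | y

any_both <- any(x & y)      # is there at least one position where both are TRUE?
all_both <- all(x & y)      # are both TRUE at every position?

first_both <- x[1] && y[1]  # && takes single values only

print(elementwise_and)
print(elementwise_or)
print(c(any_both, all_both, first_both))
```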
Assignment Operators
These operators are used to assign values to variables.
Left assignment: operator <- or = or <<-
Right assignment: operator -> or ->>
Miscellaneous Operators
These operators are used for specific purposes and
not for general mathematical or logical computation.
: - Colon operator. It creates a series of numbers in
sequence for a vector.
%in% - This operator is used to identify if an element
belongs to a vector.
%*% - This operator performs matrix multiplication (in the
example below, a matrix is multiplied by its transpose).
Example:
1) v <- 2:8
print(v)
2) v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
3) M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow =
TRUE)
t = M %*% t(M)
print(t)
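It is easy to confuse %*% with the ordinary * operator, which multiplies matrices element by element. The sketch below contrasts the two on a small matrix; `A`, `elementwise`, and `matrix_product` are hypothetical names.

```r
# matrix() fills column by column, so A has rows (1, 3) and (2, 4).
A <- matrix(1:4, nrow = 2)

elementwise    <- A * A     # multiplies matching entries
matrix_product <- A %*% A   # rows of A times columns of A

print(elementwise)
print(matrix_product)
```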
STRINGS
Any value written within a pair of single quote or double
quotes in R is treated as a string. Internally R stores
every string within double quotes, even when you create
them with single quote.
Examples of Valid Strings
a <- 'Start and end with single quote'
print(a)
b <- "Start and end with double quotes"
print(b)
c <- "single quote ' in between double quotes"
print(c)
d <- 'Double quotes " in between single quote'
print(d)
String Manipulation
Concatenating Strings - paste() function
Strings in R are combined using
the paste() function. It can take any number of
arguments to be combined together.
Syntax
paste(..., sep = " ", collapse = NULL)
Following is the description of the parameters used

• ... represents any number of arguments to be
combined.
• sep represents any separator between the
arguments. It is optional.
• collapse is an optional string used to join the
elements of the result into one single string. It does
not change the spaces within a string.
Example:
a <- "Hello"
b <- 'How'
c <- "are you? "
print(paste(a,b,c))
print(paste(a,b,c, sep = "-"))
print(paste(a,b,c, sep = "", collapse = ""))
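The roles of sep and collapse become clearer with vector arguments: sep goes between the arguments of each element, while collapse joins the resulting elements into one string. A minimal sketch with hypothetical names `labels`, `joined`, and `files`:

```r
# sep is placed between the arguments, element by element.
labels <- paste("x", 1:3, sep = "_")
print(labels)

# collapse joins the elements of the result into a single string.
joined <- paste(labels, collapse = "+")
print(joined)

# paste0() is a shorthand for paste(..., sep = "").
files <- paste0("file", 1:2, ".csv")
print(files)
```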
Counting number of characters in a string - nchar()
function
This function counts the number of characters including
spaces in a string.
Syntax
nchar(x)
Following is the description of the parameters used −
• x is the vector input.
Example:
result <- nchar("Count the number of characters")
print(result)
Extracting parts of a string - substring() function
This function extracts parts of a String.
Syntax
substring(x,first,last)
Following is the description of the parameters
used −
• x is the character vector input.
• first is the position of the first character to be
extracted.
• last is the position of the last character to be
extracted.
Example:
# Extract characters from 5th to 7th position.
result <- substring("Extract", 5, 7)
print(result)
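nchar() and substring() combine naturally, for example to take the last few characters of a string. The helper `last_n` below is a hypothetical function written for this sketch, not part of base R.

```r
# Return the last n characters of each string in s.
last_n <- function(s, n) {
  len <- nchar(s)
  substring(s, len - n + 1, len)
}

print(last_n("Extract", 3))
print(last_n(c("January", "May"), 3))
```

Both functions are vectorised, so the helper works on a whole character vector at once.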
Changing the case - toupper() & tolower() functions
These functions change the case of characters of a
string.
Syntax
toupper(x) and tolower(x)
Following is the description of the parameters used −
• x is the vector input.
Example:
# Changing to Upper case.
result <- toupper("Changing To Upper")
print(result)
# Changing to lower case.
result <- tolower("Changing To Lower")
print(result)
Formatting numbers & strings - format() function
Numbers and strings can be formatted to a specific
style using format() function.
Syntax
format(x, digits, nsmall, scientific, width, justify =
c("left", "right", "centre", "none"))
Following is the description of the parameters used −
• x is the vector input.
• digits is the total number of significant digits displayed.
• nsmall is the minimum number of digits to the right of
the decimal point.
• scientific is set to TRUE to display scientific notation.
• width indicates the minimum width, reached by
padding with blanks (at the beginning for numbers).
• justify is the alignment of a string: left, right, or centre.
Example:
# Total number of digits displayed. Last digit rounded off.
result <-format(23.123456789, digits = 9)
print(result)
# Display numbers in scientific notation.
result <- format(c(6, 13.14521), scientific = TRUE)
print(result)
# The minimum number of digits to the right of the decimal point.
result <- format(23.47, nsmall = 5)
print(result)
# Format treats everything as a string.
result <- format(6)
print(result)
# Numbers are padded with blank in the beginning for width.
result <- format(13.7, width = 6)
print(result)
# Left justify strings.
result <- format("Hello", width = 8, justify = "l")
print(result)
# Justify string to the centre.
result <- format("Hello", width = 8, justify = "c")
print(result)
Data Structures in R

The frequently used ones are −


• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
Vectors
What is R Vector?
* A vector is a sequence of elements that share the same data
type. These elements are known as components of a vector.
* The vector is the basic data structure of R and plays an essential role in
R programming.
R vectors come in two kinds: atomic vectors and lists. They have
three common properties:
• Type, typeof(): what it is.
• Length, length(): how many elements it contains.
• Attributes, attributes(): extra arbitrary metadata.

Atomic Vectors in R are:

1) Logical: TRUE and FALSE
2) Numeric: 2.5, 56.23
3) Integer: 5L, 256L
4) Complex: 2+3i
5) Character: "HELLO"
Example:
# 1) Logical
v <- TRUE
class(v)
# 2) Numeric
x <- 23.5
class(x)
# 3) Integer
y <- 258L
class(y)
# 4) Complex
z <- 3+2i
class(z)
# 5) Character
Name <- "Rahul"
class(Name)
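The three common properties of a vector (type, length, attributes) can be inspected with typeof(), length(), and attributes(). The named vector `scores` below is made-up data used only for illustration.

```r
# A named numeric vector; the names are stored as an attribute.
scores <- c(maths = 82.5, physics = 91, chemistry = 77)

print(typeof(scores))       # what it is
print(length(scores))       # how many elements it contains
print(attributes(scores))   # extra metadata, here the element names
```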
Vector Creation
The c() function is used for creating a vector in R. This function returns
a one-dimensional vector.
1) Single Element Vector
When we write just one value in R, it becomes a vector of length 1
and belongs to one of the vector types below.
Example:
# Atomic vector of type character.
print("abc")
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
2) Multiple Elements Vector
a) Using colon operator with numeric data
# Creating a sequence from 5 to 13.
v <- 5:13
print(v)
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)
# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))
# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)
Multiple Elements Vector
b) Using the seq() function
Syntax
seq()
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)
Example:
print(seq(10))
print(seq(1:15))
#sequence with fractional increment
print(seq(from=5, to=9, by = 0.4))
#sequence with default increment
print(seq(20,25))
#sequence with decrement
print(seq(10,0,by=-2))
#sequence with predefined length with default increment
print(seq(from=10,length.out=10))
#sequence with predefined length with fractional increment
print(seq(from=10,length.out=10,by=0.1))
#sequence with predefined length with constant decrement
print(seq(from=0,length.out=10,by=-2))
#sequence with predefined variable with constant increment
x<-2
print(seq(1,x,x/10))
#sequence with a larger predefined variable, again with constant increment
x<-50
print(seq(1,x,x/10))
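Two related helpers are worth knowing alongside seq(): the length.out argument, which fixes the number of points instead of the step, and seq_len(), which is safe when the requested length is zero. A small sketch with hypothetical names `grid`, `empty`, and `surprising`:

```r
# Five equally spaced points from 0 to 1; the step is computed by seq().
grid <- seq(0, 1, length.out = 5)
print(grid)

# seq_len(n) counts from 1 to n and gives an empty vector for n = 0 ...
empty <- seq_len(0)
print(length(empty))

# ... whereas the colon operator counts downwards from 1 to 0.
surprising <- 1:0
print(surprising)
```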
c) Using the c() function
Example:
print(c('apple','red',5,TRUE) )
print(c(80,23,58,56))
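Because all elements of an atomic vector share one type, c() converts mixed inputs to the most general type present (logical, then integer, then double, then character). The sketch below checks this with typeof(); the variable names are illustrative.

```r
t1 <- typeof(c(TRUE, 2L))       # logical + integer
t2 <- typeof(c(2L, 3.5))        # integer + double
t3 <- typeof(c(3.5, "a"))       # double + character
print(c(t1, t2, t3))

# Everything becomes a string when a character value is present.
mixed <- c('apple', 'red', 5, TRUE)
print(mixed)
```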
How to Access Elements of R Vectors?
With the help of vector indexing, we can access the elements of
vectors. Indexing denotes the position where the values in a vector
are stored. This indexing can be performed with the help of integer,
character, or logical vectors.
1. Indexing with Integer Vector
The indexing of vectors in R starts with 1
Example:
#Assignment of an index vector
x<-c(9,8,7,6)
ind<-seq_along(x)
ind
#Accessing an element of index vector
x[ind[2]]
2. Indexing with Character Vector
Character vector indexing can be done as follows:
x <- c("One" = 1, "Two" = 2, "Three" = 3)
x["Two"]
3. Indexing with Logic Vector
In logical indexing, the elements at the positions where the
logical vector is TRUE are returned. For example, in the
code below, R returns the elements at positions 1 and 3, where
the logical vector is TRUE.
a <- c(1,2,3,4)
a[c(TRUE, FALSE, TRUE, FALSE)]
Example1
# Accessing vector elements using position.
x <- c("Jan","Feb","Mar","April","May","June","July")
y <- x[c(2,3,6)]
print(y)

Example2
# Accessing vector elements using logical indexing.
v <- x[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

Example3
# Accessing vector elements using negative indexing.
t <- x[c(-2,-5)]
print(t)
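In practice the logical index is usually produced by a condition rather than typed out by hand. A minimal sketch, using an illustrative vector `x` and hypothetical result names:

```r
x <- c(9, 8, 7, 6)

large     <- x[x > 7]      # keep the elements that satisfy the condition
positions <- which(x > 7)  # positions at which the condition is TRUE
rest      <- x[-1]         # negative index: everything except the first element

print(large)
print(positions)
print(rest)
```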
Generating sequence of dates:
Syntax
seq(from, to, by, length.out = NULL, along.with = NULL)
from: starting date (required)
to: end date (optional)
by: increment of the sequence ("day", "week", "month", "quarter" or
"year")
length.out: integer (optional), the desired length of the sequence
along.with: take the length from the length of this argument
Example:
#sequence of dates by years
seq(as.Date("2010-01-01"), as.Date("2015-01-01"), by = "years")
#sequence of dates by days with predefined length
seq(as.Date("2019-01-01"), by = "days", length.out = 8)
#sequence with defined start and end dates: the months strictly between them
startdate <- as.Date("2019-01-01")
enddate <- as.Date("2020-01-01")
out <- seq(enddate, startdate, by = "-1 month")
rev(out[out > startdate & out < enddate])
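Two further date sequences, given as an illustrative sketch with arbitrarily chosen dates: one fixes the number of elements, the other fixes the end date.

```r
# Three month starts, beginning in January 2019.
monthly <- seq(as.Date("2019-01-01"), by = "month", length.out = 3)
print(monthly)

# Every seventh day within January 2019.
weekly <- seq(as.Date("2019-01-01"), as.Date("2019-01-31"), by = "week")
print(weekly)

# diff() confirms that consecutive dates are 7 days apart.
print(diff(weekly))
```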
Generating sequences of alphabets:
1) Sequence of lowercase letters: use the built-in constant letters.
2) Sequence of lowercase letters between particular indices:
use letters[from_index:to_index].
Example:
letters
letters[1:3]
letters[3:1]
letters[21:23]
letters[8]
NOTE: By using LETTERS, the sequence of uppercase letters is
generated.
REPEATS
Command rep() is used to replicate the values in vector.
Syntax
rep(x) : replicates the values in a vector.
rep(x,times=n) : repeat x as a whole n times.
rep(x,each=n) : repeat each element of x n times.
Example:
1) rep(1:4,times=5)
2) rep(1:4, 5)
3) rep(1:4,each=5)
4) rep(1:4,each=5,times=5)
5) rep(1:4,2:5)
6) Ans<-seq(2,8,2)
Ans
rep(1:4,Ans)
7) x<-matrix(nrow=2,ncol=2,data=1:4,byrow=T)
x
rep(x,3)
rep(x,each=2,times=3)
8) rep(c("a","b","c"),3)
9) rep(c("A","B","C"),each=4)
SORTING and ORDERING
Sorting:
The sort() function sorts the values of a vector in ascending order
(by default) or in descending order.
Syntax
sort(x,decreasing=FALSE,na.last=NA,…)
x: vector of values to be sorted
decreasing: should the sort be in decreasing order? (TRUE or FALSE)
na.last: controls the treatment of NAs
if TRUE, missing values are put last.
if FALSE, missing values are put first.
if NA, missing values are removed (the default).
Example:
Y<-c(10,12,9,12,20,7,15)
sort(Y)
sort(Y, decreasing=TRUE)
x<-c(14,12,27,10,9,12,18,NA,21,NA,16,16,NA)
sort(x,na.last=TRUE)
sort(x, decreasing=TRUE ,na.last=FALSE)
sort(x, decreasing=TRUE ,na.last=NA)
Ordering:
The order() function returns the index positions that
arrange a vector in sorted order. For a vector x without
missing values, x[order(x)] gives the same result as sort(x).
Syntax
order(x,decreasing=FALSE,na.last=TRUE,…)
x: vector of values to be ordered
decreasing: should the order be decreasing?
(TRUE or FALSE)
na.last: controls the treatment of NAs
if TRUE, missing values are put last (the default).
if FALSE, missing values are put first.
if NA, missing values are removed.
Example:
Y<-c(10,12,9,12,20,7,15)
order(Y)
order(Y, decreasing=TRUE)
x<-c(14,12,27,10,9,12,18,NA,21,NA,16,16,NA)
order(x,na.last=TRUE)
order(x, decreasing=TRUE ,na.last=FALSE)
order(x, decreasing=TRUE ,na.last=NA)
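The difference between sort() and order() is easiest to see when the result of order() is used as an index. A minimal sketch, reusing the vector Y from the examples; the labels vector is a hypothetical companion vector added for illustration.

```r
Y <- c(10, 12, 9, 12, 20, 7, 15)

# order() returns positions, not values
idx <- order(Y)

# Indexing Y with those positions reproduces sort(Y)
sorted <- Y[idx]

# The same positions can rearrange a second vector to match
# (labels is a hypothetical vector aligned with Y)
labels <- c("a", "b", "c", "d", "e", "f", "g")
labels_by_Y <- labels[idx]

print(idx)
print(sorted)
print(labels_by_Y)
```

This is why order() is the usual choice for sorting one vector, or the rows of a table, by the values of another.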
Accessing Vector Elements
Elements of a Vector are accessed using indexing. The [ ]
brackets are used for indexing. Indexing starts with position
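1. Giving a negative value in the index drops that element
from the result. TRUE and FALSE can also be used for indexing.
Numeric 0 and 1 are treated as positions, not as logical values:
index 0 selects nothing and index 1 selects the first element.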
Example:
# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
# Accessing vector elements using 0/1 indexing.
# Zeros are ignored and 1 selects the first element, so this returns "Sun".
y <- t[c(0,0,0,0,0,0,1)]
print(y)
Vector Manipulation
Vector arithmetic
Two vectors of same length can be added, subtracted, multiplied or
divided giving the result as a vector output.
Example:
# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v1/v2
print(divi.result)
Operations in R Vector
1. Combining Vector in R
In order to combine the two vectors in R, we will create two
new vectors ‘n’ and ‘s’. Then, we will create another vector
that will combine these two using c(n,s)
Example:
n = c(1, 2, 3, 4)
s = c("Hadoop", "Spark", "HIVE", "Flink")
c(n,s)
2. Arithmetic Operations on Vectors in R
Arithmetic operations on vectors can be
performed member-by-member.
Example:
a = c (1, 3)
b = c (1, 3)
a+b
a-b
Vector Element Recycling
If we apply arithmetic operations to two vectors of
unequal length, then the elements of the shorter
vector are recycled to complete the operations.
Example:
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
add.result <- v1+v2
print(add.result)
sub.result <- v1-v2
print(sub.result)
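Recycling is silent when the longer length is a multiple of the shorter one; otherwise R still computes a result but issues a warning. The sketch below captures that warning with tryCatch() so the script keeps running. The names full and partial are illustrative assumptions.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11)

# Length 6 is a multiple of length 2: v2 is reused as 4 11 4 11 4 11
full <- v1 + v2

# Length 6 is not a multiple of length 4: R warns about the mismatch.
# tryCatch() turns the warning into its message text here.
partial <- tryCatch(
  v1 + c(1, 2, 3, 4),
  warning = function(w) conditionMessage(w)
)

print(full)
print(partial)
```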
3. Logical Index Vector in R
By using a logical index vector in R, we can form a new vector from a given
vector. The logical vector has the same length as the original vector: its
members are TRUE where the corresponding members of the original vector are
to be included in the slice, and FALSE otherwise.
Example:
S = c("bb", "cc")
L = c(TRUE, TRUE) #Defining our Logical Vector
S[L] #This will return the elements of vector S that correspond to TRUE in L
4. Numeric Index
For indexing a numerical value in R, we specify the index between square
braces [ ].
Example:
#Creating our vector
x <- c("aa", "bb", "cc", "dd", "ee")
x[3]
#Using Negative Index
x[-2]
#Using an out-of-range index (returns NA)
x[15]
5. Duplicate Index
The index vector allows duplicate values. Hence, the following retrieves a
member twice in one operation.
Example:
s = c("aa", "bb", "cc", "dd", "ee")
s[c(2,3,3)]
6. Range Indexes
To produce a vector slice between two indexes, we can use the colon
operator “:“. It is convenient for situations involving large vectors.
Example:
s = c("aa", "bb", "cc", "dd", "ee")
s[1:3]
7. Out-of-order Indexes
The index vector can even be out-of-order. Here is a vector slice with the
order of first and second members reversed.
Example:
s = c("aa", "bb", "cc", "dd", "ee")
s [ c (2, 1, 3) ]
8. Named Vectors Members
#We first create our vector of characters:
v = c("Hadoop", "Spark")
v
# We name the first vector member "First" and the second member "Second".
names(v) = c("First", "Second")
v
# We retrieve the first member by its name as follows:
v["First"]
# We can also reverse the order using the character string index vector:
v[c("Second", "First")]
Vector Element Sorting
Elements in a vector can be sorted using the sort() function.
Example:
v <- c(3,8,4,5,0,11, -9, 304)
# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)
# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)
# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
LIST
What is R List?
Lists are the R objects which contain elements of different
types like − numbers, strings, vectors and another list
inside it. A list can also contain a matrix or a function as
its elements.
List is created using list() function.
• Lists can contain elements of different types so that the list
elements may have different modes
• Lists can even contain other structured objects , such as lists
and data frame which allows to create recursive data
structures
• Lists can be indexed by position . For example x[[5]] refers to
the fifth element of list x
• Lists can extract sublist .For example x[c(2,5)] is a sublist of x
that consists of the second and fifth elements.
• Difference between a vector and a list
*In a vector all elements are of same mode.
**In a list element may be of different modes.
Creating a List
Example:
#1
vec <- c(1,2,3)
char_vec <- c("Hadoop", "Spark", "Flink", "Mahout")
logic_vec <- c(TRUE, FALSE, TRUE, FALSE)
out_list <- list(vec, char_vec, logic_vec)
out_list
#2
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23,
119.1)
print(list_data)
#3
list_data <- list("Red", "White", c(1,2,3), TRUE, 22.4)
print(list_data)
Naming List Elements
The list elements can be given names and they can be
accessed using these names.
Example:
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
print(list_data)
Accessing List Elements
Elements of the list can be accessed by the index of the element in
the list.
1) Access an element of a list using the [[ ]] operator.
2)In case of named lists it can also be accessed using the names.
Example:
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
list_data
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
list_data
# Access the first element of the list.
print(list_data[1])
# Access the third item of the first element of the list ("Mar").
print(list_data[[1]][3])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)
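The single bracket [ ] and the double bracket [[ ]] behave differently on lists: the first returns a sub-list, the second returns the element itself. A minimal sketch using the same list as above; the result names sub, months, mar and same are illustrative assumptions.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# [ ] keeps the list wrapper: the result is a list of length 1
sub <- list_data[1]

# [[ ]] extracts the element itself: here a character vector
months <- list_data[[1]]

# Once extracted, the vector can be indexed as usual
mar <- list_data[[1]][3]

# [[ ]] also accepts a name, including names that contain spaces
same <- list_data[["1st Quarter"]]

print(class(sub))
print(class(months))
print(mar)
```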
Manipulating List Elements
We can add, delete and update list elements as shown below.
The example adds and deletes an element at the end of the list
and updates an existing element; assignment by index works at any position.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
#Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
print(list_data)
# Add element at the end of the list.
list_data[4] <- 58.2
print(list_data)
# Remove the last element.
list_data[4] <- NULL
print(list_data)
# Update the 2nd element.
list_data[2] <- TRUE
print(list_data)
Merging Lists
You can merge many lists into one list by
passing all the lists to the c() function.
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
print(list1)
print(list2)
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
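The choice of c() rather than list() matters for merging. The sketch below compares the two on the same inputs; the names flat and nested are illustrative assumptions.

```r
list1 <- list(1, 2, 3)
list2 <- list("Sun", "Mon", "Tue")

# c() concatenates the elements into one flat list of six elements
flat <- c(list1, list2)

# list() wraps the two lists as two elements of a new list
nested <- list(list1, list2)

print(length(flat))
print(length(nested))
```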
Converting List to Vector
A list can be converted to a vector so that the elements of the vector can be
used for further manipulation. All the arithmetic operations on vectors can be
applied after the list is converted into vectors.
To do this conversion, we use the unlist() function. It takes the list as input
and produces a vector.
# Create lists.
list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
# Adding the lists directly fails with an error
# (non-numeric argument to binary operator):
# list1+list2
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)
R Predefined Lists
Character vectors for letters and month names are predefined
letters: lowercase alphabets
LETTERS: uppercase alphabets
month.abb: months by abbreviation
month.name: months by full name
c function in R
The c function in R combines its arguments into a vector
and converts them to the same type.
c("April", 4)
typeof("4")
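The conversion performed by c() follows a fixed order: logical values become integers, integers become doubles, and anything combined with a string becomes a character value. A short sketch; the variable names t1, t2, t3 and mixed are illustrative assumptions.

```r
# logical combined with integer gives integer
t1 <- typeof(c(TRUE, 2L))

# integer combined with double gives double
t2 <- typeof(c(2L, 3.5))

# double combined with character gives character
t3 <- typeof(c(3.5, "April"))

# The number 4 is stored as the string "4"
mixed <- c("April", 4)

print(c(t1, t2, t3))
print(mixed)
```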
MATRIX
What is R Matrix?
Matrices are the R objects in which the elements are arranged in a
two-dimensional rectangular layout. They contain elements of the
same atomic types. Though we can create a matrix containing only
characters or only logical values, they are not of much use. We use
matrices containing numeric elements to be used in mathematical
calculations.
Creating matrix
A Matrix is created using the matrix() function.
Syntax
The basic syntax for creating a matrix in R is −
matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used −
data is the input vector which becomes the data elements of the
matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE then the input vector elements are
arranged by row.
dimnames is the names assigned to the rows and columns.
Example:
# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, dimnames =
list(rownames, colnames))
print(P)
Accessing Elements of a Matrix
Elements of a matrix can be accessed by using the column
and row index of the element.
Example:
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames,
colnames))
print(P)
# Access the element at 3rd column and 1st row.
print(P[1,3])
# Access the element at 2nd column and 4th row.
print(P[4,2])
# Access only the 2nd row.
print(P[2,])
# Access only the 3rd column.
print(P[,3])
# Access a submatrix.
print(P[1:2,2:3])
Modify Matrix in R
In order to modify our matrix mat in R, there are several methods:
1) Assign a Single Element
The first method is to assign a single element to the position of the
matrix that will modify the original value.
Syntax
mat[n,m] <- y
Where
n and m are the rows and columns of the element respectively.
y is the value that we assign to modify our matrix.
Example:
#creating matrix
mat<-matrix(nrow=3,ncol=3,data=c(1:9))
mat
# modify the matrix
mat[1,2]<-10
mat
2) Use of Relational Operators
Another method of modifying is with the use of relational operators
like >, <, ==.
Example:
# modify the matrix
mat[mat>4]<-10
mat
3) Addition of Rows and Columns
Another method of modifying an R matrix is through the addition of
rows and columns using the rbind() and cbind() function
respectively.
# creating a matrix
new_mat = matrix(1:9, nrow = 3, ncol = 3)
new_mat
# cbind() returns a new matrix with the extra column; new_mat itself is unchanged
cbind(new_mat, c(1,2,3))
new_mat
# rbind() likewise returns a new matrix with the extra row
rbind(new_mat, c(-1,-2,-3))
new_mat
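Because cbind() and rbind() return new matrices, the result has to be assigned to a name to be kept. A minimal sketch; the names wider and taller are illustrative assumptions.

```r
new_mat <- matrix(1:9, nrow = 3, ncol = 3)

# Store the result of cbind() to keep the added column
wider <- cbind(new_mat, c(1, 2, 3))

# Store the result of rbind() to keep the added row
taller <- rbind(new_mat, c(-1, -2, -3))

print(dim(new_mat))  # the original is still 3 x 3
print(dim(wider))
print(dim(taller))
```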
Matrix Computations
Various mathematical operations are performed on the matrices using the R
operators. The result of the operation is also a matrix.
Note: The dimensions (number of rows and columns) should be same for the matrices
involved in the operation.
Addition , Subtraction, Multiplication & Division
Example:
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
print(result)
# Subtract the matrices
result <- matrix1 - matrix2
print(result)
# Multiply the matrices.
result <- matrix1 * matrix2
print(result)
# Divide the matrices
result <- matrix1 / matrix2
print(result)
Scalar multiplication of matrices
Example:
A<-matrix(nrow=3,ncol=2,data=c(5,-2,0,8,5,3))
A
B<- matrix(nrow=3,ncol=2,data=c(0,5,10,-8,9,2))
B
C<-4*A
C
D<-(-5)*B
D
C+D
Product of two matrices
Note: The product of two matrices is defined only when the
number of columns of the 1st matrix equals the number of rows
of the 2nd matrix.
Example:
X<-matrix(nrow=3,ncol=3,1:9)
X
Y<- matrix(nrow=3,ncol=3,seq(1,18,by=2),byrow=TRUE)
Y
X%*%Y
Y%*%X
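With square matrices both X%*%Y and Y%*%X are defined, so the dimension rule is easier to see with non-square matrices. A minimal sketch with hypothetical matrices A (2 x 3) and B (3 x 4); the names AB and ok are also illustrative assumptions.

```r
A <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns
B <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns

# ncol(A) equals nrow(B), so the product exists and is 2 x 4
AB <- A %*% B

# ncol(B) is 4 but nrow(A) is 2, so B %*% A raises an error;
# tryCatch() records the failure instead of stopping the script
ok <- tryCatch({ B %*% A; TRUE }, error = function(e) FALSE)

print(dim(AB))
print(ok)
```

Note that %*% is the matrix product, while * multiplies element by element and requires equal dimensions.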
Transpose of a matrix
Example:
A<-matrix(-1:-24,6,4)
A
B<-t(A)
B
Inverse of a matrix
Example: 1)
A<-matrix(seq(9,-3,by=-2),4,4)
A
B<-solve(A)
B
2)
A<-matrix(2,2,data=c(5,2,-7,9),byrow=TRUE)
A
B<-solve(A)
B
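A quick way to check an inverse is to multiply it by the original matrix and compare the product with the identity matrix. A minimal sketch using the 2 x 2 matrix from the second example; the name check is an illustrative assumption.

```r
A <- matrix(c(5, 2, -7, 9), nrow = 2, byrow = TRUE)
B <- solve(A)

# A multiplied by its inverse should give the 2 x 2 identity matrix
check <- A %*% B

# all.equal() compares with a small numerical tolerance,
# which suits floating-point results better than ==
print(all.equal(check, diag(2)))
```

solve() stops with an error when the matrix is singular, that is, when no inverse exists.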
Creating matrix with equal elements
Example:
X<-matrix(4,nrow=5,ncol=4)
X
Creating diagonal matrix
Example:
X<-diag(4,nrow=5,ncol=5)
X
nrow(X)
ncol(X)
dim(X)
Matrices created from list:
A matrix can be created from a list by assigning
dimensions with the dim() function.
Example:
ab<-list(1,2,3,"x","y","z")
print(ab)
dim(ab)<-c(2,3)
print(ab)
ARRAY
What is R Array?
* Arrays are the R data objects which can store data in more
than two dimensions.
For example − If we create an array of dimension (2, 3, 4)
then it creates 4 rectangular matrices each with 2 rows and
3 columns.
** Arrays can store only one data type.
*** An array is created using the array() function. It takes
vectors as input and uses the values in the dim parameter to
create an array.
Syntax
array_NAME <- array(data, dim = c(row_Size, column_Size,
matrices), dimnames)
data – Data is an input vector that is given to the array.
matrices – Array in R consists of multi-dimensional
matrices.
row_Size – row_Size describes the number of row elements
that an array can store.
column_Size – Number of column elements that can be
stored in an array.
dimnames – Used to change the default names of rows and
columns to the user’s preference.
Arguments in Array
The array function in R can be written as:
array(data = NA, dim = length(data), dimnames =
NULL)
• data is a vector that provides data to fill the array.
• dim attribute provides maximum indices in each
dimension
• dimnames can be either NULL or a list giving names for
each dimension of the array.
Creating Array in R
Example:
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)
Naming Columns and Rows
We can give names to the rows, columns and
matrices in the array by using
the dimnames parameter.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames =
list(row.names,column.names, matrix.names))
print(result)
Accessing Array Elements
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames =
list(row.names, column.names, matrix.names))
print(result)
# Print the third row of the second matrix of the array.
print(result[3,,2])
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
# Print the 2nd Matrix.
print(result[,,2])
Manipulating Array Elements
As an array is made up of matrices in multiple dimensions, the operations on elements of an array
are carried out by accessing elements of the matrices.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))
print(array1)
# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
array2 <- array(c(vector3,vector4),dim = c(3,3,2))
print(array2)
# create matrices from these arrays.
matrix1 <- array1[,,1]
print(matrix1)
matrix2 <- array2[,,2]
print(matrix2)
# Add the matrices.
result <- matrix1+matrix2
print(result)
Calculations Across Array Elements
We can do calculations across the elements in an array using
the apply() function.
Syntax
apply(x, margin, fun)
Following is the description of the parameters used −
• x is an array.
• margin gives the dimension(s) over which the function is applied (1 for rows, 2 for columns).
• fun is the function to be applied across the elements of the array.
Example:
We use the apply() function below to calculate the sum of the elements in the
rows of an array across all the matrices.
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)
# Use apply to calculate the product of the row across all the matrices.
result <- apply(new.array, c(1), prod)
print(result)
# Use apply to calculate the mean of the row across all the matrices.
result <- apply(new.array, c(1), mean)
print(result)
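The effect of the margin argument can be compared side by side. The sketch below applies sum over different margins of the same array; the result names row_sums, col_sums, mat_sums and cell_sums are illustrative assumptions.

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
# Nine values fill an 18-cell array, so both 3 x 3 matrices are identical
new.array <- array(c(vector1, vector2), dim = c(3, 3, 2))

# Margin 1: one sum per row, taken across columns and matrices
row_sums <- apply(new.array, 1, sum)

# Margin 2: one sum per column, taken across rows and matrices
col_sums <- apply(new.array, 2, sum)

# Margin 3: one sum per matrix
mat_sums <- apply(new.array, 3, sum)

# Margins 1 and 2 together: one sum per cell, taken across matrices
cell_sums <- apply(new.array, c(1, 2), sum)

print(row_sums)
print(col_sums)
print(mat_sums)
print(cell_sums)
```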
Factors
What is R Factor?
* Factors are the data objects which are used to
categorize the data and store it as levels.
* They can store both strings and integers.
* They are useful in columns which have a
limited number of unique values, like "Male",
"Female" and True, False etc.
*They are useful in data analysis for statistical
modelling.
Attributes of a Factor
Some important arguments of the factor() function that we
will use here are:
• x: The input vector that is to be transformed into a
factor.
• levels: This is an optional vector that represents a
set of unique values that are taken by x.
• labels: It is a character vector of labels for
the levels.
• Exclude: With this attribute, we specify the values
to be excluded.
• ordered: This is a logical attribute that determines
if the levels should be ordered.
• nmax: This attribute specifies the upper bound for
the maximum number of levels.
Creating an R Factor
Factors are created using the factor () function by
taking a vector as input.
The nlevels() function gives the count of levels.
Example:
# Create a vector as input.
data <- c("East","West","East","North","North","East","West",
"West","West","East","North")
print(data)
print(is.factor(data))
# Apply the factor function.
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))
Example:
# creating vector
directions <- c("North", "North", "West", "South")
#converting this vector into a factor using the factor() function:
factor(directions)
# to include a level that is missing from the data, we use the levels argument
factor(directions, levels= c("North", "East", "South", "West"))
# to provide abbreviations or labels for our levels, we use the labels argument
factor(directions, levels= c("North", "East", "South", "West"),
labels=c("N", "E", "S", "W"))
# to exclude a level from the factor, we use the exclude argument
factor(directions, levels= c("North", "East", "South", "West"),
exclude = "North")
Generate Factor Level in R
In order to generate factor levels in R, we make use of
the gl() function.
Alternatively, the order of the levels in an existing factor can be changed
by applying the factor function again with the new order of the levels.
Syntax
gl(n, k, labels)
where
n is an integer specifying the number of levels.
k is an integer giving the number of replications of each level.
labels is the vector of labels for our factor levels.
Example:
# Example 1
BigData <- gl(3, 2, labels = c("Hadoop", "Spark","Flink"))
print(BigData)
# Example 2
v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)
Accessing Components of Factor in R
# creating factor
compass <- c("East","West","East","North")
data <- factor(compass)
data
#Accessing the 4th element
data[4]
#Accessing the 2nd & 3rd element
data[c(2,3)]
#Accessing everything except 1st element
data[-1]
#Accessing using Logical Vector
data[c(TRUE, FALSE, TRUE, TRUE)]
Modify an R Factor
When modifying a factor, we are limited to values that are
within the predefined levels.
# creating factor
compass <- c("East","West","East","North")
data <- factor(compass)
print(data)
#Modifying 2nd element
data[2] <- "North"
#Cannot modify a factor with a value outside its levels:
#this gives a warning and stores NA
data[3] <- "South"
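If a value outside the current levels is genuinely needed, one common approach is to extend the levels before assigning. This is a minimal sketch of that approach, not the only way to do it.

```r
compass <- c("East", "West", "East", "North")
data <- factor(compass)

# Append the new level to the existing ones
levels(data) <- c(levels(data), "South")

# The assignment now succeeds without a warning
data[3] <- "South"

print(data)
print(nlevels(data))
```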
Factor Functions in R
some key functions that are related to factor are
is.factor(): checks if the input is a factor
and returns a Boolean value (TRUE or FALSE).
as.factor(): takes the input (usually a vector) and converts it
into a factor.
is.ordered(): checks if the factor is ordered and returns
boolean TRUE or FALSE.
as.ordered(): takes an unordered factor and returns
a factor whose levels are ordered.
Example:
directions <- c("North", "North", "West", "South")
f_directions <- factor(directions)
is.factor(f_directions)
as.factor(directions)
is.ordered(f_directions)
as.ordered(f_directions)
DATA FRAME
What is Data Frame in R?
* A data frame is a table or a two-dimensional array-like
structure in which each column contains values of one
variable and each row contains one set of values from
each column.
* Data frames hold tabular data. In
particular, a data frame is a data structure in R that represents cases
in which there are a number of observations (rows) and
measurements (columns).

Following are the characteristics of a data frame.


• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor
or character type.
• Each column should contain same number of data items.
R Data Frame Operations
1. Create Data Frame
#Example 1
employee_data <- data.frame(
employee_id = c (1:5),
employee_name =
c("James","Harry","Shinji","Jim","Oliver"),
sal = c(642.3,535.2,681.0,739.0,925.26),
join_date = as.Date(c("2013-02-04", "2017-06-21",
"2012-11-14", "2018-05-19","2016-03-25")),
stringsAsFactors = FALSE)
Example:
BMI <- data.frame(name=c("Ramesh","Nitin","Seema"), gender =
c("Male", "Male","Female"), height = c(152, 171.5, 165),
weight = c(81,93, 78), Age = c(42,38,26) )
BMI
View(BMI)
OR
name=c("Ramesh","Nitin","Seema")
gender = c("Male", "Male","Female")
height = c(152, 171.5, 165)
weight = c(81,93, 78)
Age = c(42,38,26)
BMI<-data.frame(name,gender,height,weight,Age)
View(BMI)
2. Get the Structure of the R Data Frame
The structure of the data frame can be seen by using the str()
function.
str(employee_data)
3. Extract data from Data Frame
By using the name of the column, extract a specific column
from the data frame.
a) Extract Specific columns:
output <- data.frame(employee_data$employee_name,
employee_data$employee_id)
print(output)
b) Extract first two rows:
output <- employee_data[1:2,]
c) Extract 1st and 2nd row with the 3rd and 4th column of the
below data.
result <- employee_data[c(1,2),c(3,4)]
result
4. Expand R Data Frame
A data frame can be expanded by adding
columns and rows.
a) Add Column
Add the column vector using a new column
name.
Example:
# Add the “dept” column
employee_data$dept <-
c("IT","Finance","Operations","HR","Administration")
out <- employee_data
print(out)
b) Add Row
# Create the second R data frame
employee_new_data <- data.frame(
employee_id = c (6:8),
employee_name = c("Aman", "Piyush", "Aakash"),
sal = c(523.0,721.3,622.8),
join_date =
as.Date(c("2015-06-22","2016-04-30","2011-03-17")),
stringsAsFactors = FALSE)
# Bind the two data frames.
employee_out_data <-
rbind(employee_data,employee_new_data)
employee_out_data
Factors in Data Frame
On creating a data frame with a column of text data, R versions
before 4.0.0 treat the text column as categorical data and create
factors on it by default. From R 4.0.0 onward the default is
stringsAsFactors = FALSE, so the argument must be set to TRUE
to get this behaviour.
Example:
# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")
# Create the data frame.
input_data <- data.frame(height,weight,gender, stringsAsFactors = TRUE)
print(input_data)
# Test if the gender column is a factor.
print(is.factor(input_data$gender))
# Print the gender column so see the levels.
print(input_data$gender)
Get the Structure of the Data Frame
The structure of the data frame can be seen by
using str() function.
Example:
# Create the data frame.
emp.data <- data.frame( emp_id = c (1:5), emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"), salary =
c(623.3,515.2,611.0,729.0,843.25), start_date =
as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
"2014-05-11", "2015-03-27")), stringsAsFactors = FALSE
)
print(emp.data)
# Get the structure of the data frame.
str(emp.data)
Summary of Data in Data Frame
The statistical summary and nature of the
data can be obtained by
applying summary() function.
Example:
# Create the data frame.
emp.data <- data.frame( emp_id = c (1:5), emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")), stringsAsFactors = FALSE )
# Print the summary.
print(summary(emp.data))
Extract Data from Data Frame
Extract specific column from a data frame using column
name.
Example 1:(Extract specific column from a data frame
using column name.)
# Create the data frame.
emp.data <- data.frame( emp_id = c (1:5), emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"), salary =
c(623.3,515.2,611.0,729.0,843.25), start_date =
as.Date(c("2012-01-01","2013-09-23","2014-11-15",
"2014-05-11", "2015-03-27")), stringsAsFactors = FALSE )
# Extract Specific columns.
result <-
data.frame(emp.data$emp_name,emp.data$salary)
print(result)
Example 2:(Extract the first two rows and then
all columns)
#Create the data frame.
emp.data <- data.frame( emp_id = c (1:5),
emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")), stringsAsFactors = FALSE )
# Extract first two rows.
result <- emp.data[1:2,]
print(result)
Example 3:(Extract 3rd and 5th row with 2nd and
4th column)
# Create the data frame.
emp.data <- data.frame( emp_id = c (1:5),
emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"),
salary = c(623.3,515.2,611.0,729.0,843.25),
start_date = as.Date(c("2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")), stringsAsFactors = FALSE )
# Extract 3rd and 5th row with 2nd and 4th
column.
result <- emp.data[c(3,5),c(2,4)]
print(result)
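Rows can also be selected by a logical condition on a column rather than by position. A minimal sketch using a reduced version of emp.data; the salary threshold of 700 and the names high_paid and names_vec are illustrative assumptions.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

# Keep the rows where salary exceeds 700, and two columns chosen by name
high_paid <- emp.data[emp.data$salary > 700, c("emp_name", "salary")]

# [[ ]] extracts a single column as a plain vector
names_vec <- emp.data[["emp_name"]]

print(high_paid)
print(names_vec)
```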
Expand Data Frame
A data frame can be expanded by adding columns and
rows.
Add Column : Just add the column vector using a new
column name.
Example:
# Create the data frame.
emp.data <- data.frame( emp_id = c (1:5), emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"), salary =
c(623.3,515.2,611.0,729.0,843.25), start_date =
as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
"2014-05-11", "2015-03-27")), stringsAsFactors = FALSE
)
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)
Add Row : To add more rows permanently to an existing data
frame, we need to bring in the new rows in the same structure
as the existing data frame and use the rbind() function.
Example:
# Create the first data frame.
emp.data <- data.frame( emp_id = c (1:5), emp_name =
c("Rick","Dan","Michelle","Ryan","Gary"), salary =
c(623.3,515.2,611.0,729.0,843.25), start_date = as.Date(c("2012-01-01",
"2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")), dept =
c("IT","Operations","IT","HR","Finance"), stringsAsFactors = FALSE )
print(emp.data)
# Create the second data frame
emp.newdata <- data.frame( emp_id = c (6:8), emp_name =
c("Rasmi","Pranab","Tusar"), salary = c(578.0,722.5,632.8), start_date =
as.Date(c("2013-05-21","2013-07-30","2014-06-17")), dept =
c("IT","Operations","Finance"), stringsAsFactors = FALSE )
print(emp.newdata)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
Control Structures in R
In order to control the flow of execution of
expressions in R, we make use of
control structures. Some of these control
structures are loops (for, while, repeat).
There are eight types of control structures in
R:
• if
• if-else
• for
• nested loops
• while
• repeat and break
• next
• return
Decision making
Decision making structures require the programmer to
specify one or more conditions to be evaluated or tested
by the program, along with a statement or statements to
be executed if the condition is determined to be true,
and optionally, other statements to be executed if the
condition is determined to be false.
1) if else statement
An if statement can be followed by an
optional else statement which executes when
the boolean expression is false.
Syntax
if(boolean_expression) {
# statement(s) will execute if the boolean expression is true.
} else {
# statement(s) will execute if the boolean expression is false.
}
If the Boolean expression evaluates to
be true, then the if block of code will be
executed, otherwise else block of code will
be executed.
1. if Condition in R
The statement is carried out only if the condition evaluates to
TRUE. Unlike some languages, R has no then keyword: the
statement follows the condition directly in an if statement.
Syntax
if (test_expression) {
statement
}
Example:
values <- 1:10
# Draw one value and keep it, so the same value is tested and printed
value <- sample(values,1)
if (value <= 10)
print(paste(value, "is less than or equal to 10"))
2a) The if...else if...else Statement
An if statement can be followed by an optional else
if...else statement, which is very useful to test various
conditions using a single if...else if statement.
When using if, else if, else statements there are few points
to keep in mind.
• An if can have zero or one else and it must come after
any else if's.
• An if can have zero to many else if's and they must come
before the else.
• Once an else if succeeds, none of the remaining else if's
or else's will be tested.
Syntax
if(boolean_expression 1) {
# Executes when the boolean expression 1 is true.
} else if(boolean_expression 2) {
# Executes when the boolean expression 2 is true.
} else if(boolean_expression 3) {
# Executes when the boolean expression 3 is true.
} else {
# Executes when none of the above conditions is true.
}
Example:
# Examples of the if...else statement
# Ex-1
x <- c("what", "is", "truth")
if ("Truth" %in% x) {
  print("Truth is found")
} else {
  print("Truth is not found")
}
# Ex-2
x <- 30L
if (is.integer(x)) {
  print("X is an Integer")
} else {
  print("X is not an Integer")
}
# Ex-3
x <- 5
if (x <= 3) {
  print(x - 1)
} else {
  print(2 * x)
}
Example:
#Ex-4
x <- c("what","is","truth")
if("Truth" %in% x) {
print("Truth is found the first time")
} else if ("truth" %in% x){
print("truth is found the second time")
} else { print("No truth found") }
#Ex-5
#Creating our first variable val1
val1 = 10
#Creating second variable val2
val2 = 5
#Executing Conditional Statement based on the comparison
if (val1 > val2){
print("Value 1 is greater than Value 2")
} else if (val1 < val2){
print("Value 1 is less than Value 2")
}
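The rule that only the first matching branch runs can be illustrated with a small sketch. The function name score_band and the thresholds are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: classify a numeric score into a band.
# Branches are tested top to bottom; the first TRUE condition wins.
score_band <- function(score) {
  if (score >= 90) {
    "high"
  } else if (score >= 50) {
    "medium"   # reached only when score < 90
  } else {
    "low"
  }
}

print(score_band(95))  # 95 also satisfies >= 50, but only the first branch runs
print(score_band(72))
print(score_band(10))
```

Ordering the conditions from most to least restrictive keeps each branch unambiguous.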
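2b) ifelse statement
ifelse() is the vectorized form of if...else: it tests each
element of test and returns the corresponding element of yes
where the test is TRUE and of no where it is FALSE.
Syntax
ifelse(test,yes,no)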
Example:
x<-1:10
ifelse(x<4,x+1,x-1)
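As a small sketch of what the call above produces, the result can be stored and inspected. The variable names shifted and parity are assumptions introduced for illustration.

```r
x <- 1:10

# One result per element: values below 4 get x + 1, the rest get x - 1
shifted <- ifelse(x < 4, x + 1, x - 1)
print(shifted)

# The same idea with character results (hypothetical labelling)
parity <- ifelse(x %% 2 == 0, "even", "odd")
print(parity)
```

Unlike if, which expects a single TRUE or FALSE, ifelse() returns a vector of the same length as its test.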
3) Switch Statement
A switch statement allows a variable to be tested for equality against
a list of values. Each value is called a case, and the variable being
switched on is checked for each case.
Syntax
switch(expression, case1, case2, case3....)
The following rules apply to a switch statement −
• If the value of expression is not a character string it is coerced to
integer.
• You can supply any number of alternatives (cases) after the
expression, separated by commas. To match on a character string,
write each alternative as name = value.
• If the value of the integer is between 1 and nargs()−1 (the maximum
number of alternatives), the corresponding alternative is
evaluated and the result returned. Otherwise switch() returns NULL
invisibly.
• If expression evaluates to a character string then that string is
matched (exactly) to the names of the alternatives.
• If there is more than one match, the first matching element is
used.
• There is no separate default keyword. In the case of no match on a
character string, if there is an unnamed alternative its value
is returned. (If there is more than one such argument an error is
signaled.)
Example:
x <- switch( 2,
"first",
"second",
"third",
"fourth" )
print(x)
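The example above selects by position. A minimal sketch of matching on a character string, with an unnamed last alternative acting as the fallback, might look like this. The function describe_day and its codes are hypothetical.

```r
# Hypothetical helper: map a short day code to a description
describe_day <- function(code) {
  switch(code,
         "sat" = ,               # empty alternative: falls through to the next one
         "sun" = "weekend",
         "mon" = "start of week",
         "other day")            # unnamed alternative: used when nothing matches
}

print(describe_day("sat"))
print(describe_day("mon"))
print(describe_day("xyz"))
```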
LOOPS
There may be situations when you need
to execute a block of code several
times. In general, statements are
executed sequentially: the first statement
in a function is executed first, followed by
the second, and so on.
A loop statement allows us to execute a
statement or group of statements multiple
times.
1) repeat and break Statement in R
The repeat loop executes the same code again and again until a
stop condition is met and break is called.
Syntax
repeat {
  commands
  if (condition) {
    break
  }
}

Example:
v <- c("Hello")
cnt <- 2
repeat {
print(v)
cnt <- cnt+1
if(cnt > 5)
{
break }
}
2) While loop
The while loop executes the same code again and
again as long as its test condition remains TRUE. The
condition is checked before each iteration.
Syntax
while (test_expression) {
  statement
}

Example:
v <- c("Hello")
cnt <- 2
while (cnt < 7)
{ print(v)
cnt = cnt + 1
}
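A while loop is a natural fit when the number of iterations is not known in advance. The following sketch, with hypothetical variable names, counts how many doublings are needed to reach at least 100.

```r
value <- 1
steps <- 0

# The condition is checked before every iteration
while (value < 100) {
  value <- value * 2
  steps <- steps + 1
}

print(steps)
print(value)
```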
3) For loop
A For loop is a repetition control structure that allows you to
efficiently write a loop that needs to execute a specific number of
times.
Syntax
for (value in vector) { statements }

Example:
v <- LETTERS[1:4]
for ( i in v)
{
print(i)
}
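A for loop can also iterate over positions rather than values. This sketch uses seq_along() on a hypothetical numeric vector to build a running total.

```r
scores <- c(4, 8, 15, 16)   # hypothetical data
total <- 0

# seq_along(scores) yields 1, 2, 3, 4 (and nothing for an empty vector)
for (i in seq_along(scores)) {
  total <- total + scores[i]
  cat("after element", i, "total is", total, "\n")
}
```

In practice sum(scores) gives the same total; the loop is shown only to make the iteration explicit.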
3) next Statement
The next statement in the R programming
language is useful when we want to skip
the current iteration of a loop without
terminating it. On encountering next, R
skips the rest of the loop body and starts
the next iteration of the loop.
Syntax
next
Example:
v <- LETTERS[1:6]
for (i in v) {
  if (i == "D") {
    # R is case-sensitive: the keyword is next, not Next
    next
  }
  print(i)
}
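To contrast skipping an iteration with leaving the loop, here is a small sketch that uses both next and break. The variable name odd_total is an assumption for illustration.

```r
odd_total <- 0

for (n in 1:20) {
  if (n %% 2 == 0) next   # skip even numbers, continue with the next n
  if (n > 9) break        # leave the loop entirely at the first odd n above 9
  odd_total <- odd_total + n
}

print(odd_total)   # sum of 1, 3, 5, 7, 9
```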
4) Nested for Loop in R
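A nested for loop is a for loop placed inside the body of
another for loop. For each iteration of the outer loop, the
inner loop runs to completion, which makes it easy to visit
every element of a matrix by row and column.
The foreach package offers a similar construct. Unlike many
parallel programming packages for R, foreach doesn't require
the body of the loop to be turned into a function, and it
provides a nesting operator (%:%) for creating nested
foreach loops.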
Example:
mat <- matrix(1:10, 2)
for (id1 in seq(nrow(mat))) {
for (id2 in seq(ncol(mat))) {
print(mat[id1, id2])
}
}
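Nested loops can also fill a matrix, not just read one. A minimal sketch that builds a small multiplication table follows; the names n and tbl are hypothetical.

```r
n <- 3
tbl <- matrix(0, nrow = n, ncol = n)

for (i in seq_len(n)) {        # outer loop: rows
  for (j in seq_len(n)) {      # inner loop: columns, runs fully for each row
    tbl[i, j] <- i * j
  }
}

print(tbl)
```

The built-in outer(1:n, 1:n) produces the same table without explicit loops.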
5) return Statement in R
Often we need a function to do some processing
and hand the result back to its caller. This is accomplished with
the return() statement in R.
Syntax:
return(expression)
Example:
check <- function(x) {
if (x > 0) {
result <- "Positive"
} else if (x < 0) {
result <- "Negative"
} else {
result <- "Zero"
}
return(result)
}
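return() can also be used to leave a function early. The sketch below is one possible alternative to the example above, using a hypothetical function name sign_label.

```r
sign_label <- function(x) {
  if (x > 0) return("Positive")   # exits immediately, nothing below runs
  if (x < 0) return("Negative")
  "Zero"                          # last evaluated expression is returned implicitly
}

print(sign_label(4))
print(sign_label(-2))
print(sign_label(0))
```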
FUNCTION
A function is a set of statements organized together to
perform a specific task. R has a large number of in-built
functions and the user can create their own functions.
An R function is created by using the keyword function.
Syntax
function_name <- function(arg_1, arg_2, ...) {
  Function body
}
The different parts of a function are −
• Function Name − This is the actual name of the function. It is
stored in R environment as an object with this name.
• Arguments − An argument is a placeholder. When a function
is invoked, you pass a value to the argument. Arguments are
optional; that is, a function may contain no arguments. Also
arguments can have default values.
• Function Body − The function body contains a collection of
statements that defines what the function does.
• Return Value − The return value of a function is the last
expression in the function body to be evaluated.
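The parts listed above can be seen together in a short sketch. The function rect_area and its default value are hypothetical, chosen only to show a default argument and an implicit return value.

```r
# Function name: rect_area; arguments: width and height (height defaults to 1)
rect_area <- function(width, height = 1) {
  # Function body; the last evaluated expression is the return value
  width * height
}

print(rect_area(3))                      # height falls back to its default
print(rect_area(3, 4))                   # arguments matched by position
print(rect_area(height = 2, width = 5))  # arguments matched by name
```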
1) Built-in Function
R has many in-built functions which can be directly called
in the program without defining them first.
Some built-in functions are
seq(), mean(), max(), sum(), sqrt() and paste().
Example:
print(seq(1, 10))
print(sum(1:10))
print(mean(1:10))
print(var(1:10))
print(max(1:10))
print(min(1:10))
print(range(1:10))
print(sqrt(1:10))
2) User-defined Function
We can create user-defined functions in R. They are
specific to what a user wants and once created they
can be used like the built-in functions. Below is an
example of how a function is created and used.
Example: 1)
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
for(i in 1:a) {
b <- i^2
print(b)
}
}
# Call the function new.function supplying 6 as an argument.
new.function(6)
Example: 2)
# Function to find the factorial of n
fact <- function(n) {
  if (n == 0 || n == 1) 1 else prod(1:n)
}
# Call fact() to find the number of combinations of n items taken x at a time
n <- 5
x <- 5
d <- n - x
comb <- fact(n) / (fact(x) * fact(d))
print(comb)
R packages
R packages are collections of R
functions, compiled code and sample data.
They are stored under a directory
called "library" in the R environment.
• R installs a set of packages during
installation.
• More packages are added later, when
they are needed for some specific purpose
• All the packages available in R language
are listed at R Packages.
Check Available R Packages
Get library locations containing R packages by
.libPaths()
Get the list of all the packages installed
library()
Get all packages currently loaded in the R
environment
search()
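The three commands above can be combined into a quick check. This is a sketch using only base R; the choice of "stats" as the package to look for is an assumption (it ships with R).

```r
lib_dirs <- .libPaths()                       # directories searched for packages
installed <- rownames(installed.packages())   # names of all installed packages

print(lib_dirs)
print("stats" %in% installed)   # TRUE when the stats package is installed
print(search())                 # packages and environments currently attached
```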
Install a New Package
There are two ways to add new R packages.
1) Install directly from CRAN:
The following command gets the package
directly from the CRAN repository and installs it
in the R environment.
install.packages("Package Name")
Example:
# Install the package named "XML".
install.packages("XML")
2) Install package manually:
• Go to the link R Packages to download the
package needed.
• Save the package as a .zip file in a suitable
location in the local system.
• Run the following command to install this package in
the R environment.
install.packages(file_name_with_path, repos = NULL, type = "source")
Example:
# Install the package named "XML" from a local file
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "source")
Load Package to Library
Before a package can be used in the code, it
must be loaded into the current R environment.
This also applies to a package that was
installed previously but is not yet loaded in the
current session.
A package is loaded using the following
command −
library("package Name", lib.loc = "path to library")
Example:
# Load the package named "XML"
library("XML")
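A common pattern is to check that a package is installed before loading it. The sketch below uses the "tools" package only because it ships with R; any installed package name could be substituted.

```r
pkg <- "tools"   # assumption: replace with the package you need

if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  loaded <- TRUE
} else {
  message("Package ", pkg, " is not installed; install it with install.packages().")
  loaded <- FALSE
}

print(loaded)
```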
Data Reshaping
* Data Reshaping in R is about changing
the way data is organized into rows and
columns. Most of the time data processing
in R is done by taking the input data as a
data frame.
* It is easy to extract data from the rows
and columns of a data frame but there are
situations when we need the data frame in
a format that is different from the format in
which we received it.
* R has many functions to split, merge and
change the rows to columns and
vice-versa in a data frame.
1) Joining Columns and Rows in a Data Frame
a) We can join multiple vectors column-wise using the cbind() function.
b) We can append the rows of one data frame to another using the rbind() function.
Example:
# Create vector objects. Zip codes are kept as strings so that
# leading zeros (as in "06161") are preserved.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
zipcode <- c("33602","98104","06161","80294")
# Combine the three vectors column-wise (cbind() on vectors returns a matrix).
addresses <- cbind(city,state,zipcode)
# Print a header.
cat("# # # # The First data frame\n")
# Print the result.
print(addresses)
# Create a data frame with the same columns.
new.address <- data.frame(city = c("Lowry","Charlotte"), state = c("CO","FL"),
  zipcode = c("80230","33949"), stringsAsFactors = FALSE)
# Print a header.
cat("# # # The Second data frame\n")
# Print the data frame.
print(new.address)
# Combine rows from both objects; the result is a data frame.
all.addresses <- rbind(addresses,new.address)
# Print a header.
cat("# # # The combined data frame\n")
# Print the result.
print(all.addresses)
Merging Data Frames
We can merge two data frames by using
the merge() function.
By default the merge happens on the columns that have the
same name in both data frames; by.x and by.y name the key
columns explicitly.
Example: we consider the data sets about Diabetes in
Pima Indian Women available in the library named
"MASS". We merge the two data sets based on the
values of blood pressure ("bp") and body mass
index ("bmi"). On choosing these two columns for
merging, the records where values of these two
variables match in both data sets are combined
together to form a single data frame.

library(MASS)
merged.Pima <- merge(x = Pima.te, y = Pima.tr,
  by.x = c("bp", "bmi"), by.y = c("bp", "bmi"))
print(merged.Pima)
nrow(merged.Pima)
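The effect of merging is easier to see on a tiny data set. The data frames below are hypothetical and use only base R.

```r
left  <- data.frame(id = c(1, 2, 3), score = c(90, 85, 70))
right <- data.frame(id = c(2, 3, 4), grade = c("B", "C", "A"))

# Default merge keeps only ids present in both data frames
inner <- merge(left, right, by = "id")
print(inner)

# all = TRUE keeps every id and fills the gaps with NA
full <- merge(left, right, by = "id", all = TRUE)
print(full)
```

The arguments all.x = TRUE and all.y = TRUE keep unmatched rows from only one side.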
Melting and Casting
One of the most interesting aspects of R programming is changing
the shape of the data in multiple steps to get a desired shape. The
functions used to do this are melt() and cast(), which come from the
reshape package (it must be installed before it can be loaded).
Example:
library(MASS)
library(reshape)
print(ships)
Melt the Data
Now we melt the data to organize it, converting all columns other than
type and year into multiple rows. It is done using the melt() function.
molten.ships <- melt(ships, id = c("type","year"))
print(molten.ships)
Cast the Molten Data
We can cast the molten data into a new form where the
aggregate of each type of ship for each year is created. It is
done using the cast() function.
recasted.ship <- cast(molten.ships,
type+year~variable,sum)
print(recasted.ship)
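For readers without the reshape package, a similar melt-then-aggregate idea can be sketched with base R alone. This is not the melt()/cast() API; it uses the base reshape() and tapply() functions on a hypothetical data frame.

```r
wide <- data.frame(id = 1:2, x1 = c(5, 6), x2 = c(7, 8))

# "Melt": turn the columns x1 and x2 into rows of a single column x
long <- reshape(wide, direction = "long",
                varying = list(c("x1", "x2")), v.names = "x",
                timevar = "time", times = 1:2, idvar = "id")
print(long)

# "Cast" with sum: aggregate the long data back to one value per id
totals <- tapply(long$x, long$id, sum)
print(totals)
```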
READING AND WRITING DATA
Functions for Reading Data
There are a few very useful functions for reading
data into R.
• read.table() and read.csv() are two popular
functions used for reading tabular data into R.
• readLines() is used for reading lines from a text
file.
• source() is a very useful function for reading in R
code files from another R program.
• dget() function is also used for reading in R code
files.
• load() function is used for reading in saved
workspaces
• unserialize() function is used for reading single R
objects in binary format.
Functions for Writing Data to Files:
There are similar functions for writing data to files.
• write.table() is used for writing tabular data to text
files (e.g. CSV).
• writeLines() function is useful for writing
character data line-by-line to a file or connection.
• dump() is a function for dumping a textual
representation of multiple R objects.
• dput() function is used for outputting a textual
representation of an R object.
• save() is useful for saving an arbitrary number of
R objects in binary format to a file.
• serialize() is used for converting an R object into
a binary format for outputting to a connection (or
file).
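Several of the reading and writing functions listed above pair up naturally. The sketch below round-trips a small hypothetical data frame through temporary files, so it does not touch any real paths.

```r
df <- data.frame(name = c("a", "b"), value = c(1.5, 2.5))

# Tabular data: write.table() out, read.csv() back in
csv_path <- tempfile(fileext = ".csv")
write.table(df, csv_path, sep = ",", row.names = FALSE)
df_back <- read.csv(csv_path)

# Plain text: writeLines() out, readLines() back in
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt_path)
lines_back <- readLines(txt_path)

# Textual representation of an R object: dput() out, dget() back in
obj_path <- tempfile()
dput(df, obj_path)
df_obj <- dget(obj_path)

print(df_back)
print(lines_back)
print(df_obj)
```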
