0% found this document useful (0 votes)
159 views39 pages

1) Open Source: R Advantages

R has several advantages for data science and statistical analysis: 1) It is open source software that is continuously improved by its community of users and developers. 2) It can run on multiple platforms (Windows, Linux, Mac) and is not dependent on any single operating system. 3) It has powerful functionality for machine learning and statistical analysis out of the box via its packages. 4) It excels at data wrangling and manipulation through packages like dplyr and readr.

Uploaded by

vineetkushwah
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
159 views39 pages

1) Open Source: R Advantages

R has several advantages for data science and statistical analysis: 1) It is open source software that is continuously improved by its community of users and developers. 2) It can run on multiple platforms (Windows, Linux, Mac) and is not dependent on any single operating system. 3) It has powerful functionality for machine learning and statistical analysis out of the box via its packages. 4) It excels at data wrangling and manipulation through packages like dplyr and readr.

Uploaded by

vineetkushwah
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

R ADVANTAGES

1) Open Source
An open-source language is a language on which we can work without any need for a license or a fee.
R is an open-source language. We can contribute to the development of R by optimizing our packages,
developing new ones, and resolving issues.

2) Platform Independent
R is a platform-independent language or cross-platform programming language which means its code
can run on all operating systems. R enables programmers to develop software for several competing
platforms by writing a program only once. R can run quite easily on Windows, Linux, and Mac.

3) Machine Learning Operations


R allows us to do various machine learning operations such as classification and regression. For this
purpose, R provides various packages and features for developing the artificial neural network. R is
used by the best data scientists in the world.

4) Exemplary support for data wrangling


R allows us to perform data wrangling. R provides packages such as dplyr, readr which are capable of
transforming messy data into a structured form.

5) Quality plotting and graphing


R simplifies quality plotting and graphing. R libraries such as ggplot2 and plotly advocates for visually
appealing and aesthetic graphs which set R apart from other programming languages.

6) The array of packages


R has a rich set of packages. R has over 10,000 packages in the CRAN repository which are constantly
growing. R provides packages for data science and machine learning operations.

7) Statistics
R is mainly known as the language of statistics. It is the main reason why R is predominant than other
programming languages for the development of statistical tools.

8) Continuously Growing
R is a constantly evolving programming language. Constantly evolving means when something
evolves, it changes or develops over time, like our taste in music and clothes, which evolve as we get
older. R is a state of the art which provides updates whenever any new feature is added.
Cons
1) Data Handling
In R, objects are stored in physical memory. It is in contrast with other programming languages like
Python. R utilizes more memory as compared to Python. It requires the entire data in one single place
which is in the memory. It is not an ideal option when we deal with Big Data.

2) Basic Security
R lacks basic security. It is an essential part of most programming languages such as Python. Because
of this, there are many restrictions with R as it cannot be embedded in a web-application.

3) Complicated Language
R is a very complicated language, and it has a steep learning curve. The people who don't have prior
knowledge or programming experience may find it difficult to learn R.

4) Weak Origin
The main disadvantage of R is, it does not have support for dynamic or 3D graphics. The reason
behind this is its origin. It shares its origin with a much older programming language "S."

5) Lesser Speed
R programming language is much slower than other programming languages such as MATLAB and
Python. In comparison to other programming language, R packages are much slower.

In R, algorithms are spread across different packages. The programmers who have no prior knowledge
of packages may find it difficult to implement algorithms.

Comments
In R programming, comments are the programmer readable explanation in the source code of an R
program. The purpose of adding these comments is to make the source code easier to understand.
These comments are generally ignored by compilers and interpreters.

In R programming there is only single-line comment. R doesn't support multi-line comment. But if we
want to perform multi-line comments, then we can add our code in a false block.

Single-line comment

#My First program in R programming  
string <-"Hello World!"  
print(string)  

The trick for multi-line comment


#Trick for multi-line comment  
if(FALSE) {  
   "R is an interpreted computer programming language which was created by  
   Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand "  
}  
#My First program in R programming  
string <-"Hello World!"  
print(string)  

DATA TYPES

There are the following data types which are used in R programming:

Data type Example Description

Logical True, False It is a special data type for data with only two possible values
which can be construed as true/false.

Numeric 12,32,112,5432 Decimal value is called numeric in R, and it is the default


computational data type.

Integer 3L, 66L, 2346L Here, L tells R to store the value as an integer,

Complex Z=1+2i, t=7+3i A complex value in R is defined as the pure imaginary value i.

Characte 'a', '"good'", In R programming, a character is used to represent string values.


r "TRUE", '35.4' We convert objects into character values with the help
ofas.character() function.

Raw A raw data type is used to holds raw bytes.

Example:

variable_logical<- TRUE
cat(variable_logical,"\n")
cat("The data type of variable_logical is ",class(variable_logical),"\n\n")

#Numeric Data type


variable_numeric<- 3532
cat(variable_numeric,"\n")
cat("The data type of variable_numeric is ",class(variable_numeric),"\n\n")

#Integer Data type


variable_integer<- 133L
cat(variable_integer,"\n")
cat("The data type of variable_integer is ",class(variable_integer),"\n\n")

#Complex Data type


variable_complex<- 3+2i
cat(variable_complex,"\n")
cat("The data type of variable_complex is ",class(variable_complex),"\n\n")

#Character Data type


variable_char<- "Learning r programming"
cat(variable_char,"\n")
cat("The data type of variable_char is ",class(variable_char),"\n\n")

#Raw Data type


variable_raw<- charToRaw("Learning r programming")
cat(variable_raw,"\n")
cat("The data type of variable_char is ",class(variable_raw),"\n\n")

R DATA STRUCTURES

R has many data structures, which include:


1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame

Vectors
A vector is the basic data structure in R, or we can say vectors are the most basic R data objects.
There are six types of atomic vectors such as logical, integer, character, double, and raw. "A vector
is a collection of elements which is most commonly of mode character, integer, logical or
numeric" A vector can be one of the following two types:

1. Atomic vector
2. Lists

List
In R, the list is the container. Unlike an atomic vector, the list is not restricted to be a single mode. A
list contains a mixture of data types. The list is also known as generic vectors because the element of
the list can be of any type of R object. "A list is a special type of vector in which each element
can be a different type."

We can create a list with the help of list() or as.list(). We can use vector() to create a required length
empty list.

Arrays
There is another type of data objects which can store data in more than two dimensions known as
arrays. "An array is a collection of a similar data type with contiguous memory
allocation." Suppose, if we create an array of dimension (2, 3, 4) then it creates four rectangular
matrices of two rows and three columns.
In R, an array is created with the help of array() function. This function takes a vector as an input
and uses the value in the dim parameter to create an array.

Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout. In
the matrix, elements of the same atomic types are contained. For mathematical calculation, this can
use a matrix containing the numeric element. A matrix is created with the help of the matrix()
function in R.

Syntax

The basic syntax of creating a matrix is as follows:

1. matrix(data, no_row, no_col, by_row, dim_name)  

Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which each
column contains the value of one variable, and row contains the set of value from each column.

There are the following characteristics of a data frame:

1. The column name will be non-empty.


2. The row names will be unique.
3. A data frame stored numeric, factor or character type data.
4. Each column will contain same number of data items.

Factors
Factors are also data objects that are used to categorize the data and store it as levels. Factors can
store both strings and integers. Columns have a limited number of unique values so that factors are
very useful in columns. It is very useful in data analysis for statistical modeling.

Factors are created with the help of factor() function by taking a vector as an input parameter.

VARIABLES
Variables are used to store the information to be manipulated and referenced in the R program. The R
variable can store an atomic vector, a group of atomic vectors, or a combination of many R objects.

Language like C++ is statically typed, but R is a dynamically typed, means it check the type of data
type when the statement is run. A valid variable name contains letter, numbers, dot and underlines
characters. A variable name should start with a letter or the dot not followed by a number.

Name of variable Validit Reason for valid and invalid


y
_var_name Invalid Variable name can't start with an underscore(_).

var_name, Valid Variable can start with a dot, but dot should not be followed by a number.
var.name In this case, the variable will be invalid.

var_name% Invalid In R, we can't use any special character in the variable name except dot
and underscore.

2var_name Invalid Variable name cant starts with a numeric digit.

.2var_name Invalid A variable name cannot start with a dot which is followed by a digit.

var_name2 Valid The variable contains letter, number and underscore and starts with a
letter.

Assignment of variable
There are three operators which we can use to assign the values to the variable. We can use leftward,
rightward, and equal_to operator for this purpose.

Example:

# Assignment using equal operator.


variable.1 = 124

# Assignment using leftward operator.


variable.2 <- "R Programming"

# Assignment using rightward operator.


133L -> variable.3

print(variable.1)
cat ("variable.1 is ", variable.1 ,"\n")
cat ("variable.2 is ", variable.2 ,"\n")
cat ("variable.3 is ", variable.3 ,"\n")

KEYWORDS

In programming, a keyword is a word which is reserved by a program because it has a special


meaning. A keyword can be a command or a parameter. Like in C, C++, Java, there is also a set of
keywords in R. A keyword can't be used as a variable name. Keywords are also called as "reserved
names."

There are the following keywords as per ?reserved or help(reserved) command:


if else repeat

while function for

next break TRUE

FALSE NULL Inf

NaN NA NA_integer_

NA_real_ NA_complex_ NA_character_

CONDITIONAL STATEMENTS

1) IF-ELSE

Example:

a<-22

if(a<20){

cat("I am lesser than 20")

}else{

cat("I am larger than 20")

2) repeat
The repeat keyword is used to iterate over a block of code multiple numbers of times. In R, repeat is a
loop, and in this loop statement, there is no condition to exit from the loop. For exiting the loop, we
will use the break statement.

Example:

x <- 1
repeat {
cat(x)
x = x+1
if (x == 6){
break
}
}
3) while
A while keyword is used as a loop. The while loop is executed until the given condition is true. This is
also used to make an infinite loop.

Example
a <- 20  
while(a!=0){  
      cat(a)  
      a = a-2  
}     

4) function
A function is an object in R programming. The keyword function is used to create a user-define
function in R. R has some pre-defined functions also, such as seq, mean, and sum.

Example:

new.function<- function(n) {
for(i in 1:n) {
a <- i^2
print(a)
}
}

new.function(6)

5) for
The for is a keyword which is used for looping or iterating over a sequence (dictionary, string, list, set
or tuple).

We can execute a set of a statement once for each item in the iterator (list, set, tuple, etc.) with the
help of for loop.

Example 1:

fruit <- c('Apple', 'Orange',"Guava", 'Pinapple', 'Banana','Grapes')


# Create the for statement
for ( i in fruit){
print(i)
}

Example 2:
# Creating an empty list
list <- c()
# Creating a for statement to populate the list
for (i in seq(1, 5, by=1)) {
list[[i]] <- i*i
}
print(list)

Example 3:

v <- LETTERS[1:4]
for ( i in v) {
print(i)
}

Example 4:
x <- c(2,5,3,9,8,11,6,44,43,47,67,95,33,65,12,45,12)
count <- 0
for (val in x) {
if(val %% 2 == 0) count = count+1
}
print(count)

Example 5:
# Creating a matrix  
mat <- matrix(data = seq(10, 21, by=1), nrow = 6, ncol =2)  
# Creating the loop with r and c to iterate over the matrix  
for (r in 1:nrow(mat))     
    for (c in 1:ncol(mat))    
         print(paste("mat[", r, ",",c, "]=", mat[r,c]))   
print(mat)  

6) next
The next keyword skips the current iteration of a loop without terminating it. When R parser found
next, it skips further evaluation and starts the new iteration of the loop.

Example:

v <- LETTERS[1:6]
for ( i in v) {
if (i == "D") {
next
}
print(i)
}
7) NULL
In R, NULL represents the null object. NULL is used to represent missing and undefined values. NULL
is the logical representation of a statement which is neither TRUE nor FALSE.

Example:

as.null(list(a = 1, b = "c"))  

8) Inf and NaN


The is.finite and is.infinite function returns a vector of the same length indicating which elements are
finite or infinite.

Inf and -Inf are positive and negative infinity. NaN stands for 'Not a Number.' NaN applies on numeric
values and real and imaginary parts of complex values, but it will not apply to the values of integer
vectors.

Example
is.finite(x)
is.infinite(x)
is.nan(x)

Inf
NaN

9) Else If
Example:
age <- readline(prompt="Enter age: ")  
age <- as.integer(age)  
if(age<18)  
    print("You are child")  
else if(age>30)  
    print("You are old guy")  
else  
    print("You are adult")

OPERATORS
There are the following types of operators used in R:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators

Arithmetic Operators
Arithmetic operators are the symbols which are used to represent arithmetic math operations. The
operators act on each and every element of the vector. There are various arithmetic operators which
are supported by R.

S. Operator Description Example


No

1. + This operator is used to add two vectors in R. a <- c(2, b <- c(11, 5, 3)
3.3, 4) print(a+b)

It will give us the following

[1] 13.0 8.3 5.

2. - This operator is used to divide a vector from another b <- c(11, 5, 3)


one. a <- c(2, 3.3, 4) print(a-b)

It will give us the following

[1] -9.0 -1.7 3

3. * This operator is used to multiply two vectors with each b <- c(11, 5, 3)
other. a <- c(2, 3.3, 4) print(a*b)

It will give us the following

[1] 22.0 16.5 4

4. / This operator divides the vector from another one. a <- b <- c(11, 5, 3)
c(2, 3.3, 4) print(a/b)
It will give us the following

[1] 0.1818182 0.
4.0000000

5. %% This operator is used to find the remainder of the first b <- c(11, 5, 3)
vector with the second vector. a <- c(2, 3.3, 4) print(a%%b)

It will give us the following

[1] 2.0 3.3 0

6. %/% This operator is used to find the division of the first a <- c(2, 3.3, 4)
vector with the second(quotient). b <- c(11, 5, 3)
print(a%/%b)

It will give us the following

[1] 0 0 4

7. ^ This operator raised the first vector to the exponent of b <- c(11, 5, 3)
the second vector. a <- c(2, 3.3, 4) print(a^b)

It will give us the following

[1] 0248.0000 39
4.0000

Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities. These
include numerical equalities and inequalities. A relational operator compares each element of the first
vector with the corresponding element of the second vector. The result of the comparison will be a
Boolean value. There are the following relational operators which are supported by R:

S. Operato Description Example


No r

1. > This operator will return TRUE when every element in the first a <- c(1, 3,
vector is greater than the corresponding element of the second b <- c(2, 4,
vector. print(a>b)

It will give us the fo


output:

[1] FALSE
FALSE

2. < This operator will return TRUE when every element in the first a <- c(1, 9,
vector is less then the corresponding element of the second b <- c(2, 4,
vector. print(a<b)

It will give us the fo


output:

[1] FALSE
FALSE

3. <= This operator will return TRUE when every element in the first a <- c(1, 3,
vector is less than or equal to the corresponding element of b <- c(2, 3,
another vector. print(a<=b)

It will give us the fo


output:

[1] TRUE T
TRUE

4. >= This operator will return TRUE when every element in the first a <- c(1, 3,
vector is greater than or equal to the corresponding element of b <- c(2, 3,
another vector. print(a>=b)

It will give us the fo


output:

[1] FALSE
FALSE

5. == This operator will return TRUE when every element in the first a <- c(1, 3,
vector is equal to the corresponding element of the second b <- c(2, 3,
vector. print(a==b)

It will give us the fo


output:

[1] FALSE TRUE


6. != This operator will return TRUE when every element in the first a <- c(1, 3,
vector is not equal to the corresponding element of the second b <- c(2, 3,
vector. print(a>=b)

It will give us the fo


output:

[1] TRUE F
TRUE

Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions. In the
program, each operand is considered as a condition which can be evaluated to a false or true value.
The value of the conditions is used to determine the overall value of the op1 operator op2. Logical
operators are applicable to those vectors whose type is logical, numeric, or complex.

The logical operator compares each element of the first vector with the corresponding element of the
second vector.

There are the following types of operators which are supported by R:

S. Operator Description Example


No

1. & This operator is known as the Logical AND operator. This a <- c(3, 0,
operator takes the first element of both the vector and returns 2+2i)
TRUE if both the elements are TRUE. b <- c(2, 4,
2+3i)
print(a&b)

It will give us the follo


output:

[1] TRUE FA
TRUE TRUE

2. | This operator is called the Logical OR operator. This operator a <- c(3, 0,
takes the first element of both the vector and returns TRUE if 2+2i)
one of them is TRUE. b <- c(2, 4,
2+3i)
print(a|b)

It will give us the follo


output:

[1] TRUE TR
TRUE

3. ! This operator is known as Logical NOT operator. This operator a <- c(3, 0,
takes the first element of the vector and gives the opposite 2+2i)
logical value as a result. print(!a)

It will give us the follo


output:

[1] FALSE T
FALSE FALSE

4. && This operator takes the first element of both the vector and a <- c(3, 0,
gives TRUE as a result, only if both are TRUE. 2+2i)
b <- c(2, 4,
2+3i)
print(a&&b)

It will give us the follo


output:

[1] TRUE

5. || This operator takes the first element of both the vector and a <- c(3, 0,
gives the result TRUE, if one of them is true. 2+2i)
b <- c(2, 4,
2+3i)
print(a||b)

It will give us the follo


output:

[1] TRUE
Assignment Operators
An assignment operator is used to assign a new value to a variable. In R, these operators are used to
assign values to vectors. There are the following types of assignment

S. Operator Description Example


No

1. <- or = or These operators are known as left assignment a <- c(3, 0, TRUE,
<<- operators. b <<- c(2, 4, TRUE,
d = c(1, 2, TRUE, 2
print(a)
print(b)
print(d)

It will give us the following o

[1] 3+0i 0+0i 1+


2+2i
[1] 2+0i 4+0i 1+
2+3i
[1] 1+0i 2+0i 1+
2+3i

2. -> or ->> These operators are known as right assignment c(3, 0, TRUE, 2+2i)
operators. c(2, 4, TRUE, 2+3i)
print(a)
print(b)

It will give us the following o

[1] 3+0i 0+0i 1+


2+2i
[1] 2+0i 4+0i 1+
2+3i

operators which are supported by R:


Miscellaneous Operators
Miscellaneous operators are used for a special and specific purpose. These operators are not used for
general mathematical or logical computation. There are the following miscellaneous operators which
are supported in R

S. Operator Description Example


No

1. : The colon operator is used to v <- 1:8


create the series of numbers print(v)
in sequence for a vector.
It will give us the following output:

[1] 1 2 3 4 5 6 7 8

2. %in% This is used when we want to a1 <- 8


identify if an element belongs a2 <- 12
to a vector. d <- 1:10
print(a1%in%t)
print(a2%in%t)

It will give us the following output:

[1] FALSE
[1] FALSE

3. %*% It is used to multiply a matrix M=matrix(c(1,2,3,4,5,6), nrow=2,


with its transpose. ncol=3, byrow=TRUE)
T=m%*%T(m)
print(T)

It will give us the following output:

14 32
32 77

FUNCTIONS

"An R function is created by using the keyword function."


User-defined function
R allows us to create our own function in our program. A user defines a user-define function to fulfill
the requirement of user. Once these functions are created, we can use these functions like in-built
function.

# Creating a function without an argument.  
new.function <- function() {  
   for(i in 1:5) {  
      print(i^2)  
   }  
}     
  
new.function()  

Function calling with Argument Values


We can supply the arguments to a function call in the same sequence as defined in the function or can
supply in a different sequence but assigned them to the names of the arguments.

# Creating a function with arguments.  
new.function <- function(x,y,z) {  
   result <- x * y + z  
   print(result)  
}  
  
# Calling the function by position of arguments.  
new.function(11,13,9)  
  
# Calling the function by names of the arguments.  
new.function(x = 2, y = 5, z = 3)

R Built-in Functions
Statistical Probability Functions
R provides various statistical probability functions to perform statistical task. These statistical
functions are very helpful to find normal density, normal quantile and many more calculation. In R,
there are following functions which are used:

S. Function Description Example


No
1. dnorm(x, m=0, sd=1, It is used to find the height of the a <- seq(-7, 7, by=0.1)
log=False) probability distribution at each point b <- dnorm(a, mean=2.5, sd
to a given mean and standard png(file="dnorm.png")
deviation plot(x,y)
dev.off()

2. pnorm(q, m=0, sd=1, it is used to find the probability of a a <- seq(-7, 7, by=0.2)
lower.tail=TRUE, normally distributed random b <- dnorm(a, mean=2.5, sd
log.p=FALSE) numbers which are less than the png(file="pnorm.png")
value of a given number. plot(x,y)
dev.off()

3. qnorm(p, m=0, sd=1) It is used to find a number whose a <- seq(1, 2, by=002)
cumulative value matches with the b <- qnorm(a, mean=2.5, sd
probability value. png(file="qnorm.png")
plot(x,y)
dev.off()

4. rnorm(n, m=0, sd=1) It is used to generate random y <- rnorm(40)


numbers whose distribution is png(file="rnorm.png")
normal. hist(y, main="Normal
Distribution")
dev.off()

5. dbinom(x, size, prob) It is used to find the probability a<-seq(0, 40, by=1)
density distribution at each point. b<- dbinom(a, 40, 0.5)
png(file="pnorm.png")
plot(x,y)
dev.off()

6. pbinom(q, size, prob) It is used to find the cumulative a <- pbinom(25, 40,0.5)
probability (a single value print(a)
representing the probability) of an Output
event.
[1] 0.9596548

7. qbinom(p, size, prob) It is used to find a number whose a <- qbinom(0.25, 40,01/2)
cumulative value matches the print(a)
probability value. Output

[1] 18

8. rbinom(n, size, prob) It is used to generate required a <- rbinom(6, 140,0.4)


number of random values of a given print(a)
probability from a given sample. Output

[1] 55 61 46 56 58 4

9. dpois(x, lamba) it is the probability of x successes in dpois(a=2, lambda=3)+dpois


a period when the expected number lambda=3)+dpois(z=4, labda
of events is lambda (λ) Output

[1] 0.616115

10. ppois(q, lamba) It is a cumulative probability of less ppois(q=4, lambda=3,


than or equal to q successes. lower.tail=TRUE)-ppois(q=1
lambda=3, lower.tail=TRUE)
Output

[1] 0.6434504

11. rpois(n, lamba) It is used to generate random rpois(10, 10)


numbers from the poisson [1] 6 10 11 3 10 7
distribution. 14 12

12. dunif(x, min=0, max=1) This function provide information dunif(x, min=0, max=1,
about the uniform distribution on log=FALSE)
the interval from min to max. It
gives the density.

13. punif(q, min=0, max=1) It gives the distributed function punif(q, min=0, max=1,
lower.tail=TRUE, log.p=FAL

14. qunif(p, min=0, max=1) It gives the quantile function. qunif(p, min=0, max=1,
lower.tail=TRUE, log.p=FAL

15. runif(x, min=0, max=1) It generates random deviates. runif(x, min=0, max=1)

Other Statistical Function


Apart from the functions mentioned above, there are some other useful functions which helps for
statistical purpose. There are the following functions:
S. Function Description Example
No

1. mean(x, trim=0, It is used to find the mean for x a<-c(0:10, 40)


na.rm=FALSE) object xm<-mean(a)
print(xm)
Output

[1] 7.916667

2. sd(x) It returns standard deviation of an a<-c(0:10, 40)


object. xm<-sd(a)
print(xm)
Output

[1] 10.58694

3. median(x) It returns median. a<-c(0:10, 40)


xm<-meadian(a)
print(xm)
Output

[1] 5.5

4. quantilie(x, probs) It returns quantile where x is the


numeric vector whose quantiles
are desired and probs is a
numeric vector with probabilities
in [0, 1]

5. range(x) It returns range. a<-c(0:10, 40)


xm<-range(a)
print(xm)
Output

[1] 0 40

6. sum(x) It returns sum. a<-c(0:10, 40)


xm<-sum(a)
print(xm)
Output
[1] 95

7. diff(x, lag=1) It returns differences with lag a<-c(0:10, 40)


indicating which lag to use. xm<-diff(a)
print(xm)
Output

[1] 1 1 1 1 1 1 1
1 1 1 30

8. min(x) It returns minimum value. a<-c(0:10, 40)


xm<-min(a)
print(xm)
Output

[1] 0

9. max(x) It returns maximum value a<-c(0:10, 40)


xm<-max(a)
print(xm)
Output

[1] 40

10. scale(x, Column center or standardize a a <- matrix(1:9,3,3)


center=TRUE, matrix. scale(x)
scale=TRUE) Output

[,1]
[1,] -0.747776547
[2,] -0.653320562
[3,] -0.558864577
[4,] -0.464408592
[5,] -0.369952608
[6,] -0.275496623
[7,] -0.181040638
[8,] -0.086584653
[9,] 0.007871332
[10,] 0.102327317
[11,] 0.196783302
[12,] 3.030462849
attr(,"scaled:center")
[1] 7.916667
attr(,"scaled:scale")
[1] 10.58694

VECTOR  
In R, a sequence of elements which share the same data type is known as vector. A vector supports
logical, integer, double, character, complex, or raw data type. The elements which are contained in
vector known as components of the vector. We can check the type of vector with the help of
the typeof() function.

CREATING VECTOR
1. c() function to create a vector. This function returns a one-dimensional array or simply vector. 

2. Using the colon(:) operator


We can create a vector with the help of the colon operator. There is the following syntax to use colon
operator:

z<-x:y   

This operator creates a vector with elements from x to y and assigns it to z.

Example:

a<-4:-10  

   3) Using the seq() function


In R, we can create a vector with the help of the seq() function. A sequence function creates a
sequence of elements as a vector. The seq() function is used in two ways, i.e., by setting step size
with ?by' parameter or specifying the length of the vector with the 'length.out' feature.

Example:
seq_vec<-seq(1,4,by=0.5)  

Output

[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0

Example:

seq_vec<-seq(1,4,length.out=6)  

Output

[1] 1.0 1.6 2.2 2.8 3.4 4.0


Numeric vector
The decimal values are known as numeric data types in R. If we assign a decimal value to any variable
d, then this d variable will become a numeric type. A vector which contains numeric elements is
known as a numeric vector.

Example:

d<-45.5  
num_vec<-c(10.1, 10.2, 33.2)  

Integer vector
A non-fraction numeric value is known as integer data. This integer data is represented by "Int." The
Int size is 2 bytes and long Int size of 4 bytes. There is two way to assign an integer value to a
variable, i.e., by using as.integer() function and appending of L to the value.

A vector which contains integer elements is known as an integer vector.

Example:

d<-as.integer(5)  
e<-5L  
int_vec<-c(1,2,3,4,5)  
int_vec<-as.integer(int_vec)  
int_vec1<-c(1L,2L,3L,4L,5L)  

Character vector
A character is held as a one-byte integer in memory. In R, there are two different ways to create a
character data type value, i.e., using as.character() function and by typing string between double
quotes("") or single quotes('').

A vector which contains character elements is known as an integer vector.

Example:

1. d<-'shubham'  
2. e<-"Arpita"  
3. f<-65  
4. f<-as.character(f)  
Accessing elements of vectors
We can access the elements of a vector with the help of vector indexing. Indexing denotes the position
where the value in a vector is stored. Indexing will be performed with the help of integer, character,
or logic.

1) Indexing with integer vector


On integer vector, indexing is performed in the same way as we have applied in C, C++, and java.
There is only one difference, i.e., in C, C++, and java the indexing starts from 0, but in R, the
indexing starts from 1. Like other programming languages, we perform indexing by specifying an
integer value in square braces [] next to our vector.

Example:

seq_vec<-seq(1,4,length.out=6)    
seq_vec[2]  

2) Indexing with a character vector


In character vector indexing, we assign a unique key to each element of the vector. These keys are
uniquely defined as each element and can be accessed very easily. Let's see an example to
understand how it is performed.

Example:
char_vec<-c("shubham"=22,"arpita"=23,"vaishali"=25)

char_vec["arpita"]

R Lists
In R, lists are the second type of vector. Lists are the objects of R which contain elements of different
types such as number, vectors, string and another list inside it. It can also contain a function or a
matrix as its elements. A list is a data structure which has components of mixed data types. We can
say, a list is a generic vector which contains other objects.

Example:

vec <- c(3,4,5,6)  
char_vec<-c("shubham","nishka","gunjan","sumit")  
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)  
out_list<-list(vec,char_vec,logic_vec)  

Lists creation
The process of creating a list is the same as a vector. In R, the vector is created with the help of c()
function. Like c() function, there is another function, i.e., list() which is used to create a list in R. A list
avoid the drawback of the vector which is data type. We can add the elements in the list of different
data types.

list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),  
   list("BCA","MCA","B.tech"))  
  
# Giving names to the elements in the list.  
names(list_data) <- c("Students", "Marks", "Course")  
  
# Show the list.  
print(list_data)

Output:

$Students
[1] "Shubham" "Nishka" "Gunjan"

$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80

$Course
$Course[[1]]
[1] "BCA"

$Course[[2]]
[1] "MCA"

$Course[[3]]
[1] "B. tech."

Accessing List Elements


R provides two ways through which we can access the elements of a list. First one is the indexing
method performed in the same way as a vector. In the second one, we can access the elements of a
list with the help of names. It will be possible only with the named list.; we cannot access the
elements of a list using names if the list is normal.

Example 1: Accessing elements using index

# Creating a list containing a vector, a matrix and a list.  
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),  
   list("BCA","MCA","B.tech"))  
# Accessing the first element of the list.  
print(list_data[1])  
  
# Accessing the third element. The third element is also a list, so all its elements will be printed.  
print(list_data[3])  
Example 2: Accessing elements using names

# Creating a list containing a vector, a matrix and a list.  
list_data < list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),list("BCA","M
CA","B.tech"))  
# Giving names to the elements in the list.  
names(list_data) <- c("Student", "Marks", "Course")  
# Accessing the first element of the list.  
print(list_data["Student"])  
print(list_data$Marks)  
print(list_data)  

Converting list to vector


There is a drawback with the list, i.e., we cannot perform all the arithmetic operations on list
elements. To remove this, drawback R provides unlist() function. This function converts the list into
vectors. In some cases, it is required to convert a list into a vector so that we can use the elements of
the vector for further manipulation.

The unlist() function takes the list as a parameter and change into a vector.

Example:

# Creating lists.  
list1 <- list(10:20)  
print(list1)  
  
list2 <-list(5:14)  
print(list2)  
  
# Converting the lists to vectors.  
v1 <- unlist(list1)  
v2 <- unlist(list2)  

R Arrays
In R, arrays are the data objects which allow us to store data in more than two dimensions. In R, an
array is created with the help of the array() function. This array() function takes a vector as an input
and to create an array it uses vectors values in the dim parameter.

For example- if we will create an array of dimension (2, 3, 4) then it will create 4 rectangular
matrices of 2 row and 3 columns.

Syntax:

array_name <- array(data, dim= (row_size, column_size, matrices, dim_names))  
data

The data is the first argument in the array() function. It is an input vector which is given to the array.

matrices

In R, the array consists of multi-dimensional matrices.

row_size

This parameter defines the number of row elements which an array can store.

column_size

This parameter defines the number of columns elements which an array can store.

dim_names

This parameter is used to change the default names of rows and columns.

R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created with the help of
the vector input to the matrix function. On R matrices, we can perform addition, subtraction,
multiplication, and division operation.

In the R matrix, elements are arranged in a fixed number of rows and columns. The matrix elements
are the real numbers. In R, we use matrix function, which can easily reproduce the memory
representation of the matrix. In the R matrix, all the elements must share a common basic type.

Example

matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)  

OUTPUT:

[,1] [,2] [,3]


[1,] 11 13 15
[2,] 12 14 16

create a matrix in R
Like vector and list, R provides a function which creates a matrix. R provides the matrix() function to
create a matrix. This function plays an important role in data analysis. There is the following syntax of
the matrix in R:

matrix(data, nrow, ncol, byrow, dim_name)  
data

The first argument in matrix function is data. It is the input vector which is the data elements of the
matrix.

nrow

The second argument is the number of rows which we want to create in the matrix.

ncol

The third argument is the number of columns which we want to create in the matrix.

byrow

The byrow parameter is a logical clue. If its value is true, then the input vector elements are arranged
by row.

dim_name

The dim_name parameter is the name assigned to the rows and columns.

Let's see an example to understand how matrix function is used to create a matrix and arrange the
elements sequentially by row or column.

Example

#Arranging elements sequentially by row.  
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)  
print(P)  
  
# Arranging elements sequentially by column.  
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)  
print(Q)  
  
# Defining the column and row names.  
row_names = c("row1", "row2", "row3", "row4")  
col_names = c("col1", "col2", "col3")  
 
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names)) 

Accessing matrix elements in R


Like C and C++, we can easily access the elements of our matrix by using the index of the element.
There are three ways to access the elements from the matrix.

1. We can access the element which presents on nth row and mth column.
2. We can access all the elements of the matrix which are present on the nth row.
3. We can also access all the elements of the matrix which are present on the mth column.

# Defining the column and row names.  
row_names = c("row1", "row2", "row3", "row4")  
ccol_names = c("col1", "col2", "col3")  
#Creating matrix   
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))  
print(R)  
  
#Accessing element present on 3rd row and 2nd column  
print(R[3,2])  
  
#Accessing element present in 3rd row  
print(R[3,])  
  
#Accessing element present in 2nd column  
print(R[,2])  

Output

col1 col2 col3


row1 5 6 7
row2 8 9 10
row3 11 12 13
row4 14 15 16

[1] 12

col1 col2 col3


11 12 13

row1 row2 row3 row4


6 9 12 15

OPERATIONS ON MATRIX

R <- matrix(c(5:16), nrow = 4,ncol=3)  
S <- matrix(c(1:12), nrow = 4,ncol=3)  
  
#Addition  
sum<-R+S  
print(sum)  
  
#Subtraction  
sub<-R-S  
print(sub)  
  
#Multiplication  
mul<-R*S  
print(mul)  
  
#Multiplication by constant  
mul1<-R*12  
print(mul1)  
  
#Division  
div<-R/S  
print(div)  

Output
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28

[,1] [,2] [,3]


[1,] 4 4 4
[2,] 4 4 4
[3,] 4 4 4
[4,] 4 4 4

[,1] [,2] [,3]


[1,] 5 45 117
[2,] 12 60 140
[3,] 21 77 165
[4,] 32 96 192

[,1] [,2] [,3]


[1,] 60 108 156
[2,] 72 120 168
[3,] 84 132 180
[4,] 96 144 192

[,1] [,2] [,3]


[1,] 5.000000 1.800000 1.444444
[2,] 3.000000 1.666667 1.400000
[3,] 2.333333 1.571429 1.363636
[4,] 2.000000 1.500000 1.333333

R Data Frame
A data frame is a two-dimensional array-like structure or a table in which a column contains values of
one variable, and rows contains one set of values from each column. A data frame is a special case of
the list in which each component has equal length.

A data frame is used to store data table and the vectors which are present in the form of a list in a
data frame, are of equal length.
In a simple way, it is a list of equal length vectors. A matrix can contain one type of data, but a data
frame can contain different data types such as numeric, character, factor, etc.

next →← prev

R Data Frame
A data frame is a two-dimensional array-like structure or a table in which a column contains values
of one variable, and rows contains one set of values from each column. A data frame is a special
case of the list in which each component has equal length.

A data frame is used to store data table and the vectors which are present in the form of a list in a
data frame, are of equal length.

In a simple way, it is a list of equal length vectors. A matrix can contain one type of data, but a
data frame can contain different data types such as numeric, character, factor, etc.

There are following characteristics of a data frame.

o The columns name should be non-empty.


o The rows name should be unique.
o The data which is stored in a data frame can be a factor, numeric, or character type.
o Each column contains the same number of data items.

How to create Data Frame


In R, the data frames are created with the help of frame() function of data. This function contains the
vectors of any type such as numeric, character, or integer. In below example, we create a data frame
that contains employee id (integer vector), employee name(character vector), salary(numeric vector),
and starting date(Date vector).

Example

# Creating the data frame.  
emp.data<- data.frame(  
employee_id = c (1:5),   
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),  
sal = c(623.3,915.2,611.0,729.0,843.25),   
  
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",  
      "2015-03-27")),  
stringsAsFactors = FALSE  
)  
# Printing the data frame.            
print(emp.data)  
Output

employee_idemployee_namesalstarting_date
1 1 Shubham623.30 2012-01-01
2 2 Arpita915.20 2013-09-23
3 3 Nishka611.00 2014-11-15
4 4 Gunjan729.00 2014-05-11
5 5 Sumit843.25 2015-03-27

Extracting data from Data Frame


The data of the data frame is very crucial for us. To manipulate the data of the data frame, it is
essential to extract it from the data frame. We can extract the data in three ways which are as
follows:

1. We can extract the specific columns from a data frame using the column name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.

Extracting the specific columns from a data frame


Example

# Creating the data frame.  
emp.data<- data.frame(  
employee_id = c (1:5),   
employee_name= c("Shubham","Arpita","Nishka","Gunjan","Sumit"),  
sal = c(623.3,515.2,611.0,729.0,843.25),   
  
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",  
      "2015-03-27")),  
stringsAsFactors = FALSE  
)  
# Extracting specific columns from a data frame       
final <- data.frame(emp.data$employee_id,emp.data$sal)  
print(final)  

Output

emp.data.employee_idemp.data.sal
1 1 623.30
2 2 515.20
3 3 611.00
4 4 729.00
5 5 843.25

Extracting the specific rows from a data frame


Example

# Creating the data frame.  
emp.data<- data.frame(  
employee_id = c (1:5),   
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),  
sal = c(623.3,515.2,611.0,729.0,843.25),   
  
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",  
      "2015-03-27")),  
stringsAsFactors = FALSE  
)  
# Extracting first row from a data frame      
final <- emp.data[1,]  
print(final)  
  
  
# Extracting last two row from a data frame       
final <- emp.data[4:5,]  
print(final)  

Output

employee_id employee_name sal starting_date


1 1 Shubham 623.3 2012-01-01

employee_id employee_name sal starting_date


4 4 Gunjan 729.00 2014-05-11
5 5 Sumit 843.25 2015-03-27

Summary of data in Data Frames


In some cases, it is required to find the statistical summary and nature of the data in the data frame.
R provides the summary() function to extract the statistical summary and nature of the data. This
function takes the data frame as a parameter and returns the statistical information of the data. Let?s
see an example to understand how this function is used in R:

Example

# Creating the data frame.  
emp.data<- data.frame(  
employee_id = c (1:5),   
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),  
sal = c(623.3,515.2,611.0,729.0,843.25),   
  
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",  
      "2015-03-27")),  
stringsAsFactors = FALSE  
)  
print(emp.data)  
  
#Printing the summary  
print(summary(emp.data))  

Output

employee_idemployee_namesalstarting_date
1 1 Shubham623.30 2012-01-01
2 2 Arpita515.20 2013-09-23
3 3 Nishka611.00 2014-11-15
4 4 Gunjan729.00 2014-05-11
5 5 Sumit843.25 2015-03-27

employee_idemployee_namesalstarting_date
Min. :1 Length:5 Min. :515.2 Min. :2012-01-01
1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
Median :3 Mode :character Median :623.3 Median :2014-05-11
Mean :3 Mean :664.4 Mean :2014-01-14
3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15
Max. :5 Max. :843.2 Max. :2015-03-27

R factors
The factor is a data structure which is used for fields which take only predefined finite number of
values. These are the variable which takes a limited number of different values. These are the data
objects which are used to categorize the data and to store it on multiple levels. It can store both
integers and strings values, and are useful in the column that has a limited number of unique values.

Factors have labels which are associated with the unique integers stored in it. It contains predefined
set value known as levels and by default R always sorts levels in alphabetical order.

Attributes of a factor
There are the following attributes of a factor in R

a. X
It is the input vector which is to be transformed into a factor.
b. levels
It is an input vector that represents a set of unique values which are taken by x.
c. labels
It is a character vector which corresponds to the number of labels.
d. Exclude
It is used to specify the value which we want to be excluded,
e. ordered
It is a logical attribute which determines if the levels are ordered.
f. nmax
It is used to specify the upper bound for the maximum number of level.

How to create a factor?


In R, it is quite simple to create a factor. A factor is created in two steps

1. In the first step, we create a vector.


2. Next step is to convert the vector into a factor,

R provides factor() function to convert the vector into factor. There is the following syntax of factor()
function

factor_data<- factor(vector)  

Example

1. # Creating a vector as input.  
2. data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Su
mit","Arpita","Sumit")  
3.   
4. print(data)  
5. print(is.factor(data))  
6.   
7. # Applying the factor function.  
8. factor_data<- factor(data)  
9.   
10. print(factor_data)  
11. print(is.factor(factor_data))  

Output

[1] "Shubham" "Nishka" "Arpita" "Nishka" "Shubham" "Sumit" "Nishka"


[8] "Shubham" "Sumit" "Arpita" "Sumit"
[1] FALSE
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] TRUE

Accessing components of factor


Like vectors, we can access the components of factors. The process of accessing components of factor
is much more similar to the vectors. We can access the element with the help of the indexing method
or using logical vectors. Let's see an example in which we understand the different-different ways of
accessing the components.

Example
# Creating a vector as input.  
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Sumit","Ar
pita","Sumit")  
  
# Applying the factor function.  
factor_data<- factor(data)  
  
#Printing all elements of factor  
print(factor_data)  
  
#Accessing 4th element of factor  
print(factor_data[4])  
  
#Accessing 5th and 7th element  
print(factor_data[c(5,7)])  
  
#Accessing all elemcent except 4th one  
print(factor_data[-4])  
  
#Accessing elements using logical vector  
print(factor_data[c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE)])  

Output

[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Nishka
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Nishka


Levels: Arpita Nishka Shubham Sumit

[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit

[1] Shubham Shubham Sumit Nishka Sumit


Levels: Arpita Nishka Shubham Sumit

Factor in Data Frame


When we create a frame with a column of text data, R treats this text column as categorical data and
creates factor on it.

Example
# Creating the vectors for data frame.  
height <- c(132,162,152,166,139,147,122)  
weight <- c(40,49,48,40,67,52,53)  
gender <- c("male","male","female","female","male","female","male")  
  
# Creating the data frame.  
input_data<- data.frame(height,weight,gender)  
print(input_data)  
  
# Testing if the gender column is a factor.  
print(is.factor(input_data$gender))  
  
# Printing the gender column to see the levels.  
print(input_data$gender)  

Output

height weight gender


1 132 40 male
2 162 49 male
3 152 48 female
4 166 40 female
5 139 67 male
6 147 52 female
7 122 53 male
[1] TRUE
[1] male male female female male female male
Levels: female male

You might also like