1) Open Source: R Advantages
1) Open Source: R Advantages
1) Open Source
An open-source language is a language on which we can work without any need for a license or a fee.
R is an open-source language. We can contribute to the development of R by optimizing our packages,
developing new ones, and resolving issues.
2) Platform Independent
R is a platform-independent language or cross-platform programming language which means its code
can run on all operating systems. R enables programmers to develop software for several competing
platforms by writing a program only once. R can run quite easily on Windows, Linux, and Mac.
7) Statistics
R is mainly known as the language of statistics. It is the main reason why R is predominant than other
programming languages for the development of statistical tools.
8) Continuously Growing
R is a constantly evolving programming language. Constantly evolving means when something
evolves, it changes or develops over time, like our taste in music and clothes, which evolve as we get
older. R is a state of the art which provides updates whenever any new feature is added.
Cons
1) Data Handling
In R, objects are stored in physical memory. It is in contrast with other programming languages like
Python. R utilizes more memory as compared to Python. It requires the entire data in one single place
which is in the memory. It is not an ideal option when we deal with Big Data.
2) Basic Security
R lacks basic security. It is an essential part of most programming languages such as Python. Because
of this, there are many restrictions with R as it cannot be embedded in a web-application.
3) Complicated Language
R is a very complicated language, and it has a steep learning curve. The people who don't have prior
knowledge or programming experience may find it difficult to learn R.
4) Weak Origin
The main disadvantage of R is, it does not have support for dynamic or 3D graphics. The reason
behind this is its origin. It shares its origin with a much older programming language "S."
5) Lesser Speed
R programming language is much slower than other programming languages such as MATLAB and
Python. In comparison to other programming language, R packages are much slower.
In R, algorithms are spread across different packages. The programmers who have no prior knowledge
of packages may find it difficult to implement algorithms.
Comments
In R programming, comments are the programmer readable explanation in the source code of an R
program. The purpose of adding these comments is to make the source code easier to understand.
These comments are generally ignored by compilers and interpreters.
In R programming there is only single-line comment. R doesn't support multi-line comment. But if we
want to perform multi-line comments, then we can add our code in a false block.
Single-line comment
#My First program in R programming
string <-"Hello World!"
print(string)
DATA TYPES
There are the following data types which are used in R programming:
Logical True, False It is a special data type for data with only two possible values
which can be construed as true/false.
Integer 3L, 66L, 2346L Here, L tells R to store the value as an integer,
Complex Z=1+2i, t=7+3i A complex value in R is defined as the pure imaginary value i.
Example:
variable_logical<- TRUE
cat(variable_logical,"\n")
cat("The data type of variable_logical is ",class(variable_logical),"\n\n")
R DATA STRUCTURES
Vectors
A vector is the basic data structure in R, or we can say vectors are the most basic R data objects.
There are six types of atomic vectors such as logical, integer, character, double, and raw. "A vector
is a collection of elements which is most commonly of mode character, integer, logical or
numeric" A vector can be one of the following two types:
1. Atomic vector
2. Lists
List
In R, the list is the container. Unlike an atomic vector, the list is not restricted to be a single mode. A
list contains a mixture of data types. The list is also known as generic vectors because the element of
the list can be of any type of R object. "A list is a special type of vector in which each element
can be a different type."
We can create a list with the help of list() or as.list(). We can use vector() to create a required length
empty list.
Arrays
There is another type of data objects which can store data in more than two dimensions known as
arrays. "An array is a collection of a similar data type with contiguous memory
allocation." Suppose, if we create an array of dimension (2, 3, 4) then it creates four rectangular
matrices of two rows and three columns.
In R, an array is created with the help of array() function. This function takes a vector as an input
and uses the value in the dim parameter to create an array.
Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout. In
the matrix, elements of the same atomic types are contained. For mathematical calculation, this can
use a matrix containing the numeric element. A matrix is created with the help of the matrix()
function in R.
Syntax
1. matrix(data, no_row, no_col, by_row, dim_name)
Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which each
column contains the value of one variable, and row contains the set of value from each column.
Factors
Factors are also data objects that are used to categorize the data and store it as levels. Factors can
store both strings and integers. Columns have a limited number of unique values so that factors are
very useful in columns. It is very useful in data analysis for statistical modeling.
Factors are created with the help of factor() function by taking a vector as an input parameter.
VARIABLES
Variables are used to store the information to be manipulated and referenced in the R program. The R
variable can store an atomic vector, a group of atomic vectors, or a combination of many R objects.
Language like C++ is statically typed, but R is a dynamically typed, means it check the type of data
type when the statement is run. A valid variable name contains letter, numbers, dot and underlines
characters. A variable name should start with a letter or the dot not followed by a number.
var_name, Valid Variable can start with a dot, but dot should not be followed by a number.
var.name In this case, the variable will be invalid.
var_name% Invalid In R, we can't use any special character in the variable name except dot
and underscore.
.2var_name Invalid A variable name cannot start with a dot which is followed by a digit.
var_name2 Valid The variable contains letter, number and underscore and starts with a
letter.
Assignment of variable
There are three operators which we can use to assign the values to the variable. We can use leftward,
rightward, and equal_to operator for this purpose.
Example:
print(variable.1)
cat ("variable.1 is ", variable.1 ,"\n")
cat ("variable.2 is ", variable.2 ,"\n")
cat ("variable.3 is ", variable.3 ,"\n")
KEYWORDS
NaN NA NA_integer_
CONDITIONAL STATEMENTS
1) IF-ELSE
Example:
a<-22
if(a<20){
}else{
2) repeat
The repeat keyword is used to iterate over a block of code multiple numbers of times. In R, repeat is a
loop, and in this loop statement, there is no condition to exit from the loop. For exiting the loop, we
will use the break statement.
Example:
x <- 1
repeat {
cat(x)
x = x+1
if (x == 6){
break
}
}
3) while
A while keyword is used as a loop. The while loop is executed until the given condition is true. This is
also used to make an infinite loop.
Example
a <- 20
while(a!=0){
cat(a)
a = a-2
}
4) function
A function is an object in R programming. The keyword function is used to create a user-define
function in R. R has some pre-defined functions also, such as seq, mean, and sum.
Example:
new.function<- function(n) {
for(i in 1:n) {
a <- i^2
print(a)
}
}
new.function(6)
5) for
The for is a keyword which is used for looping or iterating over a sequence (dictionary, string, list, set
or tuple).
We can execute a set of a statement once for each item in the iterator (list, set, tuple, etc.) with the
help of for loop.
Example 1:
Example 2:
# Creating an empty list
list <- c()
# Creating a for statement to populate the list
for (i in seq(1, 5, by=1)) {
list[[i]] <- i*i
}
print(list)
Example 3:
v <- LETTERS[1:4]
for ( i in v) {
print(i)
}
Example 4:
x <- c(2,5,3,9,8,11,6,44,43,47,67,95,33,65,12,45,12)
count <- 0
for (val in x) {
if(val %% 2 == 0) count = count+1
}
print(count)
Example 5:
# Creating a matrix
mat <- matrix(data = seq(10, 21, by=1), nrow = 6, ncol =2)
# Creating the loop with r and c to iterate over the matrix
for (r in 1:nrow(mat))
for (c in 1:ncol(mat))
print(paste("mat[", r, ",",c, "]=", mat[r,c]))
print(mat)
6) next
The next keyword skips the current iteration of a loop without terminating it. When R parser found
next, it skips further evaluation and starts the new iteration of the loop.
Example:
v <- LETTERS[1:6]
for ( i in v) {
if (i == "D") {
next
}
print(i)
}
7) NULL
In R, NULL represents the null object. NULL is used to represent missing and undefined values. NULL
is the logical representation of a statement which is neither TRUE nor FALSE.
Example:
as.null(list(a = 1, b = "c"))
Inf and -Inf are positive and negative infinity. NaN stands for 'Not a Number.' NaN applies on numeric
values and real and imaginary parts of complex values, but it will not apply to the values of integer
vectors.
Example
is.finite(x)
is.infinite(x)
is.nan(x)
Inf
NaN
9) Else If
Example:
age <- readline(prompt="Enter age: ")
age <- as.integer(age)
if(age<18)
print("You are child")
else if(age>30)
print("You are old guy")
else
print("You are adult")
OPERATORS
There are the following types of operators used in R:
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
Arithmetic Operators
Arithmetic operators are the symbols which are used to represent arithmetic math operations. The
operators act on each and every element of the vector. There are various arithmetic operators which
are supported by R.
1. + This operator is used to add two vectors in R. a <- c(2, b <- c(11, 5, 3)
3.3, 4) print(a+b)
3. * This operator is used to multiply two vectors with each b <- c(11, 5, 3)
other. a <- c(2, 3.3, 4) print(a*b)
4. / This operator divides the vector from another one. a <- b <- c(11, 5, 3)
c(2, 3.3, 4) print(a/b)
It will give us the following
[1] 0.1818182 0.
4.0000000
5. %% This operator is used to find the remainder of the first b <- c(11, 5, 3)
vector with the second vector. a <- c(2, 3.3, 4) print(a%%b)
6. %/% This operator is used to find the division of the first a <- c(2, 3.3, 4)
vector with the second(quotient). b <- c(11, 5, 3)
print(a%/%b)
[1] 0 0 4
7. ^ This operator raised the first vector to the exponent of b <- c(11, 5, 3)
the second vector. a <- c(2, 3.3, 4) print(a^b)
[1] 0248.0000 39
4.0000
Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities. These
include numerical equalities and inequalities. A relational operator compares each element of the first
vector with the corresponding element of the second vector. The result of the comparison will be a
Boolean value. There are the following relational operators which are supported by R:
1. > This operator will return TRUE when every element in the first a <- c(1, 3,
vector is greater than the corresponding element of the second b <- c(2, 4,
vector. print(a>b)
[1] FALSE
FALSE
2. < This operator will return TRUE when every element in the first a <- c(1, 9,
vector is less then the corresponding element of the second b <- c(2, 4,
vector. print(a<b)
[1] FALSE
FALSE
3. <= This operator will return TRUE when every element in the first a <- c(1, 3,
vector is less than or equal to the corresponding element of b <- c(2, 3,
another vector. print(a<=b)
[1] TRUE T
TRUE
4. >= This operator will return TRUE when every element in the first a <- c(1, 3,
vector is greater than or equal to the corresponding element of b <- c(2, 3,
another vector. print(a>=b)
[1] FALSE
FALSE
5. == This operator will return TRUE when every element in the first a <- c(1, 3,
vector is equal to the corresponding element of the second b <- c(2, 3,
vector. print(a==b)
[1] TRUE F
TRUE
Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions. In the
program, each operand is considered as a condition which can be evaluated to a false or true value.
The value of the conditions is used to determine the overall value of the op1 operator op2. Logical
operators are applicable to those vectors whose type is logical, numeric, or complex.
The logical operator compares each element of the first vector with the corresponding element of the
second vector.
1. & This operator is known as the Logical AND operator. This a <- c(3, 0,
operator takes the first element of both the vector and returns 2+2i)
TRUE if both the elements are TRUE. b <- c(2, 4,
2+3i)
print(a&b)
[1] TRUE FA
TRUE TRUE
2. | This operator is called the Logical OR operator. This operator a <- c(3, 0,
takes the first element of both the vector and returns TRUE if 2+2i)
one of them is TRUE. b <- c(2, 4,
2+3i)
print(a|b)
[1] TRUE TR
TRUE
3. ! This operator is known as Logical NOT operator. This operator a <- c(3, 0,
takes the first element of the vector and gives the opposite 2+2i)
logical value as a result. print(!a)
[1] FALSE T
FALSE FALSE
4. && This operator takes the first element of both the vector and a <- c(3, 0,
gives TRUE as a result, only if both are TRUE. 2+2i)
b <- c(2, 4,
2+3i)
print(a&&b)
[1] TRUE
5. || This operator takes the first element of both the vector and a <- c(3, 0,
gives the result TRUE, if one of them is true. 2+2i)
b <- c(2, 4,
2+3i)
print(a||b)
[1] TRUE
Assignment Operators
An assignment operator is used to assign a new value to a variable. In R, these operators are used to
assign values to vectors. There are the following types of assignment
1. <- or = or These operators are known as left assignment a <- c(3, 0, TRUE,
<<- operators. b <<- c(2, 4, TRUE,
d = c(1, 2, TRUE, 2
print(a)
print(b)
print(d)
2. -> or ->> These operators are known as right assignment c(3, 0, TRUE, 2+2i)
operators. c(2, 4, TRUE, 2+3i)
print(a)
print(b)
[1] 1 2 3 4 5 6 7 8
[1] FALSE
[1] FALSE
14 32
32 77
FUNCTIONS
# Creating a function without an argument.
new.function <- function() {
for(i in 1:5) {
print(i^2)
}
}
new.function()
# Creating a function with arguments.
new.function <- function(x,y,z) {
result <- x * y + z
print(result)
}
# Calling the function by position of arguments.
new.function(11,13,9)
# Calling the function by names of the arguments.
new.function(x = 2, y = 5, z = 3)
R Built-in Functions
Statistical Probability Functions
R provides various statistical probability functions to perform statistical task. These statistical
functions are very helpful to find normal density, normal quantile and many more calculation. In R,
there are following functions which are used:
2. pnorm(q, m=0, sd=1, it is used to find the probability of a a <- seq(-7, 7, by=0.2)
lower.tail=TRUE, normally distributed random b <- dnorm(a, mean=2.5, sd
log.p=FALSE) numbers which are less than the png(file="pnorm.png")
value of a given number. plot(x,y)
dev.off()
3. qnorm(p, m=0, sd=1) It is used to find a number whose a <- seq(1, 2, by=002)
cumulative value matches with the b <- qnorm(a, mean=2.5, sd
probability value. png(file="qnorm.png")
plot(x,y)
dev.off()
5. dbinom(x, size, prob) It is used to find the probability a<-seq(0, 40, by=1)
density distribution at each point. b<- dbinom(a, 40, 0.5)
png(file="pnorm.png")
plot(x,y)
dev.off()
6. pbinom(q, size, prob) It is used to find the cumulative a <- pbinom(25, 40,0.5)
probability (a single value print(a)
representing the probability) of an Output
event.
[1] 0.9596548
7. qbinom(p, size, prob) It is used to find a number whose a <- qbinom(0.25, 40,01/2)
cumulative value matches the print(a)
probability value. Output
[1] 18
[1] 55 61 46 56 58 4
[1] 0.616115
[1] 0.6434504
12. dunif(x, min=0, max=1) This function provide information dunif(x, min=0, max=1,
about the uniform distribution on log=FALSE)
the interval from min to max. It
gives the density.
13. punif(q, min=0, max=1) It gives the distributed function punif(q, min=0, max=1,
lower.tail=TRUE, log.p=FAL
14. qunif(p, min=0, max=1) It gives the quantile function. qunif(p, min=0, max=1,
lower.tail=TRUE, log.p=FAL
15. runif(x, min=0, max=1) It generates random deviates. runif(x, min=0, max=1)
[1] 7.916667
[1] 10.58694
[1] 5.5
[1] 0 40
[1] 1 1 1 1 1 1 1
1 1 1 30
[1] 0
[1] 40
[,1]
[1,] -0.747776547
[2,] -0.653320562
[3,] -0.558864577
[4,] -0.464408592
[5,] -0.369952608
[6,] -0.275496623
[7,] -0.181040638
[8,] -0.086584653
[9,] 0.007871332
[10,] 0.102327317
[11,] 0.196783302
[12,] 3.030462849
attr(,"scaled:center")
[1] 7.916667
attr(,"scaled:scale")
[1] 10.58694
VECTOR
In R, a sequence of elements which share the same data type is known as vector. A vector supports
logical, integer, double, character, complex, or raw data type. The elements which are contained in
vector known as components of the vector. We can check the type of vector with the help of
the typeof() function.
CREATING VECTOR
1. c() function to create a vector. This function returns a one-dimensional array or simply vector.
z<-x:y
Example:
a<-4:-10
Example:
seq_vec<-seq(1,4,by=0.5)
Output
Example:
seq_vec<-seq(1,4,length.out=6)
Output
Example:
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
Integer vector
A non-fraction numeric value is known as integer data. This integer data is represented by "Int." The
Int size is 2 bytes and long Int size of 4 bytes. There is two way to assign an integer value to a
variable, i.e., by using as.integer() function and appending of L to the value.
Example:
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
Character vector
A character is held as a one-byte integer in memory. In R, there are two different ways to create a
character data type value, i.e., using as.character() function and by typing string between double
quotes("") or single quotes('').
Example:
1. d<-'shubham'
2. e<-"Arpita"
3. f<-65
4. f<-as.character(f)
Accessing elements of vectors
We can access the elements of a vector with the help of vector indexing. Indexing denotes the position
where the value in a vector is stored. Indexing will be performed with the help of integer, character,
or logic.
Example:
seq_vec<-seq(1,4,length.out=6)
seq_vec[2]
Example:
char_vec<-c("shubham"=22,"arpita"=23,"vaishali"=25)
char_vec["arpita"]
R Lists
In R, lists are the second type of vector. Lists are the objects of R which contain elements of different
types such as number, vectors, string and another list inside it. It can also contain a function or a
matrix as its elements. A list is a data structure which has components of mixed data types. We can
say, a list is a generic vector which contains other objects.
Example:
vec <- c(3,4,5,6)
char_vec<-c("shubham","nishka","gunjan","sumit")
logic_vec<-c(TRUE,FALSE,FALSE,TRUE)
out_list<-list(vec,char_vec,logic_vec)
Lists creation
The process of creating a list is the same as a vector. In R, the vector is created with the help of c()
function. Like c() function, there is another function, i.e., list() which is used to create a list in R. A list
avoid the drawback of the vector which is data type. We can add the elements in the list of different
data types.
list_data <- list(c("Shubham","Nishka","Gunjan"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Students", "Marks", "Course")
# Show the list.
print(list_data)
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B. tech."
# Creating a list containing a vector, a matrix and a list.
list_data <- list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),
list("BCA","MCA","B.tech"))
# Accessing the first element of the list.
print(list_data[1])
# Accessing the third element. The third element is also a list, so all its elements will be printed.
print(list_data[3])
Example 2: Accessing elements using names
# Creating a list containing a vector, a matrix and a list.
list_data < list(c("Shubham","Arpita","Nishka"), matrix(c(40,80,60,70,90,80), nrow = 2),list("BCA","M
CA","B.tech"))
# Giving names to the elements in the list.
names(list_data) <- c("Student", "Marks", "Course")
# Accessing the first element of the list.
print(list_data["Student"])
print(list_data$Marks)
print(list_data)
The unlist() function takes the list as a parameter and change into a vector.
Example:
# Creating lists.
list1 <- list(10:20)
print(list1)
list2 <-list(5:14)
print(list2)
# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
R Arrays
In R, arrays are the data objects which allow us to store data in more than two dimensions. In R, an
array is created with the help of the array() function. This array() function takes a vector as an input
and to create an array it uses vectors values in the dim parameter.
For example- if we will create an array of dimension (2, 3, 4) then it will create 4 rectangular
matrices of 2 row and 3 columns.
Syntax:
array_name <- array(data, dim= (row_size, column_size, matrices, dim_names))
data
The data is the first argument in the array() function. It is an input vector which is given to the array.
matrices
row_size
This parameter defines the number of row elements which an array can store.
column_size
This parameter defines the number of columns elements which an array can store.
dim_names
This parameter is used to change the default names of rows and columns.
R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created with the help of
the vector input to the matrix function. On R matrices, we can perform addition, subtraction,
multiplication, and division operation.
In the R matrix, elements are arranged in a fixed number of rows and columns. The matrix elements
are the real numbers. In R, we use matrix function, which can easily reproduce the memory
representation of the matrix. In the R matrix, all the elements must share a common basic type.
Example
matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)
OUTPUT:
create a matrix in R
Like vector and list, R provides a function which creates a matrix. R provides the matrix() function to
create a matrix. This function plays an important role in data analysis. There is the following syntax of
the matrix in R:
matrix(data, nrow, ncol, byrow, dim_name)
data
The first argument in matrix function is data. It is the input vector which is the data elements of the
matrix.
nrow
The second argument is the number of rows which we want to create in the matrix.
ncol
The third argument is the number of columns which we want to create in the matrix.
byrow
The byrow parameter is a logical clue. If its value is true, then the input vector elements are arranged
by row.
dim_name
The dim_name parameter is the name assigned to the rows and columns.
Let's see an example to understand how matrix function is used to create a matrix and arrange the
elements sequentially by row or column.
Example
#Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)
# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
1. We can access the element which presents on nth row and mth column.
2. We can access all the elements of the matrix which are present on the nth row.
3. We can also access all the elements of the matrix which are present on the mth column.
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
ccol_names = c("col1", "col2", "col3")
#Creating matrix
R <- matrix(c(5:16), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
#Accessing element present on 3rd row and 2nd column
print(R[3,2])
#Accessing element present in 3rd row
print(R[3,])
#Accessing element present in 2nd column
print(R[,2])
Output
[1] 12
OPERATIONS ON MATRIX
R <- matrix(c(5:16), nrow = 4,ncol=3)
S <- matrix(c(1:12), nrow = 4,ncol=3)
#Addition
sum<-R+S
print(sum)
#Subtraction
sub<-R-S
print(sub)
#Multiplication
mul<-R*S
print(mul)
#Multiplication by constant
mul1<-R*12
print(mul1)
#Division
div<-R/S
print(div)
Output
[,1] [,2] [,3]
[1,] 6 14 22
[2,] 8 16 24
[3,] 10 18 26
[4,] 12 20 28
R Data Frame
A data frame is a two-dimensional array-like structure or a table in which a column contains values of
one variable, and rows contains one set of values from each column. A data frame is a special case of
the list in which each component has equal length.
A data frame is used to store data table and the vectors which are present in the form of a list in a
data frame, are of equal length.
In a simple way, it is a list of equal length vectors. A matrix can contain one type of data, but a data
frame can contain different data types such as numeric, character, factor, etc.
next →← prev
R Data Frame
A data frame is a two-dimensional array-like structure or a table in which a column contains values
of one variable, and rows contains one set of values from each column. A data frame is a special
case of the list in which each component has equal length.
A data frame is used to store data table and the vectors which are present in the form of a list in a
data frame, are of equal length.
In a simple way, it is a list of equal length vectors. A matrix can contain one type of data, but a
data frame can contain different data types such as numeric, character, factor, etc.
Example
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,915.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Printing the data frame.
print(emp.data)
Output
employee_idemployee_namesalstarting_date
1 1 Shubham623.30 2012-01-01
2 2 Arpita915.20 2013-09-23
3 3 Nishka611.00 2014-11-15
4 4 Gunjan729.00 2014-05-11
5 5 Sumit843.25 2015-03-27
1. We can extract the specific columns from a data frame using the column name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (1:5),
employee_name= c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,515.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extracting specific columns from a data frame
final <- data.frame(emp.data$employee_id,emp.data$sal)
print(final)
Output
emp.data.employee_idemp.data.sal
1 1 623.30
2 2 515.20
3 3 611.00
4 4 729.00
5 5 843.25
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,515.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
# Extracting first row from a data frame
final <- emp.data[1,]
print(final)
# Extracting last two row from a data frame
final <- emp.data[4:5,]
print(final)
Output
Example
# Creating the data frame.
emp.data<- data.frame(
employee_id = c (1:5),
employee_name = c("Shubham","Arpita","Nishka","Gunjan","Sumit"),
sal = c(623.3,515.2,611.0,729.0,843.25),
starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
"2015-03-27")),
stringsAsFactors = FALSE
)
print(emp.data)
#Printing the summary
print(summary(emp.data))
Output
employee_idemployee_namesalstarting_date
1 1 Shubham623.30 2012-01-01
2 2 Arpita515.20 2013-09-23
3 3 Nishka611.00 2014-11-15
4 4 Gunjan729.00 2014-05-11
5 5 Sumit843.25 2015-03-27
employee_idemployee_namesalstarting_date
Min. :1 Length:5 Min. :515.2 Min. :2012-01-01
1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
Median :3 Mode :character Median :623.3 Median :2014-05-11
Mean :3 Mean :664.4 Mean :2014-01-14
3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15
Max. :5 Max. :843.2 Max. :2015-03-27
R factors
The factor is a data structure which is used for fields which take only predefined finite number of
values. These are the variable which takes a limited number of different values. These are the data
objects which are used to categorize the data and to store it on multiple levels. It can store both
integers and strings values, and are useful in the column that has a limited number of unique values.
Factors have labels which are associated with the unique integers stored in it. It contains predefined
set value known as levels and by default R always sorts levels in alphabetical order.
Attributes of a factor
There are the following attributes of a factor in R
a. X
It is the input vector which is to be transformed into a factor.
b. levels
It is an input vector that represents a set of unique values which are taken by x.
c. labels
It is a character vector which corresponds to the number of labels.
d. Exclude
It is used to specify the value which we want to be excluded,
e. ordered
It is a logical attribute which determines if the levels are ordered.
f. nmax
It is used to specify the upper bound for the maximum number of level.
R provides factor() function to convert the vector into factor. There is the following syntax of factor()
function
factor_data<- factor(vector)
Example
1. # Creating a vector as input.
2. data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Su
mit","Arpita","Sumit")
3.
4. print(data)
5. print(is.factor(data))
6.
7. # Applying the factor function.
8. factor_data<- factor(data)
9.
10. print(factor_data)
11. print(is.factor(factor_data))
Output
Example
# Creating a vector as input.
data <- c("Shubham","Nishka","Arpita","Nishka","Shubham","Sumit","Nishka","Shubham","Sumit","Ar
pita","Sumit")
# Applying the factor function.
factor_data<- factor(data)
#Printing all elements of factor
print(factor_data)
#Accessing 4th element of factor
print(factor_data[4])
#Accessing 5th and 7th element
print(factor_data[c(5,7)])
#Accessing all elemcent except 4th one
print(factor_data[-4])
#Accessing elements using logical vector
print(factor_data[c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE)])
Output
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit
Example
# Creating the vectors for data frame.
height <- c(132,162,152,166,139,147,122)
weight <- c(40,49,48,40,67,52,53)
gender <- c("male","male","female","female","male","female","male")
# Creating the data frame.
input_data<- data.frame(height,weight,gender)
print(input_data)
# Testing if the gender column is a factor.
print(is.factor(input_data$gender))
# Printing the gender column to see the levels.
print(input_data$gender)
Output