
R Programming

Features of R

• R is a programming language and software environment for statistical analysis, graphics representation and reporting. The following are the important features of R:
• R is a well-developed, simple and effective programming language which
includes conditionals, loops, user defined recursive functions and input and
output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
• R provides a large, coherent and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on screen or printed on paper.
R Data Types, Arithmetic & Logical Operators

• Basic data types


• Variables
• Vectors
• Arithmetic Operators
• Logical Operators
Basic data types

• R Programming works with numerous data types, including

• Scalars
• Vectors (numerical, character, logical)
• Matrices
• Data frames
• Lists
Basic types
• 4.5 is a decimal value called numeric.
• 4L is a whole-number value called integer (the L suffix marks it; a plain 4 is stored as numeric). Integers are also numeric.
• TRUE or FALSE is a Boolean value called logical.
• Values inside " " or ' ' are text (strings). They are called characters.
• A value such as 2+5i is a complex number, of type complex.
Variable Assignment
• The variables can be assigned values using leftward, rightward and
equal to operator.
Variables
• To assign a value to a variable, use <- or = or ->.
• # First way to declare a variable: use the `<-`
• name_of_variable <- value
• # Second way to declare a variable: use the `=`
• name_of_variable = value

• x <- 2L
• typeof(x)
• y <- 5
• typeof(y)
• # Print variable x
• x <- 42
• x
• typeof(x)
Numeric
v <- 23.5
print(class(v))
print(v)

v <- 23
print(class(v))
print(v)

Integer
v = 2L
print(class(v))
print(v)
• # String
• y <- "R is Fantastic"
• class(y)

• # Boolean
• z <- TRUE
• class(z)
Complex
v <- 2+5i
print(class(v))
Character ('a', "good", "TRUE", '23.4')
v <- "RAM"
print(class(v))
print(v)
• y <- 10
• y
Variable Operation
• # We call x and y and apply a subtraction
• x-y
• The values of the variables can be printed using the print() or cat() function.
• The cat() function combines multiple items into a continuous print output.
DATA TYPE
• var_x <- "Hello"
• cat("The class of var_x is ",class(var_x),"\n")

• var_x <- 34.5


• cat(" Now the class of var_x is ",class(var_x),"\n")

• var_x <- 27L


• cat(" Next the class of var_x becomes ",class(var_x),"\n")
Printing Multi Line

➢ cat("The class of var_x is ",class(var_x),"\n",
➢ "class of ",class(var_x),"\n")

O/P (with var_x <- "Hello"):
The class of var_x is character
class of character


Deleting Variables
• > x <- 10
• > x
• [1] 10
• > rm(x)
• > x
• Error: object 'x' not found
Arithmetic Operators
• # Modulo
• 28%%6
TRY BODMAS RULE
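The operators on this slide can be tried together as below. This is a small sketch, not part of the original slides; the variable names a, b, r1 and r2 are chosen only for illustration.

```r
# Basic arithmetic operators
a <- 28
b <- 6
a + b    # addition
a - b    # subtraction
a * b    # multiplication
a / b    # division
a %% b   # modulo (remainder): 28 = 4 * 6 + 4, so the remainder is 4
a %/% b  # integer quotient: 4
a ^ 2    # exponent

# BODMAS: brackets first, then powers, then * and /, then + and -
r1 <- 2 + 3 * 4 ^ 2    # 4^2 = 16, 3 * 16 = 48, 2 + 48 = 50
r2 <- (2 + 3) * 4 ^ 2  # 5 * 16 = 80
print(r1)
print(r2)
```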
VECTORS
• # Assignment using equal operator.
• var1 = c(2,3,5,6)

• # Assignment using leftward operator.


• var2 <- c("learn","R")

• # Assignment using rightward operator.


• c(TRUE,1) -> var3

• print(var1)
• cat ("var1 is ", var1 ,"\n")
• cat ("var2 is ", var2 ,"\n")
• cat ("var3 is ", var3 ,"\n")
Adds two vectors
• v <- c( 2,5.5,6)
• t <- c(8, 3, 4)
• print(v+t)
TRY MULTIPLICATION / DIVISION
%/%
• Gives the integer quotient from dividing the first vector by the second, element by element
• v <- c( 2,5.5,6)
• t <- c(8, 3, 4)
• print(v%/%t)
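A short sketch of the remaining element-wise operators on the same two vectors. The result names (prod_vt and so on) are illustrative and do not appear in the slides.

```r
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)

prod_vt <- v * t    # element-wise product: 16.0 16.5 24.0
div_vt  <- v / t    # element-wise division
quot_vt <- v %/% t  # integer quotient: 0 1 1
rem_vt  <- v %% t   # remainder: 2.0 2.5 2.0

print(prod_vt)
print(div_vt)
print(quot_vt)
print(rem_vt)
```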
Logical Operators

• Logical conditions used to filter a vector are placed inside square brackets []. We can combine as many conditions as we like, but each one must be enclosed in parentheses.
# Create a vector from 1 to 10
logical_vector <- c(1:10)
logical_vector>5
Relational Operators
• v <- c(2,5.5,6,9)
• t <- c(8,2.5,14,9)
• print(v == t)
• print(v > t)
• print(v < t)
• TRY: <=, >=, !=
# Print value strictly above 5
logical_vector[(logical_vector>5)]
# Print 5 and 6
logical_vector[(logical_vector>4) & (logical_vector<7)]
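The operators left as an exercise above can be checked with a sketch like the following. It reuses the vectors from the slides; the result names (le, ge, ne, ends, low) are made up for this example.

```r
v <- c(2, 5.5, 6, 9)
t <- c(8, 2.5, 14, 9)

le <- v <= t  # TRUE FALSE TRUE TRUE
ge <- v >= t  # FALSE TRUE FALSE TRUE
ne <- v != t  # TRUE TRUE TRUE FALSE

logical_vector <- c(1:10)
# | means "or": keep values below 3 or above 8
ends <- logical_vector[(logical_vector < 3) | (logical_vector > 8)]
# ! negates a condition: keep values that are NOT above 5
low <- logical_vector[!(logical_vector > 5)]
print(ends)
print(low)
```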
Vectors
• A vector is a one-dimensional array.
• There are six types of atomic vectors: logical, integer, double, complex, character and raw.
• # Numerical
• vec_num <- c(1, 10, 49)
• vec_num
• # Character
• vec_chr <- c("a", "b", "c")
• vec_chr
• # Boolean
• vec_bool <- c(TRUE, FALSE, TRUE)
• vec_bool
• # Create a vector.
• apple <- c('red','green',"yellow")
• print(apple)
• # Get the class of the vector.
• print(class(apple))
Arithmetic calculations on vectors.
• # Create the vectors
• vect_1 <- c(1, 3, 5)
• vect_2 <- c(2, 4, 6)
• # Take the sum of vect_1 and vect_2
• sum_vect <- vect_1 + vect_2
• # Print out sum_vect
• sum_vect
Using sequence (Seq.) operator
• # Create vector with elements from 5 to 9 incrementing by 0.4.
• print(seq(5, 9, by = 0.4))
c() function
• The non-character values are coerced to character type if one of the
elements is a character.

• # The logical and numeric values are converted to characters.


• s <- c('apple','red',5,TRUE)
• print(s)
Slice a vector
• # Slice the first five elements of the vector
• > X_vector <- c(1,2,3,4,5,6,7,8,9,10)
• > X_vector[1:5]
• [1] 1 2 3 4 5
• > X_vector[3:5]
• [1] 3 4 5
• > X_vector[-1]
• [1] 2 3 4 5 6 7 8 9 10
• > X_vector[-2]
• [1] 1 3 4 5 6 7 8 9 10
• > X_vector[0:5]
• [1] 1 2 3 4 5
• > X_vector[-5]
• [1] 1 2 3 4 6 7 8 9 10
• # The same indexing works on a character vector
• > x <- c('a','b','c','d','e')
• > x[0]
• character(0)
• > x[1]
• [1] "a"
• > x[2]
• [1] "b"
• > x[2:4]
• [1] "b" "c" "d"
• > x[-1]
• [1] "b" "c" "d" "e"
• > x[-2]
• [1] "a" "c" "d" "e"
Concatenating 2 vectors
• X<-c(1,2,3)
• Y<-c(4,5,6)
• T <-c(X,Y)
How to remove multiple elements
• > a <- sample(1:10)  # the numbers 1 to 10 in shuffled order; your order will differ
• > remove <- c (2, 3, 5)
• > a
• [1] 10 5 2 7 1 6 3 4 8 9
• > a %in% remove
• [1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
• > a [! (a %in% remove)]
> b <- c(1:5)
> b
[1] 1 2 3 4 5
> c <- (2,3)
Error: unexpected ',' in "c <- (2,"
> c <- c(2,3)
> c
[1] 2 3
> a [! a %in% c]
[1] 1 9 7 4 10 5 6 8
> b [! b %in% c]
[1] 1 4 5
> b [! (b %in% c)]
[1] 1 4 5
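The removal pattern above, written as a deterministic sketch (an unshuffled vector is assumed here so the result is predictable). The name keep is illustrative; setdiff() is a base R alternative that gives the same values when the vector has no duplicates.

```r
a <- c(1:10)
remove <- c(2, 3, 5)

# %in% marks which elements of a appear in remove; ! flips the marks
keep <- a[!(a %in% remove)]
print(keep)

# setdiff() returns the elements of a that are not in remove
print(setdiff(a, remove))
```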
Append
• append(x, values, after = length(x))
• x: the vector the values are to be appended to.
• values: the values to be included in the modified vector.
• after: a subscript, after which the values are to be appended.
• > append(1:5, 0:1, after = 3)
• [1] 1 2 3 0 1 4 5
• > b
• [1] 1 2 3 4 5
• > append(b, c(10:15), after = 5)
• [1] 1 2 3 4 5 10 11 12 13 14 15
• A quick way to create a range of values is to use the : operator between two numbers. For instance, we can write c(1:10) (or simply 1:10) to create a vector of values from one to ten.

• # Faster way to create adjacent values


• c(1:10)
Accessing Vector element
Vector Manipulation
Vector Element Recycling
• v1 <- c(3,8,4,5,0,11)
• v2 <- c(4,11)
• # V2 becomes c(4,11,4,11,4,11)

• result <- v1+v2


• print(result)

• sub.result <- v1-v2


• print(sub.result)
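The recycling steps above, worked through as a sketch. The last part (the name uneven and its two input vectors) is an added illustration: when the longer length is not a multiple of the shorter one, R still recycles but raises a warning, which is silenced here only to keep the output clean.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11)

# v2 is recycled to c(4, 11, 4, 11, 4, 11) before the operation
result <- v1 + v2      # 7 19 8 16 4 22
sub.result <- v1 - v2  # -1 -3 0 -6 -4 0
print(result)
print(sub.result)

# Lengths 3 and 2: the shorter vector is reused as 10, 20, 10
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(uneven)
```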
List in R

• A list is a great tool to store many kinds of objects in the order expected. We can include matrices, vectors, data frames or lists. We can imagine a list as a bag in which we want to put many different items. When we need to use an item, we open the bag and use it. A list is similar; we can store a collection of objects and use them when we need them.
Create a List

• list(element_1, ...)
• arguments:
• -element_1: store any type of R object
• -...: pass as many objects as needed; each object must be separated by a comma
• Create a Vector::
• # Vector with numeric from 1 up to 5
• vect <- 1:5
LIST CREATION
# Create a list.
• list1 <- list(c(2,5,3),21.3,sin)

# Print the list.


print(list1)

list1 <- list(c(2,5,3),21.3,sin,21.7,TRUE)


> list1
Select Elements from List

• Use [[index]] to select an element in a list.
• # Print second element of the list
• list1[[2]]
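A brief sketch of list selection, using the three-element list created earlier in the slides. The names second, sub, first_elem and named are introduced here only for illustration.

```r
list1 <- list(c(2, 5, 3), 21.3, sin)

second <- list1[[2]]         # [[ ]] returns the element itself: 21.3
sub <- list1[2]              # [ ] returns a list of length 1 holding 21.3
first_elem <- list1[[1]][2]  # second value of the first element: 5
zero <- list1[[3]](0)        # the third element is the function sin, so this is sin(0)

# Elements can also be named and then selected with $
named <- list(numbers = c(2, 5, 3), value = 21.3)
print(named$value)
```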
Array
➢ # Create an array.
➢ > a <- array(c('green','yellow'))
➢ > a
➢ [1] "green" "yellow"
➢ > a <- array(c('green','yellow'),5)
➢ > a
➢ [1] "green" "yellow" "green" "yellow" "green"
Concatenating Strings

• paste() function
• Many strings in R are combined using the paste() function. It can take
any number of arguments to be combined together.
• paste(..., sep = " ", collapse = NULL)
a <- "Hello"
b <- 'How'
c <- "are you? "

print(paste(a,b,c))

print(paste(a,b,c, sep = "-"))

print(paste(a,b,c, sep = "", collapse = ""))


➢ > nth <- paste0(1:12, c("st", "nd", "rd"))
➢ > nth
➢ [1] "1st" "2nd" "3rd" "4st" "5nd" "6rd" "7st"
➢ [8] "8nd" "9rd" "10st" "11nd" "12rd"
➢ > nth <- paste(1:12, c("st", "nd", "rd"))
➢ > nth
➢ [1] "1 st" "2 nd" "3 rd" "4 st" "5 nd" "6 rd"
➢ [7] "7 st" "8 nd" "9 rd" "10 st" "11 nd" "12 rd"
➢ > nth <- paste(1:12, c("st", "nd", "rd"), sep = "_")
➢ > nth
➢ [1] "1_st" "2_nd" "3_rd" "4_st" "5_nd" "6_rd" "7_st"
➢ [8] "8_nd" "9_rd" "10_st" "11_nd" "12_rd"
➢ > nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
➢ > nth
➢ [1] "1st" "2nd" "3rd" "4th" "5th" "6th" "7th" "8th"
➢ [9] "9th" "10th" "11th" "12th"
➢ > nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)), collapse = "*")
➢ > nth
➢ [1] "1st*2nd*3rd*4th*5th*6th*7th*8th*9th*10th*11th*12th"
➢ > # paste0() has no sep argument, so "_" is pasted on as an extra string
➢ > nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)), sep = "_", collapse = "*")
➢ > nth
➢ [1] "1st_*2nd_*3rd_*4th_*5th_*6th_*7th_*8th_*9th_*10th_*11th_*12th_"
Counting number of characters in a string - nchar() function

result <- nchar("Count the number of characters")
print(result)

toupper(x)
tolower(x)
x is the vector input.

> x <- c("manish","singh")
> toupper(x)
[1] "MANISH" "SINGH"


# Changing to Upper case.
result <- toupper("Changing To Upper")
print(result)

# Changing to lower case.


result <- tolower("Changing To Lower")
print(result)
substring(x,first,last)

# Extract characters from 5th to 7th position.


result <- substring("Extract", 5, 7)
print(result)
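The string functions from the last few slides, gathered into one sketch with the same inputs. The variable names (n_chars, upper, lower, part, lens) are illustrative.

```r
n_chars <- nchar("Count the number of characters")  # spaces are counted too
upper <- toupper("Changing To Upper")
lower <- tolower("Changing To Lower")
part <- substring("Extract", 5, 7)  # characters 5 to 7: "act"

# These functions are vectorised: each element is processed separately
x <- c("manish", "singh")
lens <- nchar(x)

print(n_chars)
print(upper)
print(lower)
print(part)
print(lens)
```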
Matrix
• A matrix is a 2-dimensional array that has m rows and n columns. In other words, a matrix is a combination of two or more vectors of the same data type and length.
• Note: It is also possible to create arrays with more than two dimensions in R.
How to Create a Matrix in R
The matrix() function takes the following arguments:
matrix(data, nrow, ncol, byrow = FALSE)

Arguments:
•data: The collection of elements that R will arrange into the rows and columns of the matrix
•nrow: Number of rows
•ncol: Number of columns
•byrow: If TRUE, the matrix is filled row by row, from left to right. With the default `byrow = FALSE`, the matrix is filled by columns, i.e. the values are filled top to bottom.

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)
• > M = matrix(c('a','b','c','d'),2,2,byrow=FALSE)
• > M
•      [,1] [,2]
• [1,] "a"  "c"
• [2,] "b"  "d"
• > M = matrix(c('a','b','c','d'),2,2,byrow=TRUE)
• > M
•      [,1] [,2]
• [1,] "a"  "b"
• [2,] "c"  "d"
Let's construct two 5x2 matrices with a sequence of numbers from 1 to 10, one with byrow = TRUE and one with byrow = FALSE, to see the difference.
# Construct a matrix with 5 rows that contains the numbers 1 up to 10 and byrow = TRUE

matrix_a <-matrix(1:10, byrow = TRUE, nrow = 5)


matrix_a
# Print dimension of the matrix with dim()

dim(matrix_a)
Construct a matrix with 5 rows that contains the numbers 1 up to 10 and byrow = FALSE

matrix_b <-matrix(1:10, byrow = FALSE, nrow = 5)


matrix_b

# Print dimension of the matrix with dim()


dim(matrix_b)
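To make the difference between the two fill orders concrete, the sketch below compares the first row of each matrix. The comparison against t() is an added illustration, not something the slides state.

```r
matrix_a <- matrix(1:10, byrow = TRUE, nrow = 5)   # filled row by row
matrix_b <- matrix(1:10, byrow = FALSE, nrow = 5)  # filled column by column

print(matrix_a[1, ])  # 1 2: consecutive numbers sit side by side
print(matrix_b[1, ])  # 1 6: consecutive numbers run down each column
print(dim(matrix_a))  # both matrices are 5 x 2
print(dim(matrix_b))

# Filling by row gives the transpose of a 2-row matrix filled by column
same <- all(matrix_a == t(matrix(1:10, nrow = 2)))
print(same)
```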
Note: Using the command
matrix_b <-matrix(1:10, byrow = FALSE, ncol = 2)
will have the same effect as above.
You can also create a 4x3 matrix using ncol. R will create 3 columns and fill each column from top to bottom.
EX::
matrix_c <-matrix(1:12, byrow = FALSE, ncol = 3)
matrix_c

dim(matrix_c)
Add a Column to a Matrix with the cbind()
cbind() means column binding. cbind() can concatenate as many matrices or columns as specified.

# concatenate c(1:5) to the matrix_a

matrix_a1 <- cbind(matrix_a, c(1:5))


# Check the dimension

dim(matrix_a1)

matrix_a1
Adding a Matrix to a Matrix as New Columns
matrix_a2 <-matrix(13:24, byrow = FALSE, ncol = 3)

matrix_c <-matrix(1:12, byrow = FALSE, ncol = 3)


matrix_d <- cbind(matrix_a2, matrix_c)
dim(matrix_d)
Adding a Matrix to a Matrix as New Columns: Row Mismatch
• matrix_x <-matrix(1:12, byrow = FALSE, ncol = 4)  # 3 rows
• matrix_y <-matrix(1:12, byrow = FALSE, ncol = 3)  # 4 rows
• > matrix_z <- cbind(matrix_x, matrix_y)  # fails: 3 rows cannot be bound to 4 rows

NOTE: The number of rows of the matrices must be equal for cbind() to work
cbind() concatenates columns, rbind() appends rows.
• Let's add one row to our matrix_c matrix and verify the dimension is 5x3
matrix_c <-matrix(1:12, byrow = FALSE, ncol = 3)
# Create a vector with 3 elements, one per column
add_row <- c(1:3)

# Append to the matrix
matrix_c <- rbind(matrix_c, add_row)
# Check the dimension
dim(matrix_c)
NOTE: The number of columns of the matrices must be equal for rbind() to work
Adding a Matrix to a Matrix as New Rows
• matrix_c <-matrix(1:12, byrow = FALSE, ncol = 4)
• matrix_d <-matrix(1:12, byrow = FALSE, ncol = 3)
• > matrix_c <- rbind(matrix_b, matrix_d)
Error in rbind(matrix_b, matrix_d) :
  number of columns of matrices must match (see arg 2)

NOTE: The number of columns of the matrices must be equal for rbind() to work
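The binding rules can be checked with a small self-contained sketch. The names m, m_col, m_row and msg are hypothetical; tryCatch() is used here only so the mismatch error is captured as text instead of stopping the script.

```r
m <- matrix(1:12, byrow = FALSE, ncol = 3)  # 4 rows, 3 columns

m_col <- cbind(m, 13:16)  # new column needs 4 values, one per row
m_row <- rbind(m, 1:3)    # new row needs 3 values, one per column
print(dim(m_col))  # 4 4
print(dim(m_row))  # 5 3

# Binding rows of a 3-column and a 4-column matrix fails
msg <- tryCatch(
  rbind(m, matrix(1:12, ncol = 4)),
  error = function(e) conditionMessage(e)
)
print(msg)
```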
Slice a Matrix

• matrix_c[1,2] selects the element at the first row and second column.
• matrix_c[1:3,2:3] results in a matrix with the data on the rows 1, 2, 3 and columns 2, 3.
• matrix_c[,1] selects all elements of the first column.
• matrix_c[1,] selects all elements of the first row.
matrix_c[,3]
matrix_c[,1:3]
matrix_c[2,]
matrix_c[1:3,]
Try with a minus sign, which drops the selected rows or columns
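A sketch of the slicing rules on a fresh 4x3 matrix, including the minus-sign case. The matrix is rebuilt here so the example is self-contained; the result names are illustrative.

```r
matrix_c <- matrix(1:12, byrow = FALSE, ncol = 3)  # columns: 1-4, 5-8, 9-12

one <- matrix_c[1, 2]        # row 1, column 2: 5
block <- matrix_c[1:3, 2:3]  # rows 1 to 3 of columns 2 and 3
col1 <- matrix_c[, 1]        # first column: 1 2 3 4
row1 <- matrix_c[1, ]        # first row: 1 5 9

# A minus sign drops rows or columns instead of selecting them
no_row1 <- matrix_c[-1, ]  # everything except row 1
no_col2 <- matrix_c[, -2]  # everything except column 2
print(block)
print(no_row1)
print(no_col2)
```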
FACTORS
• Factors are R objects which are created from a vector.
• A factor stores the vector along with the distinct values of its elements as labels (levels).
• The labels are always of character type.
• Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.


factor_apple <- factor(apple_colors)

# Print the factor.


print(factor_apple)
print(nlevels(factor_apple))
Factor in R: Categorical & Continuous Variables

• Factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.
• In a dataset, we can distinguish two types of variables: categorical and continuous.
• Categorical variable: the value is limited and usually based on a particular finite group. For example, a categorical variable can be country, year, gender or occupation.
• Continuous variable: a continuous variable, however, can take any value, from integer to decimal. For example, revenue or the price of a share.
Categorical Variables

• R stores categorical variables into a factor.
• We can convert a character variable into a factor variable.
• Syntax:::
• factor(x = character(), levels, labels = levels, ordered = is.ordered(x))
• Many machine learning algorithms do not accept character data directly, so strings are usually converted to factors, which R stores internally as integer codes.

• Arguments:
• x: A vector of data. Usually a character or integer vector rather than decimal values.
• levels: A vector of possible values taken by x. This argument is optional. The default value is the sorted list of unique items of the vector x.
• labels: Add a label to the x data. For example, 1 can take the label `male` while 0 takes the label `female`.
• ordered: Determine if the levels should be ordered.
Let's create a factor.
• # Create gender vector
• gender_vector <- c("Male", "Female", "Female", "Male", "Male")
• class(gender_vector)
• # Convert gender_vector to a factor
• factor_gender_vector <-factor(gender_vector)
• class(factor_gender_vector)
• Output:::
• ## [1] "character"
• ## [1] "factor"
It is often important to transform a string into a factor when we perform a machine learning task.
categorical variable
• A categorical variable can be divided into nominal categorical
variable and ordinal categorical variable.
• Nominal Categorical Variable
• A categorical variable has several values but the order does not
matter. For instance, male or female categorical variable do not have
ordering.
• # Create a color vector
• color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
• # Convert the vector to factor
• factor_color <- factor(color_vector)
• factor_color

O/P:
## [1] blue red green white black yellow
## Levels: black blue green red white yellow

From the factor_color, we can't tell any order.


Ordinal Categorical Variable

• Ordinal categorical variables do have a natural ordering. With ordered = TRUE (the code below writes it as order = TRUE, which R matches to the same argument), the factor is ordered from lowest to highest following the sequence given in levels. With ordered = FALSE, the levels carry no ranking.
• Example:
• We can use summary to count the values for each factor.
• # Create Ordinal categorical vector
• day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
• # Convert `day_vector` to a factor with ordered level
• factor_day <- factor(day_vector, order = TRUE, levels =c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
• # Print the new variable
• factor_day
• O/P:
• ## [1] evening morning afternoon midday midnight evening
• ## Levels: morning < midday < afternoon < evening < midnight
• # Append the line to above code
• # Count the number of occurrences of each level
• summary(factor_day)
• O/P
• ## morning midday afternoon evening midnight
• ## 1 1 1 2 1

R ordered the levels from 'morning' to 'midnight' as specified in the levels argument.
• TRY: levels that do not match the values in day_vector turn every element into NA.
• factor_day <- factor(day_vector, order = TRUE, levels =c(1:5))
• factor_day <- factor(day_vector, order = TRUE, levels =c(2,5,4,1,9))
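What an ordered factor allows can be sketched as follows, reusing day_vector from the slides. The names later, codes and bad are made up for this example, and ordered = TRUE is spelled out in full.

```r
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
factor_day <- factor(day_vector, ordered = TRUE,
                     levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))

# Ordered factors can be compared: 'evening' ranks above 'morning'
later <- factor_day[1] > factor_day[2]
print(later)

# Internally each value is stored as the position of its level
codes <- as.integer(factor_day)  # 4 1 3 2 5 4
print(codes)

# Levels that do not occur in the data leave every element NA
bad <- factor(day_vector, ordered = TRUE, levels = c(1:5))
print(bad)
```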
Continuous Variables

• Continuous variables are the default in R. They are stored as numeric or integer. We can see this in the dataset below. mtcars is a built-in dataset that gathers information on different types of car. We can load it by name and check the class of the variable mpg, miles per gallon. It returns numeric, indicating a continuous variable.
• Ex:::
• dataset <- mtcars
• class(dataset$mpg)
• O/P## [1] "numeric"
DataFrame: Create, Append, Select, Subset

• Data frames are tabular data objects.
• Unlike in a matrix, each column of a data frame can contain a different mode of data. The first column can be numeric while the second column can be character and the third column can be logical.
• A data frame is a list of vectors which are of equal length.
• A matrix contains only one type of data, while a data frame accepts different data types (numeric, character, factor, etc.).
Characteristics of a data frame

• The column names should be non-empty.


• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character
type.
• Each column should contain same number of data items.
Create a Data Frame

• We can create a data frame by passing the variables a, b, c, d into the data.frame() function. We can name the columns with names() and simply specify the name of the variables.

• data.frame(df, stringsAsFactors = TRUE)

• Arguments:
• df: It can be a matrix to convert to a data frame or a collection of variables to join
• stringsAsFactors: Whether to convert strings to factors. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward.
use stringsAsFactors = FALSE.

• R often uses a concept of factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string.
• To avoid problems, delay re-encoding of strings by using stringsAsFactors = FALSE when creating data frames.
• Example:
• > d <- data.frame(label = rep("tbd", 5))
• > d$label[[2]] <- "north"
• > e <- data.frame(label = rep("tbd", 5),
• + stringsAsFactors = FALSE)
• > e
• > e$label[[2]] <- "north"
• > e
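The problem described above can be reproduced with the sketch below. stringsAsFactors is set explicitly in both calls so the behaviour does not depend on the R version; the warning from the failed assignment is silenced only to keep the output short.

```r
# Strings stored as a factor: the only known level is "tbd"
d <- data.frame(label = rep("tbd", 5), stringsAsFactors = TRUE)
# "north" is not a level, so R stores NA and raises a warning
suppressWarnings(d$label[2] <- "north")
print(d$label)

# Strings kept as plain character: the assignment works as expected
e <- data.frame(label = rep("tbd", 5), stringsAsFactors = FALSE)
e$label[2] <- "north"
print(e$label)
```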
Create our first data set by combining four
variables of same length
• # Create a, b, c, d variables
• a <- c(10,20,30,40)
• b <- c('book', 'pen', 'textbook', 'pencil_case')
• c <- c(TRUE,FALSE,TRUE,FALSE)
• d <- c(2.5, 8, 10, 7)
• # Join the variables to create a data frame
• df <- data.frame(a,b,c,d)
• df
> age <- c(10,20,30,40)
> item <- c('book', 'pen', 'textbook', 'pencil_case')
> price <- c(2.5, 8, 10, 7)
> # Stored as df2 so that df keeps its four columns for the next steps
> df2 <- data.frame(age,item,price)
> df2
  age        item price
1  10        book   2.5
2  20         pen   8.0
3  30    textbook  10.0
4  40 pencil_case   7.0
We can change the column name with the
function names().
• # Name the data frame
• names(df) <- c('ID', 'items', 'store', 'price')
• df
• # Print the structure
• str(df)

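• In R versions before 4.0.0, data.frame() converted string variables to factors by default; from R 4.0.0 onward they stay as character unless stringsAsFactors = TRUE is set.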


• # Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
• print(BMI)
• # Create the data frame.
• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
• "2015-03-27")),
• stringsAsFactors = FALSE
• )
• # Print the data frame.
• print(emp.data)

# Get the structure of the data frame.


str(emp.data)
Summary of Data in Data Frame

• # Print the summary.


• print(summary(emp.data))
Slice Data Frame

• A data frame is composed of rows and columns, df[A, B]. A represents the rows and B the columns. We can slice by specifying the rows, the columns, or both.

## Select row 1 in column 2


df[1,2]

## Select Rows 1 to 2
df[1:2,]

## Select Columns 1
df[,1]
## Select Rows 1 to 3 and columns 3 to 4
df[1:3, 3:4]
# Slice with columns name
• df[, c('ID', 'store')]
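The slices above can be verified on a self-contained copy of the four-column data frame. The values are the ones used earlier in the slides; stringsAsFactors = FALSE and the result names are choices made for this sketch.

```r
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c('book', 'pen', 'textbook', 'pencil_case'),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7),
                 stringsAsFactors = FALSE)

cell <- df[1, 2]                    # row 1, column 2: "book"
top2 <- df[1:2, ]                   # rows 1 to 2, all columns
ids <- df[, 1]                      # column 1, returned as a plain vector
block <- df[1:3, 3:4]               # rows 1 to 3, columns 3 to 4
by_name <- df[, c('ID', 'store')]   # columns selected by name
print(block)
print(by_name)
```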
# Extract Specific columns.

• # Create the data frame.


• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
• "2015-03-27")),
• stringsAsFactors = FALSE
• )
• # Extract Specific columns.
• result <- data.frame(emp.data$emp_name,emp.data$salary)
• print(result)
Append a Column to Data Frame
Use the $ symbol to append a new variable.
• # Create a new vector
• quantity <- c(10, 35, 40, 5)

• # Add `quantity` to the `df` data frame


• df$quantity <- quantity
• df
Add Column
• # Create the data frame.
• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
• "2015-03-27")),
• stringsAsFactors = FALSE
• )

• # Add the "dept" column.


• emp.data$dept <- c("IT","Operations","IT","HR","Finance")
• v <- emp.data
• print(v)
Add Row
• # Create the first data frame.
• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
• "2015-03-27")),
• dept = c("IT","Operations","IT","HR","Finance"),
• stringsAsFactors = FALSE
• )
• # Create the second data frame
• emp.newdata <- data.frame(
• emp_id = c (6:8),
• emp_name = c("Rasmi","Pranab","Tusar"),
• salary = c(578.0,722.5,632.8),
• start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
• dept = c("IT","Operations","Finance"),
• stringsAsFactors = FALSE
• )
# Bind the two data frames.
• emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
QUIZ:: Executing the following statement
• quantity <- c(10, 35, 40)
• # Add `quantity` to the `df` data frame
• df$quantity <- quantity
• Note: The number of elements in the vector has to be equal to the number of rows in the data frame.

• Error in `$<-.data.frame`(`*tmp*`, quantity, value = c(10, 35, 40)) :
• replacement has 3 rows, data has 4
Select a Column of a Data Frame

• # Select the column ID


• df$ID
Subset a Data Frame
subset() keeps the rows for which a given condition is true.
• subset(x, condition)
• arguments:
• - x: data frame used to perform the subset
• - condition: define the conditional statement

• Ex:::
• # Select price above 5
• subset(df, subset = price > 5)
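A sketch of subset() on a self-contained copy of the data frame from the earlier slides. The select argument and the combined condition are additions for illustration; expensive and both are hypothetical names.

```r
df <- data.frame(ID = c(10, 20, 30, 40),
                 items = c('book', 'pen', 'textbook', 'pencil_case'),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7),
                 stringsAsFactors = FALSE)

# Rows with a price above 5
expensive <- subset(df, subset = price > 5)
print(expensive)

# Conditions can be combined with &, and select keeps only some columns
both <- subset(df, subset = price > 5 & store, select = c(ID, price))
print(both)

# The same rows using bracket indexing
print(df[df$price > 5, ])
```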
SORTING -DATAFRAME
• Data can be arranged in ascending or descending order.

• SYNTAX::
• sort(x, decreasing = FALSE, na.last = TRUE)

• Argument:
• x: A vector containing a continuous or factor variable
• decreasing: Controls the order of the sort. By default, decreasing is set to `FALSE`.
• na.last: Indicates whether the `NA` values should be put last or not
DATAFRAME CREATION → tibble
• library(tibble)  # tibble() comes from the tibble package
• set.seed(1234)
• data_frame <- tibble(
• c1 = rnorm(50, 5, 1.5),
• c2 = rnorm(50, 5, 1.5),
• c3 = rnorm(50, 5, 1.5),
• c4 = rnorm(50, 5, 1.5),
• c5 = rnorm(50, 5, 1.5)
• )
• # Sort by c1
• df <-data_frame[order(data_frame$c1),]
• head(df)
• # Sort by c3 and c4
• df <-data_frame[order(data_frame$c3, data_frame$c4),]
• head(df)
• # Sort by c3 (descending) and c4 (ascending)
• df <-data_frame[order(-data_frame$c3, data_frame$c4),]
• head(df)
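Because the tibble above is random, the sketch below uses a tiny hand-made base data frame (an assumption made so the row order can be checked by eye). It shows sort() on a vector and order() for sorting rows; small, asc and desc are illustrative names.

```r
# sort() returns the sorted values of a vector
print(sort(c(3, 1, 2)))                      # 1 2 3
print(sort(c(3, 1, 2), decreasing = TRUE))   # 3 2 1
print(sort(c(3, NA, 1), na.last = TRUE))     # 1 3 NA

# order() returns row positions, which is what a data frame needs
small <- data.frame(c3 = c(2, 1, 2, 3), c4 = c(9, 5, 7, 1))

asc <- order(small$c3, small$c4)    # c3 ascending, ties broken by c4: 2 3 1 4
desc <- order(-small$c3, small$c4)  # c3 descending, c4 ascending: 4 3 1 2
print(small[asc, ])
print(small[desc, ])
```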
