Basic features of the R programming language: special objects
Recursos Computacionais aplicados à Estatística e Experimentação
Agropecuária - PPG em Estatística e Experimentação Agropecuária
Prof. Izabela R. Cardoso de Oliveira

2023/1
Vector
Types
· The basic data storage unit in R: a one-dimensional array (row or
column) of entries.
· It is a collection of one or more entries of the same class (type)
· Four major vector classes (Logan, 2010):
Integer (whole numbers)
2:4 # vector of integers from 2 to 4
[1] 2 3 4
c(1,3,9) # three whole-number values (stored as numeric unless written as 1L, 3L, 9L)
[1] 1 3 9
Numeric (real numbers)
c(8.4, 2.1) # vector which contains two real numbers
[1] 8.4 2.1
Character (letters)
c('A', 'ABC') # vector which contains letters
[1] "A" "ABC"
Logical (TRUE or FALSE)
c(2:4)==3 # evaluate the expression
[1] FALSE TRUE FALSE
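The four classes above can be checked with class(). The lines below are a minimal sketch; the variable names (v_int, v_num, and so on) are illustrative and do not come from the original slides.

```r
# Inspect the class of each kind of vector (all names are illustrative)
v_int <- 2:4               # the : operator yields integers
v_num <- c(8.4, 2.1)       # real numbers are stored as "numeric"
v_chr <- c("A", "ABC")
v_lgl <- c(2:4) == 3

class(v_int)   # "integer"
class(v_num)   # "numeric"
class(v_chr)   # "character"
class(v_lgl)   # "logical"

# Whole numbers typed without the L suffix are numeric, not integer
class(c(1, 3, 9))      # "numeric"
class(c(1L, 3L, 9L))   # "integer"

# Mixing types coerces every entry to the most general class
v_mixed <- c(1, "a", TRUE)
class(v_mixed)         # "character"
```

Because a vector holds a single class, mixing numbers and text silently turns the numbers into character strings.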
Sequences
· Inclusive sequences of integers can be generated using the : operator
10:18 # a sequence from 10 to 18 inclusive
[1] 10 11 12 13 14 15 16 17 18
· The seq() function is used to generate numeric sequences
seq(from=2, to=20, by=4) # every 4th number from 2 to <= 20
[1] 2 6 10 14 18
seq(from = 2, to = 20, length.out = 5) # 5 evenly spaced numbers from 2 to 20 inclusive
[1] 2.0 6.5 11.0 15.5 20.0
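As a complement to seq(), the sketch below shows seq_len() and seq_along(), two base R helpers for index sequences, and compares the by and length.out forms. The object names are illustrative assumptions.

```r
# seq_len() and seq_along() build index sequences
seq_len(5)                 # 1 2 3 4 5
x <- c("a", "b", "c")      # illustrative vector
seq_along(x)               # 1 2 3

# 'by' and 'length.out' are two ways to describe the same sequence
s_by  <- seq(from = 0, to = 1, by = 0.25)
s_len <- seq(from = 0, to = 1, length.out = 5)
s_by                       # 0.00 0.25 0.50 0.75 1.00
all.equal(s_by, s_len)     # TRUE

# The : operator also generates decreasing sequences
10:7                       # 10 9 8 7
```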
· Sequences of repeated entries: the rep() function
rep(4, 5) # repeat number 4 five times
[1] 4 4 4 4 4
rep("no", 4) # repeat the word 'no' four times
[1] "no" "no" "no" "no"
rep(c(2, 5), 3) # repeat the series 2 & 5 three times
[1] 2 5 2 5 2 5
rep(c(2, 5), c(3, 2)) # '2' three times and '5' twice
[1] 2 2 2 5 5
In the last two examples, functions are nested within functions. In that case, the
innermost function (here c()) is evaluated first and its result is passed to rep().
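The rep() function also accepts the named arguments times and each. The short sketch below contrasts them; the object names are illustrative.

```r
# 'times' repeats the whole vector; 'each' repeats every entry in turn
r_times <- rep(c(2, 5), times = 3)
r_times     # 2 5 2 5 2 5
r_each <- rep(c(2, 5), each = 3)
r_each      # 2 2 2 5 5 5

# Both can be combined: 'each' is applied first, then 'times'
r_both <- rep(c("a", "b"), each = 2, times = 2)
r_both      # "a" "a" "b" "b" "a" "a" "b" "b"
```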
Character vectors
LETTERS[1:5] # vector which contains uppercase letters from 'A' to 'E'
[1] "A" "B" "C" "D" "E"
letters[1:5] # vector which contains lowercase letters from 'a' to 'e'
[1] "a" "b" "c" "d" "e"
months1 <- c("January", "February", "March", "April",
             "May", "June", "July", "August",
             "September", "October", "November", "December")
months1
[1] "January"   "February"  "March"     "April"     "May"       "June"
[7] "July"      "August"    "September" "October"   "November"  "December"
months2 <- c("January/2020", "February/2020", "March/2020",
"April/2020", "May/2020", "June/2020",
"July/2020", "August/2020", "September/2020",
"October/2020", "November/2020", "December/2020")
months2
[1] "January/2020"   "February/2020"  "March/2020"     "April/2020"
[5] "May/2020"       "June/2020"      "July/2020"      "August/2020"
[9] "September/2020" "October/2020"   "November/2020"  "December/2020"
· paste() function: a more elegant way to generate the above character vector
months3 <- paste(months1, "2020", sep = "/")
months3
[1] "January/2020"   "February/2020"  "March/2020"     "April/2020"
[5] "May/2020"       "June/2020"      "July/2020"      "August/2020"
[9] "September/2020" "October/2020"   "November/2020"  "December/2020"
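A few related tools may be useful here. The sketch below uses the built-in constant month.name (base R already provides the English month names), paste0() for pasting without a separator, and the collapse argument for joining a vector into a single string. The object names are illustrative.

```r
# month.name is a built-in character vector with the 12 month names
labels_2020 <- paste(month.name, "2020", sep = "/")
labels_2020[1:2]     # "January/2020" "February/2020"

# paste0() is paste() with sep = ""
ids <- paste0("R", 1:3)
ids                  # "R1" "R2" "R3"

# 'collapse' joins all entries into one string
joined <- paste(c("a", "b", "c"), collapse = "+")
joined               # "a+b+c"
```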
· A character vector can be used to name the elements of another vector
branch.length <- c(0.600, 0.660, 0.663, 0.721,
                   0.742, 0.790, 0.805, 0.852,
                   0.865, 0.870, 0.870, 0.877)
branch.length
[1] 0.600 0.660 0.663 0.721 0.742 0.790 0.805 0.852 0.865 0.870 0.870 0.877
names(branch.length) <- months1
branch.length
  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877
· substr() function: it is used to extract parts of a string (set of characters)
substr(months1, 1, 3)
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Factor
· factor() function: converts a vector into a factor (an additional
class of vector) that properly accommodates categorical variables
gender1 <- c("F", "F", "F", "F", "M", "M", "M","M","M")
gender2 <- factor(gender1)
gender1
[1] "F" "F" "F" "F" "M" "M" "M" "M" "M"
gender2
[1] F F F F M M M M M
Levels: F M
· gl() command: it can be used to generate a factor vector when each level
of the factor has an equal number of entries (replicates)
gender3 <- gl(2, 5, 10, labels = c("F","M"))
gender3
[1] F F F F F M M M M M
Levels: F M
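To see what a factor stores, the sketch below inspects its levels, counts and underlying integer codes. The size vector is made-up illustrative data, not part of the original example.

```r
size <- c("small", "large", "medium", "small", "large")  # illustrative data

# By default the levels are sorted alphabetically
f_default <- factor(size)
levels(f_default)       # "large" "medium" "small"

# The 'levels' argument sets a meaningful order
f_ordered <- factor(size, levels = c("small", "medium", "large"))
levels(f_ordered)       # "small" "medium" "large"
nlevels(f_ordered)      # 3

# table() counts the entries per level
counts <- table(f_ordered)
counts                  # small: 2, medium: 1, large: 2

# Internally each entry is an integer code pointing to a level
as.integer(f_ordered)   # 1 3 2 1 3
```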
Vector indexing
· We can use [ ] to extract a subset of a vector. This can be done in
several ways (Logan, 2010). We will use the vector branch.length, which
was created earlier.
branch.length
  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877
Vector of positive integers
branch.length[3] # select the third element
March
0.663
branch.length[2:5] # select elements 2 through 5
February    March    April      May
   0.660    0.663    0.721    0.742
branch.length[c(1,3,5)] # select a specific set of elements
January   March     May
  0.600   0.663   0.742
Vector of negative integers
branch.length[-2] # select all but the second element
  January     March     April       May      June      July    August September
    0.600     0.663     0.721     0.742     0.790     0.805     0.852     0.865
  October  November  December
    0.870     0.870     0.877
branch.length[-c(1:6)] # select all but a specific set of elements
     July    August September   October  November  December
    0.805     0.852     0.865     0.870     0.870     0.877
Vector of character strings: the vector elements must have been named
branch.length[c("December")] # select the named element
December
   0.877
branch.length[c("September","October")] # select the named elements
September   October
    0.865     0.870
Vector of logical values
As a reminder:
branch.length
  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877
branch.length[branch.length > 0.8] # elements for which the condition is true
     July    August September   October  November  December
    0.805     0.852     0.865     0.870     0.870     0.877
branch.length[branch.length > 0.7 & branch.length < 0.8] # elements for which both conditions are true
April   May  June
0.721 0.742 0.790
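Logical indexing works because the condition itself is a logical vector. The sketch below makes that intermediate vector explicit and shows which(), sum() and %in% on a small made-up named vector (x and keep are illustrative names).

```r
x <- c(a = 0.60, b = 0.79, c = 0.85, d = 0.72)  # illustrative named vector

# The condition alone yields one TRUE/FALSE per element
keep <- x > 0.7
keep              # a: FALSE, b: TRUE, c: TRUE, d: TRUE

# Using it as an index keeps the TRUE positions
x[keep]           # elements b, c and d

# which() gives the positions; sum() counts the TRUE values
which(keep)       # positions 2, 3 and 4
sum(keep)         # 3

# | means "or"; %in% tests membership in a set
x[x < 0.7 | x > 0.8]            # elements a and c
x[names(x) %in% c("a", "d")]    # elements a and d
```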
Matrices
· Vector: single dimension - it has length
· Matrix: two dimensional array - it has height and width
· matrix() command: a vector can be converted into a matrix by specifying
the number of rows (or columns) within matrix()
x <- 1:12
xmat1 <- matrix(x, ncol=4)
xmat1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
xmat2 <- matrix(x, nrow=3)
xmat2
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
· By default, the matrix is filled by columns. We can use the argument
byrow=TRUE to fill by rows
xmat3 <- matrix(x, ncol=4, byrow=TRUE)
xmat3
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
· A matrix stores vectors of the same type and length
· Combining vectors: cbind() (combine by columns) and rbind() (combine
by rows)
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY <- cbind(X, Y) # combine the vectors as columns
XY
         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,]  7.61 16.65
[4,] 15.49 12.20
[5,] 11.77 13.12
rbind(X, Y) # to combine rows
   [,1]  [,2]  [,3]  [,4]  [,5]
X 16.92 24.03  7.61 15.49 11.77
Y  8.37 12.93 16.65 12.20 13.12
· Matrix with a single value
matrix(0,nrow=4,ncol=3)
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
· Dimensions of a matrix
xmat3
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
dim(xmat3)
[1] 3 4
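Besides dim(), the functions nrow(), ncol(), length() and t() describe or rearrange a matrix. The sketch below rebuilds a 3 x 4 matrix under the illustrative name m so that it runs on its own.

```r
m <- matrix(1:12, ncol = 4, byrow = TRUE)  # illustrative 3 x 4 matrix

dim(m)       # 3 4  (rows, then columns)
nrow(m)      # 3
ncol(m)      # 4
length(m)    # 12   (total number of entries)

# t() transposes: rows become columns
tm <- t(m)
dim(tm)      # 4 3

# Transposing a row-filled matrix gives the column-filled one
all(tm == matrix(1:12, nrow = 4))   # TRUE
```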
· Setting row and column names: rownames() and colnames() commands
xmat3
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
rownames(xmat3) <- paste("R", 1:3, sep="")
colnames(xmat3) <- paste("C", 1:4, sep="")
xmat3
   C1 C2 C3 C4
R1  1  2  3  4
R2  5  6  7  8
R3  9 10 11 12
Matrix indexing
· Like vectors, matrices can be indexed with vectors of positive integers,
negative integers, character strings and logical values. However, because
matrices have two dimensions (height and width), matrix indexing takes the
form [row.indices, col.indices].
XY
         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,]  7.61 16.65
[4,] 15.49 12.20
[5,] 11.77 13.12
XY[3,2] # select the element at row 3, column 2
    Y
16.65
XY[,2] # select the entire 2nd column
[1]  8.37 12.93 16.65 12.20 13.12
XY[3,] # select the entire 3rd row
    X     Y
 7.61 16.65
XY[c(1,3,5),] # select entire rows 1, 3 and 5
         X     Y
[1,] 16.92  8.37
[2,]  7.61 16.65
[3,] 11.77 13.12
XY[c(1,3),2] # select the elements at rows 1 and 3, column 2
[1]  8.37 16.65
XY[, -2] # select all columns except the 2nd
[1] 16.92 24.03 7.61 15.49 11.77
XY[1:2, "X"] # select rows 1 through 2 for column X
[1] 16.92 24.03
# select all rows for which the
# value of the column X is greater than 12
XY[XY[, "X"] > 12, ]
         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,] 15.49 12.20
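Several of the results above are plain vectors because R drops a dimension when a single row or column is selected. The sketch below uses a small made-up matrix (named m2 here, an illustrative assumption) to show the drop = FALSE argument, which keeps the matrix structure.

```r
# Illustrative 3 x 2 matrix with column names
m2 <- cbind(X = c(16.92, 24.03, 7.61), Y = c(8.37, 12.93, 16.65))

# Selecting one column returns a plain vector by default
col_vec <- m2[, 2]
is.matrix(col_vec)    # FALSE

# drop = FALSE keeps the result as a one-column matrix
col_mat <- m2[, 2, drop = FALSE]
is.matrix(col_mat)    # TRUE
dim(col_mat)          # 3 1

# Row conditions and column names can be combined
y_sel <- m2[m2[, "X"] > 12, "Y"]
y_sel                 # 8.37 12.93
```

Keeping the matrix structure matters when later code relies on dim(), nrow() or column names.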
Lists
Constructing lists
· Lists are used to store collections of objects that can be of different lengths
and types
· Constructed using the command list()
list1 <- list(Age=32, Name = "Aline", Grades = c(98,85,96))
list1
$Age
[1] 32
$Name
[1] "Aline"
$Grades
[1] 98 85 96
Objects of a list
· An object within a list can be referred to by appending the $ character,
followed by the name of the object, to the name of the list:
list_name$object_name
list1$Grades # select the object named 'Grades' within the list
[1] 98 85 96
· An object or object elements within a list can also be referred to by
appending an index vector (enclosed in double square brackets, [[]])
list1[[3]] # select the third object in the list
[1] 98 85 96
# select the object named 'Grades' within the list
list1[['Grades']]
[1] 98 85 96
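Single and double square brackets behave differently on lists. The sketch below rebuilds a similar list under the illustrative name person and adds a made-up element (Passed) to show how a list can grow.

```r
person <- list(Age = 32, Name = "Aline", Grades = c(98, 85, 96))

# [ ] returns a (shorter) list; [[ ]] returns the object itself
sub_list <- person["Grades"]
sub_vec  <- person[["Grades"]]
class(sub_list)         # "list"
class(sub_vec)          # "numeric"

# Indexing can be chained to reach an element of the object
person[["Grades"]][2]   # 85
person$Grades[2]        # 85

# New objects are added by assignment (Passed is a made-up element)
person$Passed <- TRUE
names(person)           # "Age" "Name" "Grades" "Passed"
length(person)          # 4
```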
Data frames
What are Data frames?
· Similar to matrices (rows and columns, two-dimensional), but different
columns can store different types of vectors. However, the vectors must
have the same length
· Data sets are accommodated in R by data frames, in which each row
corresponds to an observation and each column corresponds to a variable
· Example (Mello and Peternelli, 2013):
Name <- c("José", "Angela", "Aline", "Mayara", "Lara", "Nicolas")
Age <- c(17,17,16,15,15,13)
Gender <- factor(c("M","F","F","F", "F", "M"))
Grades <- c(92,75,81,87,90,88)
Name
[1] "José" "Angela" "Aline" "Mayara" "Lara" "Nicolas"
Age
[1] 17 17 16 15 15 13
Gender
[1] M F F F F M
Levels: F M
Grades
[1] 92 75 81 87 90 88
data1 <- data.frame(Name, Age, Gender, Grades)
data1
     Name Age Gender Grades
1    José  17      M     92
2  Angela  17      F     75
3   Aline  16      F     81
4  Mayara  15      F     87
5    Lara  15      F     90
6 Nicolas  13      M     88
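Once a data frame exists, a few functions give a quick overview of it. The sketch below uses a small made-up data frame (students, with invented names and values) so that it runs independently of the objects above.

```r
# Illustrative data frame; names and values are made up
students <- data.frame(
  Name   = c("Ana", "Bruno", "Carla"),
  Age    = c(17, 16, 15),
  Grades = c(92, 75, 81),
  stringsAsFactors = FALSE   # keep text as character in any R version
)

dim(students)      # 3 3  (observations, variables)
nrow(students)     # 3
ncol(students)     # 3
names(students)    # "Name" "Age" "Grades"

# Class of each column
col_classes <- sapply(students, class)
col_classes        # Name: "character", Age: "numeric", Grades: "numeric"

# Numeric summary of one variable
summary(students$Grades)
```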
Importing (reading) data
· A large number of competing methods can be used to import data from a
wide variety of sources
· We will present the simplest methods of importing data from the most
popular sources
· The easiest form of importation is from a plain text file
· The most common text file formats are comma- or semicolon-delimited and
tab-delimited
· In both cases we use the read.table() command
· Please look at the Excel file example1_phenological_data.xlsx, which is
available for download from Campus Virtual
· To read this data file, first save example1_phenological_data.xlsx
from Excel as a text file
· After that, you can use the read.table() command
· Don’t forget to change the working directory, if necessary. To do this, use
the setwd() command
· To read a semicolon-delimited text file, you can use the commands:
setwd("C:/Users/Izabela/Desktop/Aulas 2022 01/PG Data vis and manip/Slides/Slides")
data2 <- read.table("example1_phenological_data.csv",
                    header = TRUE, sep = ";", dec = ",")
data2 <- read.table("example1_phenological_data.csv",
                    header = TRUE, sep = ";", dec = ",")
str(data2)
'data.frame': 36 obs. of 4 variables:
 $ month        : int 1 1 1 1 1 1 1 1 1 1 ...
 $ plant        : int 1 1 1 1 2 2 2 2 3 3 ...
 $ branch       : int 1 2 3 4 1 2 3 4 1 2 ...
 $ branch.length: num 0.52 0.79 0.6 0.6 0.685 0.585 0.485 1.13 0.5 0.69 ...
head(data2)
  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585
· To read a tab-delimited text file, you can use the commands:
data3 <- read.table("example1_phenological_data.txt",
                    header = TRUE, sep = "\t", dec = ",")
head(data3)
  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585
# The default separator of read.table() is any whitespace (tabs included),
# so the 'sep="\t"' argument can be omitted
data4 <- read.table("example1_phenological_data.txt",
                    header = TRUE, dec = ",")
head(data4)
  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585
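The import commands above depend on a file that may not be at hand. The sketch below is a self-contained version under stated assumptions: it writes three made-up lines to a temporary file (so no working directory is needed) and reads them back. It also shows read.csv2(), a base R wrapper whose defaults are header = TRUE, sep = ";" and dec = ",".

```r
# Write a tiny semicolon-delimited file with decimal commas (made-up data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("month;plant;branch.length",
             "1;1;0,52",
             "1;2;0,79"), tmp)

# Explicit arguments with read.table()
d_table <- read.table(tmp, header = TRUE, sep = ";", dec = ",")

# read.csv2() uses the same settings by default
d_csv2 <- read.csv2(tmp)

str(d_table)             # branch.length is read as a number, not as text
d_table$branch.length

unlink(tmp)              # remove the temporary file
```

If dec = "," is forgotten, values such as 0,52 are read as text rather than as numbers, which is a common source of errors.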
Reviewing a data frame
· fix() function: opens a data frame as a simple spreadsheet in a
separate window, where it can be viewed and edited
fix(data1) # type this command in RStudio
fix(data2) # type this command in RStudio
Indexing data frames
· A vector or vector elements within a data frame can be referred to by
appending an index vector (enclosed in square brackets, [ ]) or by using
data_frame_name$column_name
# select the element at observation 2, variable 1
data1[2,1]
[1] "Angela"
# select the entire 2nd observation (all variables)
data1[2,]
    Name Age Gender Grades
2 Angela  17      F     75
data1$Name # select the entire variable 'Name'
[1] "José" "Angela" "Aline" "Mayara" "Lara" "Nicolas"
# select the 2nd element of the variable 'Name'
data1$Name[2]
[1] "Angela"
# select the elements 1 through 3 of the variable 'Name'
data1$Name[1:3]
[1] "José" "Angela" "Aline"
· Indexing by conditions
# select the observations that have Grades greater than
# or equal to 90
data1[data1$Grades >= 90,]
  Name Age Gender Grades
1 José  17      M     92
5 Lara  15      F     90
# select the observations for "F" that have Grades greater than
# or equal to 90
data1[data1$Grades >= 90 & data1$Gender=="F",]
  Name Age Gender Grades
5 Lara  15      F     90
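The same selections can be written with the base R function subset(), which lets column names be used directly inside the condition. The sketch below uses a small made-up data frame (students) and is offered as an alternative notation, not as the method of the original slides.

```r
# Illustrative data frame; names and values are made up
students <- data.frame(
  Name   = c("Ana", "Bruno", "Carla", "Davi"),
  Gender = c("F", "M", "F", "M"),
  Grades = c(92, 75, 81, 90)
)

# Rows with Grades >= 90 for Gender "F"
high_f <- subset(students, Grades >= 90 & Gender == "F")
high_f$Name        # "Ana"

# 'select' chooses the columns to keep
top <- subset(students, Grades >= 90, select = c(Name, Grades))
names(top)         # "Name" "Grades"
nrow(top)          # 2
```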
Sorting datasets
· order() function: sorts datasets according to one or more variables
data1
     Name Age Gender Grades
1    José  17      M     92
2  Angela  17      F     75
3   Aline  16      F     81
4  Mayara  15      F     87
5    Lara  15      F     90
6 Nicolas  13      M     88
data1[order(data1$Gender, data1$Name),]
     Name Age Gender Grades
3   Aline  16      F     81
2  Angela  17      F     75
5    Lara  15      F     90
4  Mayara  15      F     87
1    José  17      M     92
6 Nicolas  13      M     88
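order() sorts in increasing order by default. The sketch below shows two common ways to sort in decreasing order, using a made-up data frame (students) so that it runs on its own.

```r
# Illustrative data frame; names and values are made up
students <- data.frame(
  Name   = c("Ana", "Bruno", "Carla", "Davi"),
  Age    = c(17, 16, 17, 15),
  Grades = c(92, 75, 81, 88)
)

# Highest grade first
by_grade_desc <- students[order(students$Grades, decreasing = TRUE), ]
by_grade_desc$Name    # "Ana" "Davi" "Carla" "Bruno"

# A minus sign reverses one numeric key only:
# oldest first, then increasing grade within the same age
by_age_grade <- students[order(-students$Age, students$Grades), ]
by_age_grade$Name     # "Carla" "Ana" "Bruno" "Davi"
```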
Manipulation of data frames
· Commands cbind() and rbind() can be used for data frames
data1 <- cbind(data1, Grade=c("A","C", "B", "B","A", "B"))
data1
     Name Age Gender Grades Grade
1    José  17      M     92     A
2  Angela  17      F     75     C
3   Aline  16      F     81     B
4  Mayara  15      F     87     B
5    Lara  15      F     90     A
6 Nicolas  13      M     88     B
· Direct assignment can also be used: data1$Grade <- c("A","C", "B", "B","A", "B")
data1$Grade2 <- c("A","C", "B", "B","A", "B")
data1
     Name Age Gender Grades Grade Grade2
1    José  17      M     92     A      A
2  Angela  17      F     75     C      C
3   Aline  16      F     81     B      B
4  Mayara  15      F     87     B      B
5    Lara  15      F     90     A      A
6 Nicolas  13      M     88     B      B
· To split a data frame by groups: split() command
split(data1, data1$Gender)
$F
    Name Age Gender Grades Grade Grade2
2 Angela  17      F     75     C      C
3  Aline  16      F     81     B      B
4 Mayara  15      F     87     B      B
5   Lara  15      F     90     A      A
$M
     Name Age Gender Grades Grade Grade2
1    José  17      M     92     A      A
6 Nicolas  13      M     88     B      B
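A typical reason to split a data set is to compute a summary per group. The sketch below reuses the Gender and Grades values of the example above in a separate data frame (students, an illustrative name) and combines split() with sapply(); tapply() is shown as a one-step alternative.

```r
# Gender and Grades as in the example above, stored under an illustrative name
students <- data.frame(
  Gender = c("M", "F", "F", "F", "F", "M"),
  Grades = c(92, 75, 81, 87, 90, 88)
)

# split() on a single column returns one vector per group
by_gender <- split(students$Grades, students$Gender)
names(by_gender)          # "F" "M"

# Apply a function to each group
group_means <- sapply(by_gender, mean)
group_means               # F: 83.25, M: 90

# tapply() does the split and the summary in one call
tapply(students$Grades, students$Gender, mean)
```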
Object information and conversion
Object’s attributes
· All R objects are of a certain type or class
name <- "Renato"
class(name)
[1] "character"
class(Age)
[1] "numeric"
class(Gender)
[1] "factor"
· Family of functions prefixed with is.: they evaluate whether or not an object
is of a particular class
Examples: is.numeric(), is.null(), is.character(), is.vector(),
is.factor(), is.matrix(), is.data.frame()
is.data.frame(Age)
[1] FALSE
is.data.frame(data1)
[1] TRUE
is.numeric(Age)
[1] TRUE
· Size or length of an object:
y <- seq(1,20, by=0.5)
length(y)
[1] 39
· Other characteristics of an object can be viewed by using str()
data2 <- c(100, 45, 55)
names(data2) <- c("Student", "Male", "Female")
data2
Student    Male  Female
    100      45      55
str(data2)
 Named num [1:3] 100 45 55
 - attr(*, "names")= chr [1:3] "Student" "Male" "Female"
str(data1)
'data.frame': 6 obs. of 6 variables:
 $ Name  : chr "José" "Angela" "Aline" "Mayara" ...
 $ Age   : num 17 17 16 15 15 13
 $ Gender: Factor w/ 2 levels "F","M": 2 1 1 1 1 2
 $ Grades: num 92 75 81 87 90 88
 $ Grade : chr "A" "C" "B" "B" ...
 $ Grade2: chr "A" "C" "B" "B" ...
· Command attributes(): to access object’s attributes
attributes(data2)
$names
[1] "Student" "Male" "Female"
attributes(XY) # XY is a 5x2 matrix with column names 'X' and 'Y'
$dim
[1] 5 2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "X" "Y"
attributes(data1)
$names
[1] "Name" "Age" "Gender" "Grades" "Grade" "Grade2"
$row.names
[1] 1 2 3 4 5 6
$class
[1] "data.frame"
Object conversion
· Objects can be converted into other objects using a family of functions
with the as. prefix
Examples: as.numeric(), as.character(), as.vector(), as.factor(),
as.matrix(), as.data.frame()
· To convert a matrix into a vector
x1 <- matrix(1:12, ncol=4)
x1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
as.vector(x1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12
str(data1)
 $ Age   : num 17 17 16 15 15 13
 $ Grades: num 92 75 81 87 90 88
 $ Grade : chr "A" "C" "B" "B" ...
 $ Grade2: chr "A" "C" "B" "B" ...
data1$Grade <- as.factor(data1$Grade)
str(data1)
 $ Age   : num 17 17 16 15 15 13
 $ Grades: num 92 75 81 87 90 88
 $ Grade : Factor w/ 3 levels "A","B","C": 1 3 2 2 1 2
 $ Grade2: chr "A" "C" "B" "B" ...
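One conversion deserves care: applying as.numeric() directly to a factor returns the internal level codes, not the values shown on screen. The sketch below illustrates this with a made-up factor (f) whose labels are numbers stored as text.

```r
f <- factor(c("10", "20", "10", "5"))   # illustrative factor
levels(f)                     # "10" "20" "5"  (sorted as text)

# Direct conversion returns the level codes
codes <- as.numeric(f)
codes                         # 1 2 1 3

# Converting to character first recovers the values
values <- as.numeric(as.character(f))
values                        # 10 20 10 5

# Other simple conversions
as.character(1:3)             # "1" "2" "3"
as.numeric("3.14")            # 3.14
```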
References
LOGAN, M. (2010) Biostatistical Design and Analysis Using R: A Practical Guide. Hoboken, NJ: Wiley-Blackwell.
MELLO, M. P.; PETERNELLI, L. A. (2013) Conhecendo R: uma visão mais que estatística. Viçosa, MG: Editora UFV.
