
Basic features of the R programming language: special objects
Recursos Computacionais aplicados à Estatística e Experimentação
Agropecuária - PPG em Estatística e Experimentação Agropecuária

Prof. Izabela R. Cardoso de Oliveira


2023/1
Vector
Types
· Basic data storage unit in R. It is a one-dimensional array (row or
column) of entries.

· It is a collection of one or more entries of the same class (type)

· Four major vector classes (Logan, 2010):

Integer (whole numbers)

2:4 # vector of integers from 2 to 4

[1] 2 3 4

c(1,3,9) # vector of three whole numbers (stored as numeric unless written 1L, 3L, 9L)

[1] 1 3 9
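
Whether a vector is stored as integer or as numeric (double) can be checked with typeof(). A minimal sketch; the object names a, b and d are illustrative and do not appear in the slides:

```r
a <- 2:4            # the : operator produces integers
b <- c(1, 3, 9)     # whole numbers typed without the L suffix are stored as double
d <- c(1L, 3L, 9L)  # the L suffix requests integer storage

typeof(a)   # "integer"
typeof(b)   # "double"
typeof(d)   # "integer"
class(b)    # "numeric"
```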

Numeric (real numbers)

c(8.4, 2.1) # vector which contains two real numbers

[1] 8.4 2.1

Character (letters)

c('A', 'ABC') # vector which contains letters

[1] "A" "ABC"

Logical (TRUE or FALSE)

c(2:4)==3 # evaluate the expression

[1] FALSE TRUE FALSE

Sequences
· Inclusive sequences of integers can be generated using the : operator

10:18 # a sequence from 10 to 18 inclusive

[1] 10 11 12 13 14 15 16 17 18

· The seq() function is used to generate numeric sequences

seq(from=2, to=20, by=4) # every 4th number from 2 to <= 20

[1] 2 6 10 14 18

seq(from = 2, to = 20, length.out = 5) # 5 numbers from 2 to 20 inclusive

[1] 2.0 6.5 11.0 15.5 20.0

· Sequences of repeated entries: the rep() function

rep(4, 5) # repeat number 4 five times

[1] 4 4 4 4 4

rep("no", 4) # repeat the word 'no' four times

[1] "no" "no" "no" "no"

rep(c(2, 5), 3) # repeat the series 2 & 5 three times

[1] 2 5 2 5 2 5

rep(c(2, 5), c(3, 2)) # '2' three times and '5' twice

[1] 2 2 2 5 5

In the last two examples, functions are nested within functions. In that case, the
innermost function is evaluated first.
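
The evaluation order can be made visible by storing the result of the inner call first. A small sketch; the object names inner, counts, nested and stepwise are illustrative:

```r
inner  <- c(2, 5)   # the inner call c(2, 5) is evaluated first
counts <- c(3, 2)

nested   <- rep(c(2, 5), c(3, 2))   # nested form
stepwise <- rep(inner, counts)      # same computation in two steps

identical(nested, stepwise)   # TRUE
rep(inner, each = 3)          # 2 2 2 5 5 5 (each element repeated three times)
```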

Character vectors

LETTERS[1:5] # vector which contains uppercase letters from 'A' to 'E'

[1] "A" "B" "C" "D" "E"

letters[1:5] # vector which contains lowercase letters from 'a' to 'e'

[1] "a" "b" "c" "d" "e"

months1 <- c("January", "February", "March", "April",
             "May", "June", "July", "August",
             "September", "October", "November", "December")
months1

[1] "January"   "February"  "March"     "April"     "May"       "June"
[7] "July"      "August"    "September" "October"   "November"  "December"

months2 <- c("January/2020", "February/2020", "March/2020",
             "April/2020", "May/2020", "June/2020",
             "July/2020", "August/2020", "September/2020",
             "October/2020", "November/2020", "December/2020")
months2

[1] "January/2020"   "February/2020"  "March/2020"     "April/2020"
[5] "May/2020"       "June/2020"      "July/2020"      "August/2020"
[9] "September/2020" "October/2020"   "November/2020"  "December/2020"

· paste() function: a more elegant way to generate the above character vector

months3 <- paste(months1, "2020", sep = "/")
months3

[1] "January/2020"   "February/2020"  "March/2020"     "April/2020"
[5] "May/2020"       "June/2020"      "July/2020"      "August/2020"
[9] "September/2020" "October/2020"   "November/2020"  "December/2020"
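
paste() is vectorised: the single string "2020" is recycled over every month name. As a sketch, the same vector can also be built from R's built-in constant month.name, and paste0() is a shortcut for sep = "". The names months4 and months5 are illustrative:

```r
months4 <- paste(month.name, "2020", sep = "/")   # month.name is a built-in constant
months5 <- paste0(month.name, "/2020")            # paste0() joins without a separator

identical(months4, months5)   # TRUE
months4[1:3]                  # "January/2020" "February/2020" "March/2020"
```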

· A character vector can be used to name the elements of another vector

branch.length <- c(0.600, 0.660, 0.663, 0.721,
                   0.742, 0.790, 0.805, 0.852,
                   0.865, 0.870, 0.870, 0.877)
branch.length

[1] 0.600 0.660 0.663 0.721 0.742 0.790 0.805 0.852 0.865 0.870 0.870 0.877

names(branch.length) <- months1
branch.length

  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877

· substr() function: it is used to extract parts of a string (set of characters)

substr(months1, 1, 3)

[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
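
As a quick check of what substr() returns, the sketch below compares it with R's built-in abbreviations in month.abb and combines it with nchar() and toupper(). The object name abbr is illustrative:

```r
abbr <- substr(month.name, 1, 3)   # characters 1 to 3 of each month name

identical(abbr, month.abb)    # TRUE: matches the built-in abbreviations
toupper(abbr[1:3])            # "JAN" "FEB" "MAR"
nchar(month.name[9])          # 9 characters in "September"
substr("September", 4, 6)     # "tem"
```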

Factor
· factor() function: converts a vector into a factor vector (an additional
class of vector), which properly accommodates categorical variables

gender1 <- c("F", "F", "F", "F", "M", "M", "M","M","M")
gender2 <- factor(gender1)
gender1

[1] "F" "F" "F" "F" "M" "M" "M" "M" "M"

gender2

[1] F F F F M M M M M
Levels: F M

· gl() command: it can be used to generate a factor vector when each level
of the factor has an equal number of entries (replicates)

gender3 <- gl(2, 5, 10, c("F","M"))
gender3

[1] F F F F F M M M M M
Levels: F M
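
A factor stores its categories as levels plus integer codes. The sketch below inspects both; the object names gender and g are illustrative, and the gl() call uses named arguments for readability:

```r
gender <- factor(c("F", "F", "M", "F", "M"))

levels(gender)       # "F" "M"
nlevels(gender)      # 2
as.integer(gender)   # 1 1 2 1 2 (internal codes of the levels)
table(gender)        # number of entries per level

# same construction as gl(2, 5, 10, c("F","M")), with named arguments
g <- gl(n = 2, k = 5, length = 10, labels = c("F", "M"))
```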

Vector indexing
· We can use [ ] to extract a subset of a vector. This can be done in
several ways (Logan, 2010). We will use the vector branch.length, which
was created earlier.

branch.length

  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877

Vector of positive integers

branch.length[3] # select the third element

March
0.663

branch.length[2:5] # select elements 2 through 5

February    March    April      May
   0.660    0.663    0.721    0.742

branch.length[c(1,3,5)] # select a specific set of elements

January   March     May
  0.600   0.663   0.742

Vector of negative integers

branch.length[-2] # select all but the second element

  January     March     April       May      June      July    August September
    0.600     0.663     0.721     0.742     0.790     0.805     0.852     0.865
  October  November  December
    0.870     0.870     0.877

branch.length[-c(1:6)] # select all but a specific set of elements

     July    August September   October  November  December
    0.805     0.852     0.865     0.870     0.870     0.877

Vector of character strings: the vector elements must have been named
beforehand

branch.length[c("December")] # select the named element

December
0.877

branch.length[c("September","October")] # select the named elements

September   October
    0.865     0.870

Vector of logical values

As a reminder:

branch.length

  January  February     March     April       May      June      July    August
    0.600     0.660     0.663     0.721     0.742     0.790     0.805     0.852
September   October  November  December
    0.865     0.870     0.870     0.877

branch.length[branch.length > 0.8] # elements for which the condition is TRUE

     July    August September   October  November  December
    0.805     0.852     0.865     0.870     0.870     0.877

branch.length[branch.length > 0.7 & branch.length < 0.8] # both conditions are TRUE

April   May  June
0.721 0.742 0.790
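
Logical indexing works because the comparison itself returns a vector of TRUE/FALSE values. The sketch below rebuilds branch.length (using the built-in month.name for the names) and shows which(), sum() and the "or" operator |. The object name extremes is illustrative:

```r
branch.length <- c(0.600, 0.660, 0.663, 0.721, 0.742, 0.790,
                   0.805, 0.852, 0.865, 0.870, 0.870, 0.877)
names(branch.length) <- month.name

which(branch.length > 0.8)   # positions of the TRUE values: 7 to 12
sum(branch.length > 0.8)     # 6 (TRUE counts as 1)

# | means "or": values below 0.65 or above 0.87
extremes <- branch.length[branch.length < 0.65 | branch.length > 0.87]
extremes                     # January and December
```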

Matrices
· Vector: single dimension - it has length

· Matrix: two dimensional array - it has height and width

· matrix() command: a vector can be converted into a matrix by specifying
the number of rows (or columns) within matrix()

x <- 1:12
xmat1 <- matrix(x, ncol=4)
xmat1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

xmat2 <- matrix(x, nrow=3)
xmat2

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

· By default, the matrix is filled by columns. We can use the argument
byrow=TRUE to fill by rows

xmat3 <- matrix(x, ncol=4, byrow=TRUE)
xmat3

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
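
Filling by rows gives the same result as filling a matrix with swapped dimensions by columns and then transposing it with t(). A small sketch; the object names by_row and via_t are illustrative:

```r
x <- 1:12

by_row <- matrix(x, ncol = 4, byrow = TRUE)   # 3 x 4, filled row by row
via_t  <- t(matrix(x, nrow = 4))              # 4 x 3 filled by columns, then transposed

identical(by_row, via_t)   # TRUE
dim(via_t)                 # 3 4
```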

· A matrix stores vectors of the same type and length

· Combining vectors: cbind() (combine by columns) and rbind() (combine
by rows)

X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY <- cbind(X, Y) # to combine columns
XY

         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,]  7.61 16.65
[4,] 15.49 12.20
[5,] 11.77 13.12

rbind(X, Y) # to combine rows

   [,1]  [,2]  [,3]  [,4]  [,5]
X 16.92 24.03  7.61 15.49 11.77
Y  8.37 12.93 16.65 12.20 13.12

· Matrix filled with a single value

matrix(0,nrow=4,ncol=3)

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0

· Dimensions of a matrix

xmat3

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

dim(xmat3)

[1] 3 4

· Setting row and column names: rownames() and colnames()
commands

xmat3

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

rownames(xmat3) <- paste("R", 1:3, sep="")
colnames(xmat3) <- paste("C", 1:4, sep="")
xmat3

   C1 C2 C3 C4
R1  1  2  3  4
R2  5  6  7  8
R3  9 10 11 12

Matrix indexing
· Like vectors, matrices can be indexed by vectors of positive integers,
negative integers, character strings and logical values. However, matrices
have two dimensions (height and width), so matrix indexing takes the
form [row.indices, col.indices].

XY

         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,]  7.61 16.65
[4,] 15.49 12.20
[5,] 11.77 13.12

XY[3,2] # select the element at row 3, column 2

Y
16.65

XY[,2] # select the entire 2nd column

[1] 8.37 12.93 16.65 12.20 13.12

XY[3,] # select the entire 3rd row

    X     Y
 7.61 16.65

XY[c(1,3,5),] # select entire rows 1, 3 and 5

         X     Y
[1,] 16.92  8.37
[2,]  7.61 16.65
[3,] 11.77 13.12

XY[c(1,3),2] # select the elements at rows 1 and 3, column 2

[1] 8.37 16.65

XY[, -2] # select all columns except the 2nd

[1] 16.92 24.03 7.61 15.49 11.77

XY[1:2, "X"] # select rows 1 through 2 for column X

[1] 16.92 24.03

# select all rows for which the
# value of the column X is greater than 12
XY[XY[, "X"] > 12, ]

         X     Y
[1,] 16.92  8.37
[2,] 24.03 12.93
[3,] 15.49 12.20
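
Note that selecting a single row or column, as in XY[, -2] or XY[3, ], returns a plain vector rather than a matrix. A minimal sketch of how the argument drop = FALSE keeps the matrix structure; the object names v and m are illustrative:

```r
X <- c(16.92, 24.03, 7.61, 15.49, 11.77)
Y <- c(8.37, 12.93, 16.65, 12.2, 13.12)
XY <- cbind(X, Y)

v <- XY[, -2]                 # dimensions are dropped: numeric vector
m <- XY[, -2, drop = FALSE]   # dimensions are kept: 5 x 1 matrix

is.matrix(v)   # FALSE
is.matrix(m)   # TRUE
dim(m)         # 5 1
colnames(m)    # "X"
```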

Lists
Constructing lists
· Lists are used to store collections of objects that can be of different lengths
and types

· Constructed using the command list()

list1 <- list(Age=32, Name = "Aline", Grades = c(98,85,96))
list1

$Age
[1] 32

$Name
[1] "Aline"

$Grades
[1] 98 85 96

Objects of a list
· An object within a list can be referred to by appending the $ character
followed by the name of the object to the list name, in the form
list_name$object_name

list1$Grades # select the object named 'Grades' within the list

[1] 98 85 96

· An object or object elements within a list can also be referred to by
appending an index vector (enclosed in double square brackets, [[]])

list1[[3]] # select the third object in the list

[1] 98 85 96

# select the object named 'Grades' within the list
list1[['Grades']]

[1] 98 85 96
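
Single and double square brackets behave differently on lists: [ ] returns a (smaller) list, while [[ ]] returns the stored object itself. A short sketch; the object names sub and obj are illustrative:

```r
list1 <- list(Age = 32, Name = "Aline", Grades = c(98, 85, 96))

sub <- list1["Grades"]     # single brackets: a list of length 1
obj <- list1[["Grades"]]   # double brackets: the numeric vector itself

class(sub)   # "list"
class(obj)   # "numeric"

list1[["Grades"]][2]   # 85, second element of the stored vector
list1$Grades[2]        # 85, same selection using $
```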

Data frames
What are Data frames?
· Similar to matrices (rows and columns, two-dimensional), but different
columns can store different types of vectors. However, the vectors must
have the same length

· Data sets are accommodated in R as data frames, in which each row
corresponds to an observation and each column corresponds to a variable

· Example (Mello and Peternelli, 2013):

Name <- c("José", "Angela", "Aline", "Mayara", "Lara", "Nicolas")
Age <- c(17,17,16,15,15,13)
Gender <- factor(c("M","F","F","F", "F", "M"))
Grades <- c(92,75,81,87,90,88)
Name

[1] "José" "Angela" "Aline" "Mayara" "Lara" "Nicolas"

Age

[1] 17 17 16 15 15 13

Gender

[1] M F F F F M
Levels: F M

Grades

[1] 92 75 81 87 90 88

data1 <- data.frame(Name, Age, Gender, Grades)
data1

     Name Age Gender Grades
1    José  17      M     92
2  Angela  17      F     75
3   Aline  16      F     81
4  Mayara  15      F     87
5    Lara  15      F     90
6 Nicolas  13      M     88

Importing (reading) data
· There are many competing methods for importing data from a wide
variety of sources

· We will present the simplest methods of importing data from the most
popular sources

· The easiest way to import data is from a plain text file

· The most common text files are comma- or semicolon-delimited and
tab-delimited

· In both cases we use the read.table() command

· Please look at the Excel file example1_phenological_data.xlsx, which is
available for download from Campus Virtual

· To read this data file, first save
example1_phenological_data.xlsx from Excel as a text file

· After that, you can use the read.table() command

· Don’t forget to change the working directory, if necessary. To do this, use
the setwd() command

· To read a semicolon delimited text file, you can use the commands:

setwd("C:/Users/Izabela/Desktop/Aulas 2022 01/PG Data vis and manip/Slides/Slides")

data2 <- read.table("example1_phenological_data.csv",
                    header=TRUE, sep=";", dec=",")

str(data2)

'data.frame': 36 obs. of 4 variables:
 $ month        : int 1 1 1 1 1 1 1 1 1 1 ...
 $ plant        : int 1 1 1 1 2 2 2 2 3 3 ...
 $ branch       : int 1 2 3 4 1 2 3 4 1 2 ...
 $ branch.length: num 0.52 0.79 0.6 0.6 0.685 0.585 0.485 1.13 0.5 0.69 ...

head(data2)

  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585

· To read a tab delimited text file, you can use the commands:

data3 <- read.table("example1_phenological_data.txt",
                    header=TRUE, sep="\t", dec=",")
head(data3)

  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585

# You can omit the 'sep="\t"' argument (the default, sep="",
# splits fields on any white space, including tabs)
# and just use the command
data4 <- read.table("example1_phenological_data.txt",
                    header=TRUE, dec=",")
head(data4)

  month plant branch branch.length
1     1     1      1         0.520
2     1     1      2         0.790
3     1     1      3         0.600
4     1     1      4         0.600
5     1     2      1         0.685
6     1     2      2         0.585
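
The import steps can be tried without the course file. The sketch below writes a tiny semicolon-delimited file with decimal commas to a temporary location and reads it back. The file content is made up for illustration (it is not the phenological data set), and the object names tmp, d1 and d2 are hypothetical. read.csv2() is a base R wrapper around read.table() whose defaults are header = TRUE, sep = ";" and dec = ",":

```r
# made-up example file: semicolon-delimited, decimal comma
tmp <- tempfile(fileext = ".csv")
writeLines(c("month;plant;branch.length",
             "1;1;0,52",
             "1;2;0,79"), tmp)

d1 <- read.table(tmp, header = TRUE, sep = ";", dec = ",")
d2 <- read.csv2(tmp)   # same settings supplied as defaults

str(d1)                # two observations, three variables
all.equal(d1, d2)      # compares the two imported data frames

unlink(tmp)            # remove the temporary file
```

Using tempfile() avoids any dependence on the working directory, so setwd() is not needed here.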

Reviewing a data frame
· fix() function: opens a data frame as a simple spreadsheet in a
separate window, where it can be viewed and edited

fix(data1) # type this command in RStudio
fix(data2) # type this command in RStudio

Indexing data frames
· A vector or vector elements within a data frame can be referred to by
appending an index vector (enclosed in square brackets, [ ]) or by using
data_frame_name$column_name

# select the element at observation 2, variable 1
data1[2,1]

[1] "Angela"

# select the entire 2nd observation (all variables)
data1[2,]

    Name Age Gender Grades
2 Angela  17      F     75

data1$Name # select the entire variable 'Name'

[1] "José" "Angela" "Aline" "Mayara" "Lara" "Nicolas"

# select the 2nd element of the variable 'Name'
data1$Name[2]

[1] "Angela"

# select the elements 1 through 3 of the variable 'Name'
data1$Name[1:3]

[1] "José" "Angela" "Aline"

· Indexing by conditions

# select the observations that have Grades greater
# than or equal to 90
data1[data1$Grades >= 90,]

  Name Age Gender Grades
1 José  17      M     92
5 Lara  15      F     90

# select the observations for "F" that have Grades greater
# than or equal to 90
data1[data1$Grades >= 90 & data1$Gender=="F",]

  Name Age Gender Grades
5 Lara  15      F     90
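
The same selections can be written with the base R function subset(), which evaluates the condition inside the data frame so that the data1$ prefix is not needed. This is an alternative, not the method used in the slides. The object names top_f, same and top_cols are illustrative:

```r
data1 <- data.frame(
  Name   = c("José", "Angela", "Aline", "Mayara", "Lara", "Nicolas"),
  Age    = c(17, 17, 16, 15, 15, 13),
  Gender = factor(c("M", "F", "F", "F", "F", "M")),
  Grades = c(92, 75, 81, 87, 90, 88)
)

# rows for "F" with Grades greater than or equal to 90
top_f <- subset(data1, Grades >= 90 & Gender == "F")
same  <- data1[data1$Grades >= 90 & data1$Gender == "F", ]
all.equal(top_f, same)   # both approaches select the same row

# 'select' keeps only the listed columns
top_cols <- subset(data1, Grades >= 90, select = c(Name, Grades))
top_cols
```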

Sorting datasets
· order() function: sorts datasets according to one or more variables

data1

     Name Age Gender Grades
1    José  17      M     92
2  Angela  17      F     75
3   Aline  16      F     81
4  Mayara  15      F     87
5    Lara  15      F     90
6 Nicolas  13      M     88

data1[order(data1$Gender, data1$Name),]

     Name Age Gender Grades
3   Aline  16      F     81
2  Angela  17      F     75
5    Lara  15      F     90
4  Mayara  15      F     87
1    José  17      M     92
6 Nicolas  13      M     88
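
order() returns the row positions in sorted order, which are then used as a row index. The sketch below adds two variations: sorting in decreasing order, and sorting by Gender with decreasing Grades inside each group (negating a numeric variable reverses its order). The object names by_grade and by_gender_grade are illustrative:

```r
data1 <- data.frame(
  Name   = c("José", "Angela", "Aline", "Mayara", "Lara", "Nicolas"),
  Gender = factor(c("M", "F", "F", "F", "F", "M")),
  Grades = c(92, 75, 81, 87, 90, 88)
)

order(data1$Grades)   # 2 3 4 6 5 1: row positions from lowest to highest grade

# highest grade first
by_grade <- data1[order(data1$Grades, decreasing = TRUE), ]

# by Gender, then by decreasing Grades within each Gender
by_gender_grade <- data1[order(data1$Gender, -data1$Grades), ]
by_gender_grade
```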

Manipulation of data frames
· The cbind() and rbind() commands can also be used with data frames

data1 <- cbind(data1, Grade=c("A","C", "B", "B","A", "B"))
data1

     Name Age Gender Grades Grade
1    José  17      M     92     A
2  Angela  17      F     75     C
3   Aline  16      F     81     B
4  Mayara  15      F     87     B
5    Lara  15      F     90     A
6 Nicolas  13      M     88     B

· A column can also be added with data1$Grade <- c("A","C", "B", "B","A", "B")

data1$Grade2 <- c("A","C", "B", "B","A", "B")
data1

     Name Age Gender Grades Grade Grade2
1    José  17      M     92     A      A
2  Angela  17      F     75     C      C
3   Aline  16      F     81     B      B
4  Mayara  15      F     87     B      B
5    Lara  15      F     90     A      A
6 Nicolas  13      M     88     B      B

· To split a data frame by groups: split() command

split(data1, data1$Gender)

$F
    Name Age Gender Grades Grade Grade2
2 Angela  17      F     75     C      C
3  Aline  16      F     81     B      B
4 Mayara  15      F     87     B      B
5   Lara  15      F     90     A      A

$M
     Name Age Gender Grades Grade Grade2
1    José  17      M     92     A      A
6 Nicolas  13      M     88     B      B
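
split() returns a list with one data frame per group, so the list tools shown earlier apply to the result. As a sketch, the mean grade per group can be computed with sapply(), or directly with tapply(). The object names groups, means and means2 are illustrative:

```r
data1 <- data.frame(
  Name   = c("José", "Angela", "Aline", "Mayara", "Lara", "Nicolas"),
  Gender = factor(c("M", "F", "F", "F", "F", "M")),
  Grades = c(92, 75, 81, 87, 90, 88)
)

groups <- split(data1, data1$Gender)   # list with elements $F and $M
names(groups)                          # "F" "M"
nrow(groups$F)                         # 4 observations in group F

# mean of Grades within each group
means  <- sapply(groups, function(g) mean(g$Grades))
means                                  # F: 83.25, M: 90

# same summary without splitting first
means2 <- tapply(data1$Grades, data1$Gender, mean)
```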

Object information and conversion
Object’s attributes
· All R objects are of a certain type or class

name <- "Renato"
class(name)

[1] "character"

class(Age)

[1] "numeric"

class(Gender)

[1] "factor"

· Family of functions prefixed with is.: to evaluate whether or not an object
is of a particular class

Examples: is.numeric( ), is.null( ), is.character( ), is.vector( ),
is.factor( ), is.matrix( ), is.data.frame( )

is.data.frame(Age)

[1] FALSE

is.data.frame(data1)

[1] TRUE

is.numeric(Age)

[1] TRUE

· Size or length of an object:

y <- seq(1,20, by=0.5)
length(y)

[1] 39

· Other characteristics of an object can be viewed using str()

data2 <- c(100, 45, 55)
names(data2) <- c("Student", "Male", "Female")
data2

Student    Male  Female
    100      45      55

str(data2)

 Named num [1:3] 100 45 55
 - attr(*, "names")= chr [1:3] "Student" "Male" "Female"

str(data1)

'data.frame': 6 obs. of 6 variables:
 $ Name  : chr "José" "Angela" "Aline" "Mayara" ...
 $ Age   : num 17 17 16 15 15 13
 $ Gender: Factor w/ 2 levels "F","M": 2 1 1 1 1 2
 $ Grades: num 92 75 81 87 90 88
 $ Grade : chr "A" "C" "B" "B" ...
 $ Grade2: chr "A" "C" "B" "B" ...

· attributes() command: accesses an object’s attributes

attributes(data2)

$names
[1] "Student" "Male" "Female"

attributes(XY) # XY is a 5x2 matrix with column names 'X' and 'Y'

$dim
[1] 5 2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "X" "Y"

attributes(data1)

$names
[1] "Name" "Age" "Gender" "Grades" "Grade" "Grade2"

$row.names
[1] 1 2 3 4 5 6

$class
[1] "data.frame"

Object conversion
· Objects can be converted into other objects using a family of functions
with an as. prefix

Examples: as.numeric(), as.character(), as.vector(), as.factor(),
as.matrix(), as.data.frame()

· To convert a matrix into a vector

x1 <- matrix(1:12, ncol=4)
x1

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

as.vector(x1)

[1] 1 2 3 4 5 6 7 8 9 10 11 12

str(data1)

'data.frame': 6 obs. of 6 variables:
 $ Name  : chr "José" "Angela" "Aline" "Mayara" ...
 $ Age   : num 17 17 16 15 15 13
 $ Gender: Factor w/ 2 levels "F","M": 2 1 1 1 1 2
 $ Grades: num 92 75 81 87 90 88
 $ Grade : chr "A" "C" "B" "B" ...
 $ Grade2: chr "A" "C" "B" "B" ...

data1$Grade <- as.factor(data1$Grade)
str(data1)

'data.frame': 6 obs. of 6 variables:
 $ Name  : chr "José" "Angela" "Aline" "Mayara" ...
 $ Age   : num 17 17 16 15 15 13
 $ Gender: Factor w/ 2 levels "F","M": 2 1 1 1 1 2
 $ Grades: num 92 75 81 87 90 88
 $ Grade : Factor w/ 3 levels "A","B","C": 1 3 2 2 1 2
 $ Grade2: chr "A" "C" "B" "B" ...
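
One conversion deserves care: applying as.numeric() directly to a factor returns the internal level codes, not the values shown as labels. A minimal sketch with a made-up factor (the object names f, codes and values are illustrative):

```r
f <- factor(c("10", "5", "20"))   # numbers stored as factor labels

levels(f)                 # "10" "20" "5" (labels are sorted as text)
codes <- as.numeric(f)    # 1 3 2: positions of each entry in levels(f)

# convert to character first to recover the original numbers
values <- as.numeric(as.character(f))
values                    # 10 5 20
```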

References
LOGAN, M. (2010) Biostatistical Design and Analysis Using R: A Practical Guide.
Hoboken, NJ: Wiley-Blackwell.

MELLO, M. P.; PETERNELLI, L. A. (2013) Conhecendo R: uma visão mais que
estatística. Viçosa, MG: Editora UFV.
