WK4
Subjects to be covered
Vector
List
Function
R Objects
Objects: Vector, List, Function, Data frame
Objects hold a value and can have attributes associated with them.
# object A contains a numeric value of 10
A <- 10
attributes(A) # returns NULL
R Data Types
6 basic ”atomic” classes
• Logical
• Integer
• Numeric
• Complex
• Character
• Raw – This type is beyond our scope
Complex - complex numbers
E.g. 1 + 0i, 1 + 4i, 1 + 8i etc.
Character - individual characters and strings
E.g. "a", "hats", "fish "
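A minimal sketch of the atomic classes listed above, using class() to report each one. The variable names and values are illustrative and not taken from the slides.

```r
# Illustrative values (assumed, not from the slides), one per atomic class
x_logical   <- TRUE
x_integer   <- 5L       # the L suffix makes an integer
x_numeric   <- 5.5      # numeric (stored as double)
x_complex   <- 1 + 4i
x_character <- "hats"

class(x_logical)    # "logical"
class(x_integer)    # "integer"
class(x_numeric)    # "numeric"
class(x_complex)    # "complex"
class(x_character)  # "character"
```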
R Vectors
R Hierarchy of Vectors
Vectors: Atomic vectors, List, NULL
Atomic vectors (by complexity): Logical, Numeric (Integer, Double), Character
*Note: Atomic vectors must contain data of the same data type (mode) – homogeneous
B <- c("a", "b", "c", "d", "e", "f", "g", "h")
class(B) # character
length(B) # 8
typeof(B) # character
attributes(B) # NULL
Vector "B" is class character; it contains 8 elements of type character. It has no attributes, so attributes(B) returns NULL.
R Vector Properties
Vector properties – numeric vector with a missing value (NA)
A <- c( 1 , 0 , 2 , 6 , 8 , NA , 11 , 15, 7, 12 )
is.na(A) # returns logical vector
mean(A, na.rm = TRUE) # remove NA before calculating mean.
A <- na.omit(A) # remove NA’s from A
print(A)
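A short sketch of the NA handling above, using the same vector. The name A_clean is illustrative. Note that na.omit() attaches an "na.action" attribute recording which positions were removed, whereas logical indexing with !is.na() returns the bare values.

```r
A <- c(1, 0, 2, 6, 8, NA, 11, 15, 7, 12)

n_missing <- sum(is.na(A))       # 1: TRUE counts as 1 when summed
m_with_na <- mean(A)             # NA: a single NA makes the mean NA
m_clean   <- mean(A, na.rm = TRUE)

A_clean <- A[!is.na(A)]          # keep only the non-missing values
length(A_clean)                  # 9
attributes(A_clean)              # NULL (no na.action attribute)
```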
R Atomic Vector
Logical vector
• Logical vectors can have the values TRUE, FALSE and NA
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 3, 1 )
E <- A + B
print(E) # E = 3 0 5 7 10 0
Recycled B = 2, 0, 3, 1, 2, 0
• R copies the shorter vector to itself to match the length of the longest vector. Because 6 is not a multiple of 4, R also issues a warning.
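The recycling step can be made explicit with rep(), which is a reasonable way to see what R does internally. This is a sketch; B_recycled is an illustrative name.

```r
A <- c(1, 0, 2, 6, 8, 0)
B <- c(2, 0, 3, 1)

# Stretch B to the length of A by repeating it from the start
B_recycled <- rep(B, length.out = length(A))   # 2 0 3 1 2 0

E <- A + B_recycled                            # 3 0 5 7 10 0
print(E)
```

Writing A + B directly gives the same values, plus a warning that the longer length is not a multiple of the shorter one.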
Coercion
• R vector values must be the same data type – numeric, character, logical etc.
• If a function encounters an argument / value of the wrong type, R will try to change (coerce) the argument to a different type.
E.g. "3" is a character, the other values are numeric.
A <- c(1,2,"3",4,5,6,12,13)
• In the above example the numeric values will be coerced to characters.
E.g.
print(A)
# A = "1", "2", "3", "4", "5", "6", "12", "13"
R Vectors - Coercion
Coercion of Logicals
Numerical equivalents: TRUE = 1, FALSE = 0
E.g. TRUE is a logical, the other values are numeric.
A <- c(1, 2 , TRUE , 4 , 5 ,10 , 12 ,13)
• In the above example the logical value will be coerced to 1 (numeric).
E.g.
# A = 1, 2 , 1 , 4 , 5 ,10 , 12 ,13
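A quick way to confirm what implicit coercion produced is to ask for the type of the result. This sketch uses shortened versions of the vectors above; the names with_char, with_lgl and n_true are illustrative.

```r
with_char <- c(1, 2, "3", 4)     # one character forces everything to character
typeof(with_char)                # "character"

with_lgl <- c(1, 2, TRUE, 4)     # TRUE becomes 1
typeof(with_lgl)                 # "double"

# Because TRUE = 1 and FALSE = 0, summing a logical vector counts the TRUEs
n_true <- sum(c(TRUE, FALSE, TRUE))   # 2
```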
R Vectors Coercion
Coercion
[Diagram: coercion hierarchy showing integer, double and complex]
• When character strings are present, everything will be coerced to a character string.
Coercion
• Different data types can be coerced to another explicitly (if possible).
Explicit coercion functions
A <- c(1.2, 2.3 , 1.454 , 4.6 , 5.2 ,10.1 , 12.88 ,13.4)
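The slide's list of explicit coercion functions did not survive extraction. As a hedged sketch, the base R as.*() family is the usual choice; the specific calls below are my selection, not necessarily the ones on the slide.

```r
A <- c(1.2, 2.3, 1.454, 4.6, 5.2, 10.1, 12.88, 13.4)

A_int <- as.integer(A)      # drops the fractional part: 1 2 1 4 5 10 12 13
A_chr <- as.character(A)    # "1.2" "2.3" "1.454" ...
A_lgl <- as.logical(c(0, 1, 2))   # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)

num_ok  <- as.numeric("3")                      # 3
num_bad <- suppressWarnings(as.numeric("abc"))  # NA: coercion is not possible
```

Without suppressWarnings(), the last line also warns that NAs were introduced by coercion.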
# length = 5
# index 1 2 3 4 5
A <- c( 1 , 0 , 2 , 6 , 8 )
Vector name: A
Index number: 1 2 3 4 5
Elements: 1 0 2 6 8
R Indexing
Vector position indexing - navigating [..]
• Vector values are indexed and individual values can be retrieved using [ ]
• Indexing in R starts with 1 and increments by 1 for the length of the vector.
• A negative index will exclude the value at the corresponding positive index.
• A blank index will return all values.
# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[4] # returns value 6.
A[8] # returns value 15.
A[-8] # returns all values except the 8th (15).
A[ ] # returns all values.
Vector position indexing [..]
• Vector values are indexed and individual values can be retrieved using [ ]
• Index numbers greater than the vector length return NA.
• A sequence or a vector can be used to get values from a vector.
• A blank index will return all values.
# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[11] # returns NA, length(A) = 10
A[1:3] # returns 1, 0 and 2
A[c(1,2,3)] # returns 1, 0 and 2
A[]
A[] <- 0 # replaces all values with 0
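A few more indexing patterns on the same vector, as a sketch. The result names are illustrative, and the logical index in the last line is the same idea the exercises use later.

```r
A <- c(1, 0, 2, 6, 8, 0, 11, 15, 7, 12)

picked   <- A[c(2, 4)]      # values at positions 2 and 4: 0 6
dropped  <- A[-c(1, 2)]     # everything except positions 1 and 2
last_val <- A[length(A)]    # last element: 12
big      <- A[A > 7]        # logical index: 8 11 15 12
```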
R Indexing
Replacement
Assigning new value to vector.
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
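The replacement code itself is missing from the extracted slide. A minimal sketch, assuming the usual pattern of an index on the left-hand side of <-; the replacement values are made up for illustration.

```r
A <- c(1, 0, 2, 6, 8, 0, 11, 15, 7, 12)

A[2] <- 99                # replace a single element
A[c(1, 3)] <- c(10, 20)   # replace several positions at once
A[A == 0] <- NA           # replace by condition (only position 6 is still 0)

print(A)
# 10 99 20 6 8 NA 11 15 7 12
```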
Combining vectors
• Vectors can be appended to one another using the c()
function.
A <- c( 1 , 0 , 2 , 6 , 8 , 0 )
B <- c(2,3,4)
D <- c(A, B)
length(D)
print(D)
R Creating vectors 2
Creating vectors using sequences and repeat
: - used to create integer sequence
seq() sequence function
rep() repeat function
• Vectors can be created from an integer sequence e.g. 2:6
• Or a complex sequence using seq() function.
A <- 2:6 # an integer sequence
A
A <- seq(2, 3, by=0.5) # numeric sequence
A
A <- rep(2, times = 4) # repeat 2
A
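A sketch of a few further seq() and rep() arguments that are often useful alongside the ones shown; the result names are illustrative.

```r
s_by    <- seq(2, 3, by = 0.5)         # 2.0 2.5 3.0
s_len   <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
r_times <- rep(c(1, 2), times = 3)     # 1 2 1 2 1 2 (whole vector repeated)
r_each  <- rep(c(1, 2), each = 3)      # 1 1 1 2 2 2 (each element repeated)

typeof(2:6)     # "integer": the colon operator gives integers
typeof(s_by)    # "double"
```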
9: a. Create a numeric vector named values containing the following numbers: 5, 15, 25, 35,
b. Remove the third element from the values vector and print the modified vector.
c. Remove the first and fifth elements from the values vector and print the modified vector.
Vector Exercises
Vector indexing
10: a. Create a numeric vector named data containing the following numbers: 13, 24, 36, 48, 61
The modulo operator %% returns the remainder of a division. For example A %% 2 == 0 returns TRUE for
even numbers in vector A, and A[A %% 2 == 0] returns the even numbers themselves.
b. Create a logical vector named even that contains TRUE for even numbers and FALSE for odd
numbers in the data vector.
c. Using the even logical vector, print only the even numbers from the data vector.
d. Create a new vector named filtered_data that contains only the numbers greater than 30 from the
data vector and print it.
R Lists
Lists
List created using list()
• R objects which can contain data of different types.
• E.g. numbers, strings, vectors and another list inside it.
• A list can also contain a matrix, array or a function as its elements.
# Example 1.
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 0 , 6)
AB <- list(A,B) # a list holding Vector A and Vector B
class(AB)
str(AB)
Bigger <- list(AB, c(1,0,3,4,5))
str(Bigger)
R Lists -named elements
Lists – naming list elements on creation or using names()
list()
names() – for adding names and retrieving them
Accessed using "$"
# Example 2.
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1 = st1, st2 = st2, st3 = st3) # assumed definition; element names are illustrative
names(stuff)
str(stuff)
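A sketch of naming on creation, access with $, and renaming with names(). The element names (numbers, colours, counts) are hypothetical and chosen only for illustration.

```r
st1 <- c(21, 32, 11)
st2 <- c("red", "yellow")
st3 <- c(2L, 4L, 5L, 6L)

# Names given on creation (hypothetical names)
stuff <- list(numbers = st1, colours = st2, counts = st3)

names(stuff)             # "numbers" "colours" "counts"
stuff$colours            # "red" "yellow"
stuff[["numbers"]][2]    # 32

# Names can also be changed afterwards with names()
names(stuff)[3] <- "ints"
stuff$ints               # 2 4 5 6
```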
R Lists –Indexing
Lists – Indexing
• Like atomic vectors, list elements can be accessed by indexing.
• Use double and single square brackets.
List Indexing
[[1]][1] first element of list element 1
[[2]][1] first element of list element 2
[[1]][[1]][1] first value of list element 1
Examples: A[[1]][3], A[[3]][3], A[[3]][[1]][3], A[[3]][[3]][3]
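The difference between single and double brackets is easiest to see on a small nested list. The list A below is made up for illustration and is not the one drawn on the slide.

```r
# Hypothetical list: two vectors and one nested list
A <- list(c(10, 20, 30),
          c("x", "y", "z"),
          list(c(1, 2, 3), "inner"))

sub_list <- A[1]           # single bracket: still a list (of length 1)
element  <- A[[1]]         # double bracket: the vector itself
third    <- A[[1]][3]      # third value of list element 1: 30
nested   <- A[[3]][[1]][3] # third value of the first element inside list element 3: 3
```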
R Lists –Indexing
• Merging lists
data1 <- list(c(11, 8, 19, 2, 46), c(3,4,5,6) )
data2 <- list(c("a", "b", "b", "c", "d"), c(2,2,2,2))
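The merge step is missing from the extracted slide. One common approach, assumed here, is c(), which appends the elements of one list to another; wrapping with list() instead keeps the two lists as separate nested elements.

```r
data1 <- list(c(11, 8, 19, 2, 46), c(3, 4, 5, 6))
data2 <- list(c("a", "b", "b", "c", "d"), c(2, 2, 2, 2))

merged <- c(data1, data2)      # one flat list with 4 elements
nested <- list(data1, data2)   # a list of 2 lists

length(merged)   # 4
length(nested)   # 2
str(merged)
```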
R Matrix filling
Matrix – filling by row
# What if data vector is not divisible by col number
data <- c(1:15)
M - Filled by row (3 rows, 6 columns)
x1 x2 x3 x4 x5 x6
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 1 2 3
The data vector is recycled from the start (1 2 3) to fill the last row.
M - Filled by column
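The matrix() calls that produced these tables are not in the extracted text. A sketch, assuming 6 columns as in the table above; suppressWarnings() hides the warning R gives because 15 values do not fill 3 x 6 cells exactly.

```r
data <- c(1:15)

# Fill row by row: the values run across each row first
M_row <- suppressWarnings(matrix(data, nrow = 3, ncol = 6, byrow = TRUE))
M_row[3, ]   # 13 14 15 1 2 3 (recycled to fill the last row)

# Fill column by column (the default): the values run down each column first
M_col <- suppressWarnings(matrix(data, nrow = 3, ncol = 6))
M_col[, 6]   # 1 2 3 (recycled to fill the last column)
```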
R Empty matrix
Empty matrix
• Empty matrix is useful for making empty dataframes!
# A matrix with columns but zero rows
M <- matrix(ncol=3,nrow=0)
dim(M)
Result: columns x1 x2 x3, Rows = 0
• Can make matrix of a single value etc
M <- matrix(0, nrow = 5, ncol =3 )
print(M)
# or
a <- c(0)
M <- matrix(a, nrow = 5, ncol =3 ) # same result
Result: 5 rows by 3 columns (x1 x2 x3), every element 0
R Matrix naming
Matrix – Naming
Get and change matrix dimension names: colnames(), rownames()
• Dimension names can be supplied as a list of vectors.
# change names
colnames(P) <- c("C1","C2","C3") # new column names
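The definition of P is not in the extracted text. A self-contained sketch, assuming a 2 x 3 matrix with made-up initial names supplied through the dimnames argument:

```r
# Hypothetical matrix P; dimnames is a list of (row names, column names)
P <- matrix(1:6, nrow = 2, ncol = 3,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

colnames(P)                            # get: "a" "b" "c"
colnames(P) <- c("C1", "C2", "C3")     # change column names
rownames(P) <- c("R1", "R2")           # change row names

P["R2", "C3"]                          # index by name: 6
```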
R Matrix properties
Matrix - properties
dim() – dimensions
length() - total number of elements
object.size() – how many bytes in memory
summary( ) – column wise summary statistics
A <- matrix(c(1:15), nrow = 3, ncol =5, byrow =TRUE )
# properties
dim(A) # dimensions in rows , columns
object.size(A) # size
names() – column names
dim() - rows , columns
ncol() – number of columns
nrow() – number of rows
length() - number of columns (a dataframe is a list of columns)
str() - dataframe structure
summary() – column summary stats
class(airquality) # built in New York air-quality data
names(airquality) # column names
dim(airquality)
ncol(airquality) # number of columns
nrow(airquality) # number of rows
length(airquality) # number of columns, same as ncol()
str(airquality)
summary(airquality)
R dataframes – coercion
Coercion
as.data.frame()
Vector → one-column data frame
# Matrix example
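The example code for this slide was lost in extraction. A minimal sketch of as.data.frame() on a vector and on a matrix; the objects v and m are made up for illustration.

```r
# Vector example: becomes a data frame with one column, named after the variable
v <- c(1, 0, 2, 6)
df_v <- as.data.frame(v)
dim(df_v)      # 4 rows, 1 column
names(df_v)    # "v"

# Matrix example: each matrix column becomes a data frame column
m <- matrix(1:6, nrow = 2)
df_m <- as.data.frame(m)
names(df_m)    # "V1" "V2" "V3" (default names when the matrix has none)
```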
• Like atomic vectors, we can use positive integers, negative integers, logical vectors, nothing, zero, and names to access dataframe elements.
• Similar to matrices [row, column] indexing
df[ row, col, drop=TRUE ]
- Use square brackets after the object name.
- drop = TRUE will return a vector for single dimension elements. drop = FALSE will keep the result as a matrix (or, for a dataframe, as a dataframe).
df[, 1] all rows column 1.
Best practice is to use column names.
df[, c('x', 'y')] all rows columns x and y
head( , n) – show first n rows
tail( , n) – show last n rows
Air <- airquality
A <- Air[Air$Ozone > 10, ]; print(A) # Note that the logical is where the row index normally is. Rows where Ozone is NA come back as all-NA rows.
B <- which(Air$Ozone > 10); print(B) # row indexes where Ozone is greater than 10 (which() drops NAs)
C <- Air[B, ]; print(C) # Same rows as A, minus the all-NA rows
D <- Air[B, 'Temp']; print(D) # vector of Temp with Ozone > 10
E <- Air[B, 'Temp', drop = FALSE]; print(E) # dataframe of Temp with Ozone > 10
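Because Ozone contains missing values, a logical row index and a which() index do not return exactly the same rows. The sketch below checks that difference; subset() is shown as one alternative that also drops the NA cases. The names keep, by_logical, by_which and by_subset are illustrative.

```r
Air <- airquality
keep <- Air$Ozone > 10                 # TRUE / FALSE / NA

by_logical <- Air[keep, ]              # NA in keep gives an all-NA row
by_which   <- Air[which(keep), ]       # which() keeps only the TRUE positions
by_subset  <- subset(Air, Ozone > 10)  # subset() also drops NA conditions

# The logical version has one extra row per missing Ozone value
nrow(by_logical) - nrow(by_which) == sum(is.na(Air$Ozone))   # TRUE
```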
R Combining dataframes
Combining data frames
Appending columns - cbind()
• Add a column to a dataframe from a vector or another dataframe with cbind().
• Column names of dataframes need to be unique.
• To use cbind(), the dataframe and the column to add must have the same number of rows.
Air <- airquality
New_column <- c(1:nrow(Air)) # Vector has to be the same length as a dataframe column
newAir <- cbind(Air,New_column)
str(newAir)
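Appending rows - rbind()
# newAir has 7 columns (6 from airquality plus New_column), so the new rows need 7 values each
Next_row <- as.data.frame(list(c(1L, 1L), c(1L, 1L), c(1, 1), c(1L, 1L), c(1L, 1L), c(1L, 1L), c(1L, 1L)))
names(Next_row) <- c(names(newAir)) # column names have to match
df <- rbind(newAir,Next_row)
tail(df) # the two new rows appear at the end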
Merging two dataframes
• merge() joins two dataframes on one or more common columns (by default, the columns whose names they share). Unlike cbind(), the dataframes do not need the same number of rows.
length(Air$Ozone)
Air2 <- Air[-c(1), ] # Remove first row
length(Air2$Ozone)
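The merge() call itself is not in the extracted slide. A small sketch with two made-up dataframes that share an id column and have only partly overlapping rows:

```r
# Hypothetical data: the two frames share the column "id"
left  <- data.frame(id = c(1, 2, 3), temp = c(67, 72, 74))
right <- data.frame(id = c(2, 3, 4), wind = c(8.0, 12.6, 11.5))

inner <- merge(left, right, by = "id")              # only ids present in both: 2, 3
outer <- merge(left, right, by = "id", all = TRUE)  # all ids 1..4, gaps filled with NA

print(inner)
print(outer)
```

The all = TRUE argument keeps unmatched rows from both sides; all.x and all.y do the same for one side only.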
Importing and exporting dataframe text files
read.table() write.table()
# see help(read.table)
write.table(dataframe, file = filepath, sep = "\t") # column names are written to the first row by default (row names are written too, unless row.names = FALSE).
dataset <- read.table("./air.txt", sep="\t", header=FALSE, skip = 1) # read tab delimited file, skip first row
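A round-trip sketch: write airquality to a tab-delimited file and read it back. It uses a temporary file rather than ./air.txt so it can be run anywhere. Passing row.names = FALSE and header = TRUE is my choice here, so that the column names survive the round trip.

```r
Air <- airquality
path <- tempfile(fileext = ".txt")   # temporary file, assumed for this example

# Export: tab separated, column names in the first row, no row names
write.table(Air, file = path, sep = "\t", row.names = FALSE)

# Import: header = TRUE uses the first row as column names
dataset <- read.table(path, sep = "\t", header = TRUE)

dim(dataset)     # same rows and columns as Air
names(dataset)   # same column names as Air
```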
Importing Excel files
install.packages("readxl") # or install.packages("tidyverse"), which includes readxl
library(readxl)
read_excel("path")
See https://readxl.tidyverse.org