
Introduction to R for Air Quality Research

WK4

Dr. Ross Edwards - Senior Scientist


Wisconsin State Laboratory of Hygiene, University of Wisconsin Madison
Acknowledgements and disclaimer
Disclaimer
This presentation was funded by a grant from the United States Department of
State to the University of Wisconsin. The opinions, findings and conclusions
stated herein are those of the presenter and do not necessarily reflect those of
the United States Department of State.
Introduction to R

Subjects to be covered

1. CRAN R installation, setup and navigation.
2. RStudio 1.
3. RStudio 2.
4. Data types and structures 1.
5. Data types and structures 2.
6. Data types and structures 3.
7. Importing and exporting data.
8. Datetime objects
9. Plotting.
10. Plotting.
12. Writing functions.
13. Writing functions.
WK3 R Data types and Structures 1.

R Data types and structures


R Objects

• Everything in R is an object!
• Objects are created by assignment "<-".

E.g.

A <- 10
# object A contains a numeric value of 10

[Diagram: examples of objects - vector, list, function, data frame, array]
R Objects

• R objects can have metadata associated with them.
• The metadata can be displayed using the attributes() function.
• * Note: atomic vectors have empty attributes on creation.

A <- 10
attributes(A) # returns NULL
R Data Types

R data types
6 basic ”atomic” classes

• Logical
• Integer
• Numeric
• Complex
• Character
• Raw – This type is beyond our scope
R Data Types

R data types
6 basic "atomic" classes

• Logical: e.g. TRUE, FALSE. Numeric equivalent is 1, 0.
• Integer
• Numeric
• Complex
• Character
• Raw – This type is beyond our scope
R Data Types

R data types
6 basic "atomic" classes

• Logical
• Integer: defined using "L", e.g. 2L, 3L, 4L. Convert real numbers to integer using as.integer(), e.g. as.integer(2).
• Numeric
• Complex
• Character
• Raw – This type is beyond our scope
R Data Types

R data types
6 basic "atomic" classes

• Logical
• Integer
• Numeric: real numbers, stored with double floating-point precision ("double"). By default numbers are doubles, e.g. 2.0, 3.000, 4.0, pi.
  * Note: because of finite precision, squaring the square root of two does not give exactly 2.
  sqrt(2) ^ 2 == 2 # FALSE, because sqrt(2) is stored as 1.414214… with finite precision
  as.integer(sqrt(2) ^ 2) == 2 # TRUE
• Complex
• Character
• Raw – This type is beyond our scope
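The finite-precision note above can be checked directly. The following is a minimal sketch, assuming a tolerance-based comparison is acceptable for the task at hand; all.equal() is base R, and the variable names are illustrative only.

```r
x <- sqrt(2)^2                        # a double, very slightly different from 2

exact <- (x == 2)                     # exact comparison of doubles
near  <- isTRUE(all.equal(x, 2))      # comparison within a small tolerance

print(exact)
print(near)
```

Comparing doubles with a tolerance is usually safer than truncating with as.integer(), which silently discards the fractional part.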
R Data Types

R data types
6 basic "atomic" classes

• Logical
• Integer
• Numeric
• Complex: complex numbers, e.g. 1 + 0i, 1 + 4i, 1 + 8i etc.
• Character
• Raw – This type is beyond our scope
R Data Types

R data types
6 basic "atomic" classes

• Logical
• Integer
• Numeric
• Complex
• Character: individual characters and strings, e.g. "a", "hats", "fish ". Note they are enclosed by straight quotes " ", not curly quotes ” ”.
• Raw – This type is beyond our scope
R Data Properties

R Data type commands


Similar commands which can give more and more detail about a
data object .

typeof() # what is this object?


class() # what is it? what is it?
storage.mode() # what is it? what is it? what is it?

length() # how many elements does it contain?


attributes() # does it have any metadata?
attr() # specific attribute?
str() # what type and how is it structured.
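The commands listed above can be compared side by side on a few small objects. This is an illustrative sketch only; the objects A, B and M are made up for the example.

```r
A <- c(2L, 4L, 6L)            # integer vector
B <- c(1.5, 2.5)              # double vector
M <- matrix(1:6, nrow = 2)    # integer matrix (a vector with a dim attribute)

typeof(A)          # internal storage type
class(A)
typeof(B)
class(B)           # doubles report the class "numeric"
storage.mode(M)
class(M)           # "matrix" (and also "array" in R 4.0 or later)
length(M)          # total number of elements
attributes(M)      # the dim attribute is the matrix's metadata
attr(M, "dim")     # one specific attribute
str(M)             # compact summary of type and structure
```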
R Data Structures
R Data structures
R contains a large number of different data structures built from the different data
types.
•Atomic vector
•List - a type of recursive vector
•Matrix
•Data frame
•Factors (we will avoid these, but they have their uses)
•Tables

Lots more.
R Vectors

R Hierarchy of Vectors

[Diagram: vectors are divided into atomic vectors (logical, numeric (integer and double), character) and lists; NULL is shown alongside.]

*Note: Atomic vectors must contain data of the same data type (mode) – homogeneous.
R Vectors

R Hierarchy of Vectors

[Diagram: atomic vectors (logical, numeric (integer and double), character), list and NULL.]

List
• Lists are recursive vectors because lists can contain other lists.
• Difference between atomic vectors and lists is that lists can contain anything (heterogeneous).
• Atomic vectors are of the same data type (homogeneous).

NULL
• NULL is an empty vector. A vector of length zero.

A <- NULL # create an empty vector
R Atomic Vectors

Atomic vectors

• Most basic data type in R.
• Building block of other data structures.
• Comprised of a sequence of data of the same type (homogeneous).

Create an atomic vector in R - use the c() function.
• Combines values into a vector or a list.
• Values are arguments of the c() function.
• Values are separated by commas.

A <- c(1,2,3,4,5,6,12,13)
# Vector A contains numeric values

B <- c("1", "2", "3", "4", "5", "6", "12", "13")
# Vector B contains characters
R Vector Properties
Vector properties

A <- c( 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8, 9, 10 )
class(A) # numeric
length(A) # 10
typeof(A) # double
attributes(A) # NULL

Vector "A" has class numeric and contains 10 elements of type double. It has no attributes (attributes() returns NULL).

B <- c("a", "b", "c","d", "e", "f","g","h")
class(B) # character
length(B) # 8
typeof(B) # character
attributes(B) # NULL

Vector "B" has class character and contains 8 elements of type character. It has no attributes (attributes() returns NULL).
R Vector Properties
Vector properties – Integer vector

A <- c( 1L , 2L , 3L , 4L , 5L , 6L , 7L , 8L, 9L, 10L )

class(A) # integer
length(A) # 10
typeof(A) # integer
attributes(A) # NULL

An integer vector uses "L" to distinguish between integers and doubles.
R Atomic Vectors

Missing values / missingness

• In R missing data are coded as NA "Not Available".
• Many R functions have arguments to deal with missing values.

Basic NA functions
is.na() – search for NAs
na.omit() – remove NAs

A <- c( 1 , 0 , 2 , 6 , 8 , NA , 11 , 15, 7, 12 )
is.na(A) # returns logical vector
mean(A, na.rm = TRUE) # remove NA before calculating mean.
A <- na.omit(A) # remove NAs from A
print(A)
R Atomic Vector

Logical vector
• Logical vectors can have the values TRUE, FALSE and NA.
• Class is logical; the vector below contains 3 elements of type logical.
• Logicals have a numeric equivalent: TRUE = 1, FALSE = 0, NA = NA.

L <- c( TRUE , FALSE, NA )

class(L) # logical
length(L) # 3
typeof(L) # logical
R Vector - Arithmetic

Math operations

• Arithmetic operations on vectors are performed member-by-member.

A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 3, 1 , 5 , 6)

A <- A*2 # multiply all values by 2
print(A) # Result 2 0 4 12 16 0

E <- A+B # add vectors A and B
print(E) # Result = 4 0 7 13 21 6

[Diagram: every element of vector A is multiplied by 2; vectors A and B are added element by element to give vector E.]
R Vector - Arithmetic

Math operations - Recycling Rule

• If two vectors are of unequal length, the shorter vector will be recycled in order to match the longer vector.
• R repeats the shorter vector until it matches the length of the longer vector.
• If the longer length is not a whole multiple of the shorter length (as here, 6 and 4), R still recycles but issues a warning.

A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 3, 1 )
# Recycled B = 2, 0, 3, 1, 2, 0
E <- A + B
print(E) # E = 3 0 5 7 10 0

***** Note that recycling can be a big source of problems if it is unintended.
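One way to guard against unintended recycling is to check lengths before doing the arithmetic. The sketch below is only an illustration: add_same_length() is a hypothetical helper written for this example, not a base R function.

```r
A <- c(1, 0, 2, 6, 8, 0)
B <- c(2, 0, 3, 1)

# Hypothetical helper: add two vectors only when their lengths match.
add_same_length <- function(x, y) {
  stopifnot(length(x) == length(y))
  x + y
}

# Plain addition recycles B to 2 0 3 1 2 0 (the warning is silenced here).
E <- suppressWarnings(A + B)
print(E)

# The guarded version refuses to recycle and raises an error instead.
result <- try(add_same_length(A, B), silent = TRUE)
print(inherits(result, "try-error"))
```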


R Atomic Vectors

Coercion
• Vector values must be the same data type – numeric, character, logical etc.
• If a function encounters an argument / value of the wrong type, R will try to change (coerce) the argument to a different type.

E.g. "3" is a character, other values are numeric.
A <- c(1,2,"3",4,5,6,12,13)

• In the above example the numeric values will be coerced to characters.
E.g.
print(A)
# A = "1", "2", "3", "4", "5", "6", "12", "13"
R Vectors - Coercion

Coercion of Logicals

Numerical equivalents: TRUE = 1, FALSE = 0

E.g. TRUE is a logical, other values are numeric.
A <- c(1, 2 , TRUE , 4 , 5 ,10 , 12 ,13)

• In the above example the logical value will be coerced to 1 (numeric).
E.g.
# A = 1, 2 , 1 , 4 , 5 ,10 , 12 ,13
R Vectors Coercion

Coercion

[Diagram: coercion order logical → integer → double → complex → character]

• When character strings are present, everything will be coerced to a character string.
• Coercion favors the most complicated (most general) data type.

If a value cannot be converted to the requested type (e.g. as.numeric("a")), R will insert an NA.
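The coercion order can be observed by combining two types at a time with c() and asking for the result's type. This is a small illustrative sketch using only base R; the values are arbitrary.

```r
typeof(c(TRUE, 2L))       # logical combined with integer
typeof(c(2L, 3.5))        # integer combined with double
typeof(c(3.5, 1+2i))      # double combined with complex
typeof(c(1+2i, "a"))      # complex combined with character

# Explicit coercion that cannot succeed gives NA (R also issues a warning,
# which is silenced here).
x <- suppressWarnings(as.numeric(c("1", "two", "3")))
print(x)
```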


R Studio – R Atomic Vectors

Coercion
• Different data types can be coerced to another explicitly (if possible).
• Explicit coercion functions: as.integer(), as.numeric(), as.character(), …
• Many functions have custom coercion functions built in.

A <- c(1.2, 2.3 , 1.454 , 4.6 , 5.2 ,10.1 , 12.88 ,13.4)
as.integer(A) # returns 1 2 1 4 5 10 12 13

B <- c("1", "2 ", "3" , "4", "5", "6", "7","8")
as.numeric(B) # returns 1 2 3 4 5 6 7 8
R Indexing
Vector position indexing - navigating
• Vector values are indexed from left to right.

# length = 5
# index 1 2 3 4 5
A <- c( 1 , 0 , 2 , 6 , 8 )

Vector name:   A
Index number:  1 2 3 4 5
Elements:      1 0 2 6 8
R Indexing
Vector position indexing - navigating [..]

• Vector values are indexed and individual values can be retrieved using [ ].
• Indexing in R starts with 1 and increments by 1 for the length of the vector.
• A negative index will exclude the element at that position.
• A blank index will return all values.

# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[4] # returns value 6.
A[8] # returns value 15.
A[-8] # returns all values excluding the 8th element (15).
A[ ] # returns all values.
R Indexing
Vector position indexing [..]

• Vector values are indexed and individual values can be retrieved using [ ].
• Index numbers greater than the vector length return NA.
• A sequence or a vector can be used to get values from a vector.
• A blank index will return all values.

# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )

A[11] # returns NA, length(A) = 10
A[1:3] # returns 1, 0 and 2
A[c(1,2,3)] # returns 1, 0 and 2
A[] # returns all values
A[] <- 0 # replaces all values with 0
R Indexing

Vector logical indexing [ ]

• Returns vector value(s) based on whether an expression is true or false.

Index operators
!   not
==  equivalent to
!=  not equivalent to
>   greater than
>=  greater than or equal to
<   less than
<=  less than or equal to
&   AND
|   OR
Do not use = for comparison.

# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )

A[A == 6] # returns value 6
A[A != 6] # returns all values which are not 6

# Note that A[-6] returns all the values except for
# the element in position 6!
R Sub-setting

Sub-setting returns a group of values.

• Returns a group of vector value(s) based on whether an expression is true or false. Values are returned in their original order.

# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )

A[A > 11] # returns 15 and 12

A[A >= 11] # returns 11, 15 and 12

a <- A[A >= 11] # object "a" contains 11, 15 and 12


R Sub-setting

Sub-setting with Boolean operators

• Indexing expression can include Boolean operators: & (AND), | (OR).

# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[A > 11 & A < 15] # returns 12
A[A > 8 | A < 2 ] # returns 1, 0, 0, 11, 15, 12

B <- A[A > 8 | A < 2 ] ; B # new vector subset of A


R Sub-setting - replacement

Replacement
Assigning a new value to vector elements.

# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )

A[3] <- 0 # A element 3 is now 0

A[A < 8] <- 0 # Values less than 8 are now 0
A[A > 8] <- 0 # Values greater than 8 are now 0
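Logical replacement is handy for cleaning measurement data. The sketch below assumes, purely for illustration, that an instrument writes -999 for a missing reading; the values in pm are made up and do not come from any real dataset.

```r
# Hypothetical concentration readings; -999 is an assumed "missing" code.
pm <- c(12.1, -999, 8.4, 15.0, -999, 9.7)

pm[pm == -999] <- NA            # logical index on the left of the assignment
print(pm)

pm_mean <- mean(pm, na.rm = TRUE)   # mean of the valid readings only
print(pm_mean)
```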
R Combining vectors

Combining vectors
• Vectors can be appended to one another using the c()
function.
A <- c( 1 , 0 , 2 , 6 , 8 , 0 )
B <- c(2,3,4)
D <- c(A, B)
length(D)
print(D)
R Creating vectors 2
Creating vectors using sequences and repeat

: - used to create an integer sequence
seq() - sequence function
rep() - repeat function

• Vectors can be created from an integer sequence, e.g. 2:6.
• Or a more general numeric sequence using the seq() function.
• Or using the repeat rep() function.

A <- 2:6 # an integer sequence
A
A <- seq(2, 3, by=0.5) # numeric sequence
A
A <- rep(2, times = 4) # repeat 2
A
A <- rep(1:3, times = 4) # repeat sequence
A
A <- rep(1:3, each = 4) # repeat each element of sequence
A
R Vectors
Named vectors

• The values (elements) contained in a vector can also be named, either at creation or afterwards using the names() function and a character vector.
• The vector now has the attribute "names".
• Names can be used for sub-setting.
• The vector indexing still exists and can be used in addition to the element names.
• The character vector used to add the element names has to have the same number of elements as the vector being named.

V <- c("a" = 1, "b" = 2, "c"=3) # names created with vector
print(V) # a b c
         # 1 2 3
names(V) <- c("X", "y","z") # change or add names afterwards.
print(V) # X y z
         # 1 2 3
attributes(V) # $names
              # [1] "X" "y" "z"
V["z"] # output = 3
R Vector Properties
Augmented vectors
• Vectors containing arbitrary metadata in the form of attributes.

• Attributes are used to create:

Factors - built on integer vectors.

Dates and date-times - built on numeric vectors

Data frames and Tibbles – built on lists.
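The three cases listed above can be inspected with typeof() and attributes(). This is an illustrative sketch; the example values are arbitrary.

```r
# Factor: integer codes plus "levels" and "class" attributes.
f <- factor(c("low", "high", "low"))
typeof(f)
attributes(f)

# Date: a double counting days since 1970-01-01, plus a "class" attribute.
d <- as.Date("2024-01-15")
typeof(d)
unclass(d)

# Data frame: a list of columns plus names, class and row.names attributes.
df <- data.frame(x = 1:2, y = c("a", "b"))
typeof(df)
names(attributes(df))
```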


End Wk4
Vector Exercises

R functions: print(), typeof(), c(), min(), max(), mean(), median(), and sort().
Math operators: +, -, *, /

Vector creation and basic operations
1. a. Create two numeric vectors named vector1 and vector2:
vector1 should contain the following numbers: 2, 4, 6, 8, 10
vector2 should contain the following numbers: 1, 3, 5, 7, 9
b. Calculate and print the sum of the two vectors, element-wise.
c. Calculate and print the difference between the two vectors, element-wise.
d. Calculate and print the product of the two vectors, element-wise.
e. Calculate and print the quotient of the two vectors, element-wise.
f. Find and print the minimum value in vector1.
g. Find and print the maximum value in vector2.
h. Compute and print the mean of each vector.
i. Compute and print the median of each vector.
j. Create a new vector named vector3 that contains the elements of both vector1 and vector2.
k. Sort vector3 in ascending order and print the result.
Vector Exercises

R functions: print(), typeof(), c(), min(), max(), mean(), median(), and sort().
Math operators: +, -, *, /

Vector Coercion
2: a. Create a vector named mixed_vector1 containing the following elements: 1, "two", 3.0, "four", TRUE
b. Determine mixed_vector1's type using typeof() and print the contents of mixed_vector1 to the console.

3: Create a vector named mixed_vector2 containing the following elements: 1.0, 2L, 3.5, 4L, TRUE. What data type is it?
4: a. Combine mixed_vector1 and mixed_vector2 into a new vector "mixed_vector3". What data type does it contain?
b. How many elements does "mixed_vector3" contain (use length())?
Vector Exercises

R functions: print(), typeof(), length(), c(), min(), max(), mean(), median(), and sort().
Math operators: +, -, *, /

Vector creation using seq() and rep()
5. Create a numeric vector from a sequence using the seq() function.
6. Create a numeric vector from an integer sequence using the rep() function.
Vector Exercises
Vector Recycling
7. Create two numeric vectors named vectorA and vectorB:
1. vectorA should contain the following numbers: 1, 2, 3
2. vectorB should contain the following numbers: 4, 5
a. Perform element-wise addition of vectorA and vectorB, save the result in a new vector
named result_addition, and print the result.
b. Perform element-wise subtraction of vectorA and vectorB, save the result in a new
vector named result_subtraction, and print the result.
c. Perform element-wise multiplication of vectorA and vectorB, save the result in a new
vector named result_multiplication, and print the result.
d. Perform element-wise division of vectorA and vectorB, save the result in a new vector
named result_division, and print the result.
Vector Exercises
Vector indexing
8: a. Create a numeric vector named numbers containing the following numbers: 10, 20, 30,
40, 50
b. Print the second element of the numbers vector.
c. Print the first and third elements of the numbers vector.
d. Print all elements of the numbers vector except the fourth one.

9: a. Create a numeric vector named values containing the following numbers: 5, 15, 25, 35,
b. Remove the third element from the values vector and print the modified vector.
c. Remove the first and fifth elements from the values vector and print the modified vector.
Vector Exercises
Vector indexing

10: a. Create a numeric vector named data containing the following numbers: 13, 24, 36, 48, 61
The modulo operator %% returns the remainder of a division. For example, A %% 2 == 0 returns TRUE for even numbers in vector A, and A[A %% 2 == 0] returns the even numbers themselves.
b. Create a logical vector named even that contains TRUE for even numbers and FALSE for odd numbers in the data vector.
c. Using the even logical vector, print only the even numbers from the data vector.
d. Create a new vector named filtered_data that contains only the numbers greater than 30 from the data vector and print it.
R Lists

Lists
• R objects which can contain data of different types.
• E.g. numbers, strings, vectors and another list inside it.
• A list can also contain a matrix, array or a function as its elements.
• Lists are created using list().

# Example 1.
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 0 , 6)
AB <- list(A,B) # list containing vector A and vector B
class(AB)
str(AB)
Bigger <- list(AB, c(1,0,3,4,5))
str(Bigger)
R Lists -named elements
Lists – naming list elements on creation or using names()

• Like atomic vectors, list elements can be named using a names vector.
• Named lists are created using list(); names() is used for adding names and retrieving them.
• Named elements are accessed using "$".

# Example 2.
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)

stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3) # named list
print(stuff)

stuff <- list(st1,st2,st3)
names(stuff) <- c("doubles", "colors", "integers") # adding names to a list
print(stuff)
R Lists -named elements
Lists – getting list names
• Sometimes you won't know the list element names.
• You can find them using names() or str().

# Example 3. Finding list names
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)

stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3) # named list

names(stuff)
str(stuff)
R Lists –Indexing
Lists – Indexing
• Like atomic vectors, list elements can be accessed by indexing.
• Use double and single square brackets.

List indexing [[..]]
[[1]] first element of list
[[1]][1] first value of list element 1
[[3]][1] first value of list element 3

# Example 4 indexing lists
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3)
stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3)

stuff[[1]] # doubles
stuff[[2]] # colors
stuff[[3]] # integers
stuff[[3]][1] # Integer 2
R Lists –Indexing

Lists – Indexing

• What if a list contains a list? Use [[…]] repeatedly.

List indexing
[[1]] first element of list
[[1]][1] first element of list element 1
[[2]][1] first element of list element 2
[[1]][[1]][1] first value of the first element of list element 1

st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
junk <- list(stuff, c("car", "plane")) # list containing a list

junk[[1]] # stuff list
junk[[2]] # character vector
junk[[1]][1] # list of length 1 holding the st1 vector from stuff list
junk[[1]][[1]][1] # 21 from st1 from stuff list
junk[[2]][[1]] # "car" from character vector
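The difference between single and double brackets on a list is easiest to see by checking what each one returns. This is a small illustrative sketch; the names one_box and contents are made up for the example.

```r
st1 <- c(21, 32, 11)
st2 <- c("red", "yellow")
stuff <- list(st1, st2)

one_box  <- stuff[1]      # single bracket: still a list, of length 1
contents <- stuff[[1]]    # double bracket: the element itself

class(one_box)
class(contents)
contents[2]               # index into the extracted vector
```

Single brackets keep the list wrapper, so further indexing needs [[ ]] to reach the values inside.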
R Lists –Indexing

Lists – Indexing

• What if a list contains a list? Use [[…]].

[Diagram: list A whose elements include atomic vectors (1 2 3) and a nested list; the indices A[[1]], A[[1]][3], A[[3]][3], A[[3]][[1]][3] and A[[3]][[3]][3] are marked on it.]
R Lists –Indexing

Lists – using names

• It's much easier to use names to access components.
• Accessing lists containing named lists: …..$name$name[ ]

st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
names(stuff) <- c("doubles", "colors", "integers")

junk <- list(stuff, c("car", "plane")) # list containing a list
names(junk) <- c("stuff", "transport")

junk$stuff # stuff list
junk$stuff$colors
junk$stuff$colors[1] # "red"
str(junk)
R Lists – replacing list element

Lists – replacement
• Use assignment <- for replacement.
• Use NULL to remove a list element.

st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
names(stuff) <- c("doubles", "colors", "integers")

junk <- list(stuff, c("car", "plane")) # list containing a list
names(junk) <- c("stuff", "transport")

junk$stuff$colors[1] <- "blue" # Replace red with blue
junk$stuff$colors <- NULL # Remove colors from list
junk$stuff$newcolors <- c("red", "yellow") # add vector to list
R Lists – unlisting

Convert list to atomic vector

Use unlist()
• Flattens list to an atomic vector.
• Elements are added to vector in sequence.
• Note - automatic coercion will occur.

data1 <- list(c(11, 8, 19, 2, 46))
class(data1)
v <- unlist(data1)
class(v)

Data2 <- list(c(1, 2, 3, 2, 46), data1)
X <- unlist(Data2)
print(X)

Data3 <- list(data1, c("a", "b"))
Y <- unlist(Data3)
print(Y) # character vector
R Lists – concatenation

Lists – concatenation

Use c() or append()

• Merging lists
data1 <- list(c(11, 8, 19, 2, 46), c(3,4,5,6) )
data2 <- list(c("a", "b", "b", "c", "d"), c(2,2,2,2))

New <- c(data1, data2)


str(New)

Old <- append(data1, data2)


str(Old) # same result
R Matrix filling
Matrix – filling by row

# Elements are arranged sequentially by row.
data <- c(1:15)
M <- matrix(data, ncol = 3, byrow = TRUE)
print(M)

M - Filled by row (columns across, rows down)
x1 x2 x3
 1  2  3
 4  5  6
 7  8  9
10 11 12
13 14 15
R Matrix filling
Matrix – filling by row

# What if the data vector length is not divisible by the column number?
data <- c(1:15)
M <- matrix(data, ncol = 6, byrow = TRUE)
print(M) # Recycling of vector.

M - Filled by row (columns across, rows down)
x1 x2 x3 x4 x5 x6
 1  2  3  4  5  6
 7  8  9 10 11 12
13 14 15  1  2  3    (last three values: recycling of data vector)

*** Note: if the vector filling the matrix cannot completely fill a row or column, R will recycle the vector to complete the row or column, typically with a warning.
R Matrix filling
Matrix – filling by column

# Elements are arranged sequentially by column.
data <- c(1:15)
M <- matrix(data, ncol = 3, byrow = FALSE)
print(M)

# In this case either nrow or ncol can be used to create M
data <- c(1:15)
M <- matrix(data, nrow = 5, byrow = FALSE)
print(M)

M - Filled by column (columns across, rows down)
x1 x2 x3
 1  6 11
 2  7 12
 3  8 13
 4  9 14
 5 10 15
R Matrix filling
Matrix – filling by column

# Data is not divisible by columns
data <- c(1:16)
M <- matrix(data, ncol = 3, byrow = FALSE)
print(M) # recycling of vector

M - Filled by column (columns across, rows down)
x1 x2 x3
 1  7 13
 2  8 14
 3  9 15
 4 10 16
 5 11  1    (recycling of data vector)
 6 12  2    (recycling of data vector)
R Empty matrix
Empty matrix
• Empty matrix is useful for making empty dataframes!

# A matrix with columns but zero rows
M <- matrix(ncol=3,nrow=0)
dim(M) # 0 rows, 3 columns (x1 x2 x3)

• Can make a matrix of a single value etc.
M <- matrix(0, nrow = 5, ncol =3 )
print(M) # 5 rows x 3 columns, every element 0

# or
a <- c(0)
M <- matrix(a, nrow = 5, ncol =3 ) # same result
R Matrix naming
Matrix – Naming
• Dimension names can be supplied as a list of vectors.
• Get and change matrix dimension names with colnames() and rownames().

# Define the column and row names.
# (Named row_names / col_names here so they do not mask the rownames() and colnames() functions.)
row_names <- c("y1", "y2", "y3", "y4", "y5")
col_names <- c("x1", "x2", "x3")

P <- matrix(c(1:15), nrow = 5, byrow = TRUE, dimnames = list(row_names, col_names))
print(P)

# change names
colnames(P) <- c("C1","C2","C3") # new column names
R Matrix properties
Matrix - properties

dim() – dimensions
length() - total number of elements
object.size() – how many bytes in memory
summary() – column-wise summary statistics

A <- matrix(c(1:15), nrow = 3, ncol =5, byrow =TRUE )

# properties
dim(A) # dimensions in rows, columns
length(A) # rows x columns
object.size(A) # size
summary(A) # get summary stats of columns


R Accessing matrix elements
Subsetting Matrix

M[ row, col, drop = TRUE ]
- Use square brackets after the matrix name.
- drop = TRUE will return a vector when the result has a single row or column. drop = FALSE will return a matrix.
head(M, n) – show first n rows
tail(M, n) – show last n rows

• Can subset matrices (2D) using methods similar to those used to index atomic vectors.
• Use the 'coordinate' of each element, using square brackets "[ ]" as before, separated by commas.
• E.g. for a 2D matrix: [row, col].
• For example: m[1, 1] - row 1, column 1 element.
• m[ , c(1,2)] - all rows, columns 1 & 2.
• m[1, ] - row 1, all columns.
R Accessing matrix elements
Subsetting Matrix

P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )

# Access rows 1 to 3.
print(P[1:3,]) # returns a matrix
# Access columns 1 to 2.
print(P[,1:2]) # returns a matrix
P[,1:2, drop = TRUE] # still a matrix: drop only removes dimensions of length 1

# Access values from rows 1 to 3 and column 1.
a <- P[1:3,1] # returns a vector
print(a)
a <- P[1:3,1, drop = FALSE] # returns a matrix
print(a)

# Show first two rows
head(P, 2)
# Show last two rows
tail(P, 2)
R Manipulating matrices
Matrix - adding and removing elements

rbind(a, b) – add row "b" onto "a"
cbind(a, b) – add column "b" onto "a"
M[-c(row),] - remove row
M[,-c(col)] - remove column

P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )
O <- matrix(c(1:3), nrow = 1, ncol =3)
N <- matrix(c(1:5), nrow = 5, ncol =1)

# Add a row
S <- rbind(P, O)
print(S)

# Add a column (the result is called Q because T is shorthand for TRUE)
Q <- cbind(P, N)
print(Q)

# Remove rows and columns.
P[-c(1),] # Remove row 1
P[,-c(1)] # Remove column 1
R Manipulating matrices
Matrix – replacing and re-dimensioning

dim() can be used to change matrix dimensions:
dim(x) <- c(row, column)
** Note you can only re-dimension to an object with the same length() as the matrix. Otherwise R will throw an error; there is no recycling in dimensioning.

P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )
# Replace single value
P[1,1] <- 20
print(P)

# Replace values using logical
P[P<10] <- 11
print(P)

dim(P) # 5 rows, 3 columns
dim(P) <- c(3,5) # New dimensions: 3 rows, 5 columns. Elements are refilled column by column (this is not a transpose).
dim(P)
print(P)
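Re-dimensioning with dim() is easy to confuse with transposing. The sketch below, using a small made-up matrix, compares the two: both results are 2 x 3, but the elements end up in different places.

```r
P <- matrix(1:6, nrow = 3, ncol = 2)   # columns are 1 2 3 and 4 5 6

R1 <- P
dim(R1) <- c(2, 3)   # same elements, refilled column by column
R2 <- t(P)           # transpose: rows become columns

print(R1)
print(R2)
identical(R1, R2)    # FALSE: re-dimensioning is not a transpose
```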
R Create matrix

Alternate way to create a matrix


• cbind() and rbind( ) can also be used to create a matrix.

# Matrix using rbind


u <- rbind(c(1:5), c(1:5))
print(u)

# Matrix using cbind


u <- cbind(c(1:5), c(1:5))
print(u)
R Matrix math
Element by Element - Addition and subtraction

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6),nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result) # Result is a matrix

# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result) # Result is a matrix

Addition:
 3 -1  2       5  0  3        8 -1  5
 9  4  6   +   2  9  4   =   11 13 10

Subtraction:
 3 -1  2       5  0  3       -2 -1 -1
 9  4  6   -   2  9  4   =    7 -5  2
R Matrix math
Element by Element Multiplication and division

# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6),nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Multiply the matrices
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result) # Result is a matrix

# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result) # Result is a matrix

# Multiply all elements by scalar
matrix1 * 5

Multiplication:
 3 -1  2       5  0  3       15  0  6
 9  4  6   *   2  9  4   =   18 36 24

Division:
 3 -1  2       5  0  3       0.6      -Inf  0.6666667
 9  4  6   /   2  9  4   =   4.5 0.4444444  1.5000000

-Inf = negative infinity.
Note decimal places are controlled by options(digits=7).
R Matrix math
Matrix Multiplication and division

%*% - matrix multiplies two matrices, if they are conformable
t() - transpose
diag() – extract diagonal elements of square matrix

A = matrix(c(1:30),nrow=10, ncol=3)
B = matrix(c(1:15),nrow=3, ncol=5)
C = matrix(c(1:16),nrow=4, ncol=4) # square matrix

# matrix multiplication product of AB
A %*% B

# transpose of B
t(B)

# matrix product B'A'
t(B) %*% t(A)

# diagonal elements of square matrix
diag(C)

# Trace of square matrix
sum(diag(C))
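"Conformable" means the number of columns of the left matrix equals the number of rows of the right matrix. The following sketch checks this for the matrices A and B used above; the variable names conformable, AB, same and bad are illustrative only.

```r
A <- matrix(1:30, nrow = 10, ncol = 3)   # 10 x 3
B <- matrix(1:15, nrow = 3,  ncol = 5)   # 3 x 5

conformable <- ncol(A) == nrow(B)        # 3 columns against 3 rows
AB <- A %*% B
dim(AB)                                  # rows of A by columns of B

# The transpose of a product equals the product of transposes in reverse order.
same <- all(t(AB) == t(B) %*% t(A))
print(same)

# B %*% A is not conformable (5 columns against 10 rows), so R raises an error.
bad <- try(B %*% A, silent = TRUE)
print(inherits(bad, "try-error"))
```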
R Matrix functions
apply()

apply(): apply a function across a row or down a column

apply(X, MARGIN, FUN, ..., simplify = TRUE)
MARGIN: = 1 for rows, = 2 for columns
FUN = function to apply
simplify = TRUE / FALSE
  TRUE: results in an atomic vector (or matrix) where possible
  FALSE: returns a list

A = matrix(c(1:30),nrow=10, ncol=3)
apply(A, MARGIN = 1, sum) # row sums
apply(A, MARGIN = 1, mean) # row means
apply(A, MARGIN = 2, mean) # column means
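For sums and means specifically, base R also provides rowSums(), rowMeans(), colSums() and colMeans(), which give the same results as the apply() calls above. The sketch below compares them and shows extra arguments being passed through "..." to the applied function; the matrix B with a missing value is made up for the example.

```r
A <- matrix(1:30, nrow = 10, ncol = 3)

row_sums  <- apply(A, MARGIN = 1, FUN = sum)
col_means <- apply(A, MARGIN = 2, FUN = mean)

all(row_sums == rowSums(A))                  # same values as rowSums()
isTRUE(all.equal(col_means, colMeans(A)))    # same values as colMeans()

# Extra arguments after FUN are passed on to the function, e.g. na.rm.
B <- A
B[1, 1] <- NA
col_means_na <- apply(B, MARGIN = 2, FUN = mean, na.rm = TRUE)
print(col_means_na)
```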
R Dataframe
R dataframes
Dataframes
• Most common R data structure.
• A 2D table-like structure.
• Columns are atomic vectors.
• Columns must be named (column names are the names of variables).
• Row names must be unique.
• Columns must contain the same number of data items.
- Essentially a list of vectors.
- Shares many of the properties of matrices and of lists.

Example (columns across, rows down):
x    y    z
1    "a"  22
2    "a"  33
3    "a"  44
4    "a"  55
7    "a"  7
NA   "a"  88
88   NA   99
9    "a"  NA
5    "a"  7
R Create a dataframe
Dataframe function

data.frame( ..., row.names = NULL, check.rows = FALSE,
check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = FALSE)

The values shown after "=" are the defaults.

... : vectors separated by ",".
row.names : if you want to name rows, a character vector goes here.
check.names : check column names are valid.
fix.empty.names : create column names if not supplied.
stringsAsFactors : should character vectors be converted to factors? Factors are characters associated with integer values. Except for advanced cases set to FALSE.
R dataframes
Creating Dataframes – from atomic vectors

Dataframes:
• If column names are not supplied R will create them.
• Access columns using "$name", same as with named lists.
• Get names: names() or str() or colnames().

st1 <- c(21,32,11)
st2 <- c("red", "yellow" , "orange")
st3 <- c(2L,4L, 5L)
stuff <- data.frame(st1, st2, st3) # make dataframe
class(stuff)
names(stuff)

# supply names on creation
stuff <- data.frame("double" =st1, "char" = st2, "ints" =st3)
print(stuff)

# change names
names(stuff) <- c("X", "Y", "Z") # new column names
str(attributes(stuff))
R dataframe properties
Dataframe Properties

Properties:
names() – column names
dim() - rows, columns
ncol() – number of columns
nrow() – number of rows
length() - number of columns (a dataframe is a list of columns)
str() - dataframe structure
summary() – column summary stats

class(airquality) # built-in New York air-quality data
names(airquality) # column names
dim(airquality)
ncol(airquality) # number of columns
nrow(airquality) # number of rows
length(airquality) # number of columns, same as ncol()
str(airquality)
summary(airquality)
R dataframes – coercion
Coercion

as.data.frame()
Vector → one-column data frame.
List → one column for each element, provided elements are the same length.
Matrix → a data frame with the same number of columns and rows as the matrix.
R dataframes
Creating Dataframes - coercion from lists and matrices

• List vectors must be the same length.
• Unlike matrix filling, data.frame() recycles a shorter vector only when the longer length is an exact multiple of it; otherwise it throws an error.
• Coercion of unnamed lists produces crazy column names!

st1 <- c(21,32,11)
st2 <- c("red", "yellow" , "orange")
st3 <- c(2L,4L, 5L)
stuff <- list(st1, st2, st3)

df <- as.data.frame(stuff) # or just data.frame(stuff)
print(df)
names(df) <- c("st1", "st2", "st3")
print(df)

# Matrix example
mat <- matrix(c(1:40), nrow = 10, byrow = TRUE)
df <- as.data.frame(mat)
str(df)
R dataframes – create dataframe with zero rows
Empty dataframes
• Empty data frames are useful in programing
• Create by coercion of a zero row matrix.
• Create by initializing empty vectors.

# Zero row matrix method

M <- matrix(ncol=3,nrow=0) # 3 columns zero rows


df1 <- data.frame(M)
colnames(df1) <- c("A", "B", "C")

# Empty vector method


df2 <- data.frame(A=double(), B=integer(), C=character()) # simpler method
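An empty dataframe is typically used as a starting point that rows are added to later. The sketch below shows one way to do this with rbind(); the row values are made up, and for large numbers of rows it is usually faster to collect the data first and build the dataframe once.

```r
# Start from a typed, zero-row dataframe.
df2 <- data.frame(A = double(), B = integer(), C = character())
nrow(df2)

# Hypothetical new rows with matching column names and types.
new_row1 <- data.frame(A = 1.5, B = 2L, C = "x")
new_row2 <- data.frame(A = 3.0, B = 4L, C = "y")

df2 <- rbind(df2, new_row1)
df2 <- rbind(df2, new_row2)
print(df2)
str(df2)
```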
R dataframes – subsetting
Accessing dataframe elements – subsetting

df[ row, col, drop = TRUE ]
- Use square brackets after the dataframe name.
- drop = TRUE will return a vector for a single column. drop = FALSE will return a dataframe.
head(df, n) – show first n rows
tail(df, n) – show last n rows

• Like atomic vectors, we can use positive integers, negative integers, logical vectors, nothing, zero, and names to access dataframe elements.
• Similar to matrices, [row, column] indexing: df[, 1] - all rows, column 1.
• Best practice is to use column names: df[, c('x', 'y')] - all rows, columns x and y.
• df[, 'x'] - This will return a vector, similar to matrices.
• df[, 'x', drop = FALSE] – This will return a dataframe.
• df['x'] - list-style indexing, returns column x as a dataframe.
R dataframes
Accessing dataframes – subsetting

x <- c(1:5); y <- c(1:5); z <- c(1:5)
df <- data.frame(x, y, z)
a <- df[, 2]; class(a) # y column, returns a vector
b <- df[, 2, drop = FALSE]; class(b) # y column, returns a dataframe
c <- df[1, ]; class(c) # row 1, all columns; returns a dataframe because there are multiple columns
d <- df[c(1:4), ] # rows 1 to 4
e <- df[, 'x']; class(e) # single column returns a vector; add drop = FALSE to return a dataframe
v <- df[, c('x', 'y')]; class(v) # multiple columns return a dataframe
f <- df['x']; class(f) # list-style name returns a dataframe
L <- df[['x']]; class(L) # double square brackets return a vector
h <- df$x; class(h) # column name using $ returns a vector; $ is shorthand for [[ ]]
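The other index types listed for atomic vectors (negative integers, logical vectors, nothing and zero) work on dataframes as well. A minimal sketch with a small made-up dataframe:

```r
x <- c(1:5); y <- c(6:10); z <- c(11:15)
df <- data.frame(x, y, z)

neg  <- df[-1, ]                                  # negative integer: drop row 1
logi <- df[c(TRUE, FALSE, TRUE, FALSE, TRUE), ]   # logical vector: keep rows 1, 3 and 5
none <- df[, ]                                    # nothing: all rows and all columns
zero <- df[0, ]                                   # zero: zero-row dataframe with the same columns
cell <- df[2, "y"]                                # single element by row number and column name

print(dim(neg)); print(logi$x); print(dim(zero)); print(cell)
```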
R dataframes
Accessing dataframes – subsetting
Subset based on a column logical

df[df$name <logical condition>, ] – returns a dataframe
which(logical) – returns a vector of row indices

Air <- airquality # air quality dataset
str(Air)
Air[1, ] # Row one.

• Subsetting rows with a logical on a column

# E.g. get all rows with Ozone greater than 10
A <- Air[Air$Ozone > 10, ]; print(A) # Note that the logical is where the row index normally is. Rows where Ozone is NA come back as all-NA rows.
B <- which(Air$Ozone > 10); print(B) # row indices where Ozone is greater than 10 (NA is dropped)
C <- Air[B, ]; print(C) # Same rows as A, but without the all-NA rows
D <- Air[B, 'Temp']; print(D) # vector of Temp with Ozone > 10
E <- Air[B, 'Temp', drop = FALSE]; print(E) # dataframe of Temp with Ozone > 10
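Because Ozone contains missing values, a plain logical subset and a which() subset do not return the same rows. The sketch below compares a few common ways of handling this; the object names are illustrative.

```r
Air <- airquality

with_na <- Air[Air$Ozone > 10, ]                       # NA in the logical gives all-NA rows
no_na_1 <- Air[which(Air$Ozone > 10), ]                # which() drops NA
no_na_2 <- Air[!is.na(Air$Ozone) & Air$Ozone > 10, ]   # explicit NA test
no_na_3 <- subset(Air, Ozone > 10)                     # subset() also drops NA

print(nrow(with_na)); print(nrow(no_na_1))
print(sum(is.na(with_na$Ozone)))                       # number of all-NA rows picked up
```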
R Combining dataframes
Combining data frames
Appending columns - cbind()
• Add a column to a dataframe using a vector or a dataframe with cbind().
• Column names of dataframes need to be unique.
• To cbind, the dataframe and the column to add must have the same number of rows.

Air <- airquality
New_column <- c(1:nrow(Air)) # Vector has to be the same length as a dataframe column
newAir <- cbind(Air, New_column)
str(newAir)

Next_columns <- data.frame("A" = c(1:nrow(Air)), "B" = c(1:nrow(Air)))
df <- cbind(newAir, Next_columns)
str(df)
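cbind() does not stop you from creating duplicate column names, so it can help to check before binding. A minimal sketch, assuming a made-up second dataframe called extra that shares the name Ozone with airquality:

```r
Air <- airquality
extra <- data.frame(Index = seq_len(nrow(Air)), Ozone = Air$Ozone * 2)

stopifnot(nrow(extra) == nrow(Air))            # row counts must agree
clash <- intersect(names(Air), names(extra))   # names present in both
print(clash)

# Rename the clashing column before binding so every name stays unique
names(extra)[names(extra) == "Ozone"] <- "Ozone_x2"
combined <- cbind(Air, extra)
print(anyDuplicated(names(combined)))          # 0 means no duplicate names
```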
R Combining dataframes
Combining data frames
Appending rows - rbind()
• Add rows to a dataframe using a vector or a dataframe with rbind().
• Vector data will be coerced to the dataframe column data types, so be careful.
• Dataframes to be appended must have the same column names.

Air <- airquality
New_row <- c(1:ncol(Air)) # Vector must be the same length as the number of dataframe columns
newAir <- rbind(Air, New_row)
tail(newAir) # Note the data is coerced to match the column data types.

Next_row <- as.data.frame(list(c(1L, 1L), c(1L, 1L), c(1, 1), c(1L, 1L), c(1L, 1L), c(1L, 1L))) # two new rows
names(Next_row) <- names(newAir) # column names have to match
df <- rbind(newAir, Next_row)
tail(df)
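The coercion warning matters most when the appended vector is a different type from the columns. The sketch below uses made-up rows (bad_row, good_row) to show a character vector changing every column, and a one-row dataframe that keeps the column types.

```r
Air <- airquality
before <- sapply(Air, class)

# A character vector appended as a row turns every column into character
bad_row <- c("1", "2", "3", "4", "5", "6")
coerced <- rbind(Air, bad_row)
after <- sapply(coerced, class)

# A one-row dataframe with matching names and types keeps the column types
good_row <- data.frame(Ozone = 1L, Solar.R = 2L, Wind = 3, Temp = 4L, Month = 5L, Day = 6L)
kept <- rbind(Air, good_row)

print(before); print(after); print(sapply(kept, class))
```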
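R Combining dataframes
Combining data frames
Merging two dataframes
• The two dataframes must share at least one common column name; by default merge() joins on all shared columns. The row counts do not have to match.

Air <- airquality
Air1 <- airquality[, c(6, 2, 3)] # column subset: Day, Solar.R, Wind
head(Air1)
Air2 <- airquality[, c(6, 4, 5)] # column subset: Day, Temp, Month
head(Air2)
airz <- merge(Air1, Air2) # joined on the shared column Day; Day repeats across months, so every matching pair of rows is returned
head(airz)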
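When the shared column does not identify a row uniquely, merging on more than one key column gives a one-to-one match. A sketch, assuming Month and Day together are the key; left, right and the other names are illustrative.

```r
# Both subsets keep Month and Day as the shared key
left  <- airquality[, c("Month", "Day", "Solar.R", "Wind")]
right <- airquality[, c("Month", "Day", "Ozone", "Temp")]

joined <- merge(left, right, by = c("Month", "Day"))
print(dim(joined))                 # one row per Month/Day pair

# Row counts need not match: only keys found in both tables are kept
right_june <- right[right$Month == 6, ]
june_only <- merge(left, right_june, by = c("Month", "Day"))
print(unique(june_only$Month))
```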
R Combining dataframes
Removing columns and rows
• Remove columns or rows by subsetting

Air <- airquality
Air1 <- Air[, -c(6)] # Remove column 6
head(Air1)

length(Air$Ozone)
Air2 <- Air[-c(1), ] # Remove first row
length(Air2$Ozone)
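Columns and rows can also be removed by name or by a logical test, which is safer than positions if the column order changes. A short sketch; drop_cols and the result names are illustrative.

```r
Air <- airquality

# Drop columns by name rather than position
drop_cols <- c("Day", "Month")
Air_named <- Air[, !(names(Air) %in% drop_cols)]

# Drop rows with a logical test, here rows where Ozone is missing
Air_complete <- Air[!is.na(Air$Ozone), ]

print(names(Air_named))
print(nrow(Air) - nrow(Air_complete))   # number of rows removed
```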
Importing and exporting dataframe text files
read.table() write.table()

Air <- airquality
class(Air)
head(Air, 6)

write.table(Air, "./air.csv", sep = ",") # write dataframe to the working directory as CSV.
dataset <- read.table("./air.csv", sep = ",", header = TRUE) # read csv file

write.table(Air, "./air.txt", sep = "\t") # write dataframe to the working directory as tab delimited.
dataset <- read.table("./air.txt", sep = "\t", header = TRUE) # read tab delimited file

# see help(read.table)
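By default write.table() also writes the row names as an extra first column. The sketch below writes without them and checks that the data survive the round trip; it uses tempfile() (an assumption made here so that no file is left behind) instead of the ./air.csv path on the slide.

```r
Air <- airquality
path <- tempfile(fileext = ".csv")   # temporary file for this example only

# row.names = FALSE leaves out the row-number column that write.table adds by default
write.table(Air, path, sep = ",", row.names = FALSE)

dataset <- read.table(path, sep = ",", header = TRUE)
print(all.equal(Air, dataset))       # TRUE if the round trip preserved the data
unlink(path)                         # remove the temporary file
```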
Importing and exporting text files
read.table() write.table()

write.table(dataframe, filepath, sep = delimiter) # column names are automatically written to the first row.
read.table(filepath, sep = delimiter, header = TRUE) # read the first row as column names
read.table(filepath, sep = delimiter, header = FALSE, skip = 1) # skip the 1st row

header = TRUE – Read the first row in as column names.
header = FALSE – Create column names automatically, but need to skip the header row if it exists.

dataset <- read.table("./air.txt", sep = "\t", header = FALSE, skip = 1) # read tab delimited file, skip first row

# add your own column names (7 names: write.table also wrote the row names as a first column)
dataset <- read.table("./air.txt", sep = "\t", header = FALSE, skip = 1, col.names = c("A", "B", "C", "D", "E", "F", "G")) # read tab delimited file, skip header row and supply column names
head(dataset, 6)
Importing and exporting text files
Read missing data strings like -9999 and convert to NA

Air$Ozone[is.na(Air$Ozone)] <- -9999 # change NA to -9999
write.table(Air, "./air.txt", sep = "\t") # write dataframe to the working directory as tab delimited.
head(Air) # Ozone NA is now -9999
dataset <- read.table("./air.txt", sep = "\t", header = TRUE, na.strings = c("-9999")) # read tab delimited file with -9999 as NA
head(dataset) # -9999s are now NA
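na.strings accepts several codes at once, which helps when different instruments or columns use different missing-value flags. In the sketch below the second code (-999) and the use of tempfile() are assumptions made for illustration.

```r
Air <- airquality
Air$Ozone[is.na(Air$Ozone)] <- -9999        # missing-value code used on the slide
Air$Solar.R[is.na(Air$Solar.R)] <- -999     # a second, hypothetical code

path <- tempfile(fileext = ".txt")
write.table(Air, path, sep = "\t", row.names = FALSE)

# List every code that should be read back as NA
dataset <- read.table(path, sep = "\t", header = TRUE,
                      na.strings = c("-9999", "-999", "NA"))
print(sum(is.na(dataset$Ozone)))
print(sum(is.na(dataset$Solar.R)))
unlink(path)
```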
Importing and exporting text files
Read data in and force to a specific data type

dataset <- read.table("./air.csv", sep = ",", header = TRUE, colClasses = c("Ozone" = "numeric", "Solar.R" = "numeric", "Wind" = "numeric", "Temp" = "numeric", "Month" = "integer", "Day" = "integer"))

# Use "NULL" to not read a column from file!
dataset <- read.table("./air.csv", sep = ",", header = TRUE, colClasses = c("Ozone" = "NULL"))
head(dataset, 6)
Importing and exporting excel files
Use tidyverse package readxl

install.packages("tidyverse") # readxl is installed as part of the tidyverse
install.packages("readxl") # or install readxl on its own
library(readxl)
read_excel("path")

See https://readxl.tidyverse.org
