Intro2R Wk4 Rev

Introduction to R for Air Quality Research
WK4
Dr. Ross Edwards - Senior Scientist

Wisconsin State Laboratory of Hygiene, University of Wisconsin Madison
Acknowledgements and disclaimer
Disclaimer
This presentation was funded by a grant from the United States Department of
State to the University of Wisconsin. The opinions, findings and conclusions
stated herein are those of the presenter and do not necessarily reflect those of
the United States Department of State.
Introduction to R
Subjects to be covered
1. CRAN R installation, setup and navigation.
2. RStudio 1.
3. RStudio 2.
4. Data types and structures 1.
7. Importing and exporting data.
8. Datetime objects
9. Plotting.
10. Plotting.
12. Writing functions.
13. Writing functions.
WK3 R Data types and Structures 1.
R Data types and structures

R Objects
• Everything in R is an object!
• Objects are created by assignment "<-".
E.g.
A <- 10
# object A contains a numeric value of 10
[Figure: R objects, e.g. vector, list, function, data frame, array]
R Objects
• R objects can have metadata associated with them.
• The metadata can be displayed using the attributes() function.
• * Note: atomic vectors have empty attributes on creation.
A <- 10
attributes(A) # returns NULL
R Data Types
R data types
6 basic "atomic" classes
• Logical
• Integer
• Numeric
• Complex
• Character
• Raw – This type is beyond our scope
R Data Types
R Data types
Logical
E.g. TRUE, FALSE
Numeric equivalent is 1, 0
R Data Types
R Data types
Integers
Defined using "L", e.g. 2L, 3L, 4L
Convert real numbers to integers using as.integer(), e.g. as.integer(2)
R Data Types
R Data types
Numeric - Real numbers
Double-precision floating point ("double").
By default numbers are doubles, e.g. 2.0, 3.000, 4.0, pi
* Note that because of finite precision, exact comparisons of doubles can fail.
sqrt(2) ^ 2 == 2 # FALSE: sqrt(2) = 1.414214... is stored with finite precision, so its square is not exactly 2
as.integer(sqrt(2) ^ 2) == 2 # TRUE
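The finite-precision note can be checked directly. The lines below are a minimal sketch using base R's all.equal(), which compares within a small tolerance instead of exactly; the object name x is only for illustration.

```r
x <- sqrt(2)^2              # mathematically 2, but computed in double precision

x == 2                      # FALSE: exact comparison of doubles
isTRUE(all.equal(x, 2))     # TRUE: comparison within a small tolerance
print(x, digits = 17)       # shows the rounding error in the last digits
```

Wrapping all.equal() in isTRUE() is the usual pattern inside if() statements, because all.equal() returns a description of the difference rather than FALSE when values differ.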
R Data Types
R Data types
Complex - complex numbers
E.g. 1 + 0i, 1 + 4i, 1 + 8i etc.
R Data Types
R Data types
Character - Individual characters and strings
E.g. "a", "hats", "fish "
Note: they are enclosed by straight quotes " ", not curly quotes ” ”.
R Data Properties
R Data type commands

Similar commands that give progressively more detail about a data object.
typeof() # what is the object's internal type?
class() # what class does it belong to?
storage.mode() # how is it stored?
length() # how many elements does it contain?
attributes() # does it have any metadata?
attr() # get or set a specific attribute
str() # what type is it and how is it structured?
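As a quick illustration of the commands listed above, the sketch below applies them to one small vector. The vector x and its names are made up for the example.

```r
x <- c(1.5, 2, 3)             # illustrative double vector

typeof(x)                     # "double"
class(x)                      # "numeric"
storage.mode(x)               # "double"
length(x)                     # 3
attributes(x)                 # NULL: no metadata yet

names(x) <- c("a", "b", "c")  # adding names creates an attribute
attr(x, "names")              # "a" "b" "c"
str(x)                        # compact summary of type and structure
```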
R Data Structures
R Data structures
R contains a large number of different data structures built from the different data
types.
•Atomic vector
•List - a type of recursive vector
•Matrix
•Data frame
•Factors (we will avoid these, but they have their uses)
•Tables
Lots more.
R Vectors
R Hierarchy of Vectors
[Figure: hierarchy of vectors. Atomic vectors: logical, numeric (integer, double), character. Also shown: list and NULL.]
*Note: atomic vectors must contain data of the same data type (mode), i.e. they are homogeneous.
R Vectors
R Hierarchy of Vectors
List
• Lists are recursive vectors because lists can contain other lists.
• The difference between atomic vectors and lists is that lists can contain anything (heterogeneous).
• Atomic vectors are of the same data type (homogeneous).
NULL
• NULL is an empty vector: a vector of length zero.
A <- NULL # create an empty vector
R Atomic Vectors
Atomic vectors
• Most basic data type in R.
• Building block of other data structures.
• Comprised of a sequence of data of the same type (homogeneous).
Create an atomic vector in R using the c() function.
• Combines values into a vector or a list.
• Values are arguments of the c() function.
• Values are separated by commas.
A <- c(1,2,3,4,5,6,12,13)
# Vector A contains numeric values
B <- c("1", "2", "3", "4", "5", "6", "12", "13")
# Vector B contains characters
R Vector Properties
Vector properties
A <- c( 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8, 9, 10 )
class(A) # numeric
length(A) # 10
typeof(A) # double
attributes(A) # NULL
Vector "A" class is numeric; it contains 10 elements of type double. It has no attributes (attributes() returns NULL).
B <- c("a", "b", "c","d", "e", "f","g","h")
class(B) # character
length(B) # 8
typeof(B) # character
attributes(B) # NULL
Vector "B" class is character; it contains 8 elements of type character. It has no attributes (attributes() returns NULL).
R Vector Properties
Vector properties – Integer vector
A <- c( 1L , 2L , 3L , 4L , 5L , 6L , 7L , 8L, 9L, 10L )
class(A) # integer
length(A) # 10
typeof(A) # integer
attributes(A) # NULL
An integer vector uses "L" to distinguish between integers and doubles.
R Atomic Vectors
Missing values / missingness
• In R missing data are coded as NA ("Not Available").
• Many R functions have arguments to deal with missing values.
Basic NA functions
is.na() – search for NAs
na.omit() – remove NAs
A <- c( 1 , 0 , 2 , 6 , 8 , NA , 11 , 15, 7, 12 )
is.na(A) # returns logical vector
mean(A, na.rm = TRUE) # remove NA before calculating mean.
A <- na.omit(A) # remove NAs from A
print(A)
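A short sketch of the NA tools on made-up readings. The vector pm and its values are hypothetical. It also shows one detail worth knowing: na.omit() attaches a record of the removed positions as an attribute, whereas logical indexing with !is.na() returns a plain vector.

```r
pm <- c(12.1, NA, 8.4, 15.0, NA)   # hypothetical readings with two gaps

sum(is.na(pm))                     # 2: number of missing values
mean(pm)                           # NA: missing values propagate
mean(pm, na.rm = TRUE)             # mean of the three available values

clean <- pm[!is.na(pm)]            # plain numeric vector without NAs
length(clean)                      # 3
attributes(clean)                  # NULL
attributes(na.omit(pm))            # na.omit() records which positions were dropped
```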
R Atomic Vector
Logical vector
• Logical vectors can have the values TRUE, FALSE and NA.
L <- c( TRUE , FALSE, NA )
class(L) # logical
length(L) # 3
typeof(L) # logical
• Class is logical; it contains 3 elements of type logical.
• Logicals have a numeric equivalent: TRUE = 1, FALSE = 0, NA = NA.
R Vector - Arithmetic
Math operations
• Arithmetic operations on vectors are performed member-by-member.
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 3, 1 , 5 , 6)
A <- A*2 # multiply all values by 2
print(A) # Result 2 0 4 12 16 0
E <- A+B # add vectors A and B
print(E) # Result = 4 0 7 13 21 6
[Figure: vector A multiplied by 2; vector A + vector B = vector E]
R Vector - Arithmetic
Math operations - Recycling Rule
• If two vectors are of unequal length, the shorter vector will be recycled to match the longer vector.
• R repeats the shorter vector until it matches the length of the longest vector.
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 3, 1 )
# Recycled B = 2, 0, 3, 1, 2, 0
E <- A + B # R gives a warning here because length(A) is not a multiple of length(B)
print(E) # E = 3 0 5 7 10 0
***** Note that recycling can be a big source of problems if it is unintended.
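The recycling rule can be explored with the sketch below. The names B3 and B4 are illustrative. When the shorter length divides the longer length exactly, R recycles silently; otherwise it still recycles but issues a warning, which is suppressed here only so the example runs cleanly.

```r
A  <- c(1, 0, 2, 6, 8, 0)
B3 <- c(2, 0, 3)          # length 3 divides length 6: silent recycling
B4 <- c(2, 0, 3, 1)       # length 4 does not divide 6: recycling with a warning

A + B3                    # 3 0 5 8 8 3
E <- suppressWarnings(A + B4)
E                         # 3 0 5 7 10 0

# A simple defensive check before doing arithmetic on two vectors
length(A) %% length(B4) == 0   # FALSE: lengths are not compatible
```

A check like the last line is one way to catch unintended recycling before it silently changes a result.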

R Atomic Vectors
Coercion
• Vector values must be the same data type: numeric, character, logical etc.
• If a function encounters an argument / value of the wrong type, R will try to change (coerce) the argument to a different type.
E.g. "3" is a character, the other values are numeric.
A <- c(1,2,"3",4,5,6,12,13)
• In the above example the numeric values will be coerced to characters.
E.g.
print(A)
# A = "1", "2", "3", "4", "5", "6", "12", "13"
R Vectors - Coercion
Coercion of Logicals
Numerical equivalents: TRUE = 1, FALSE = 0
E.g. TRUE is a logical, the other values are numeric.
A <- c(1, 2 , TRUE , 4 , 5 ,10 , 12 ,13)
• In the above example the logical value will be coerced to 1 (numeric).
E.g.
# A = 1, 2, 1, 4, 5, 10, 12, 13
R Vectors Coercion
Coercion
[Figure: coercion hierarchy: logical, integer, double, complex, character]
• When character strings are present, everything will be coerced to a character string.
• Coercion favors the most complicated data type.
If R can’t decide how to coerce an object it will insert an NA.
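The coercion order can be confirmed by combining pairs of types with c() and asking typeof(). This is a minimal sketch; the last lines show an NA being inserted when a string cannot be converted (R also gives a warning there, suppressed here).

```r
typeof(c(TRUE, 2L))         # "integer": logical is coerced to integer
typeof(c(2L, 3.5))          # "double": integer is coerced to double
typeof(c(3.5, 1+0i))        # "complex"
typeof(c(1+0i, "a"))        # "character": everything becomes a string

sum(c(TRUE, FALSE, TRUE))   # 2: logicals count as 1 and 0 in arithmetic

x <- suppressWarnings(as.numeric(c("1", "two", "3")))
x                           # 1 NA 3: "two" cannot be converted
```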

R Studio – R Atomic Vectors
Coercion
• Different data types can be coerced to another explicitly (if possible).
Explicit coercion functions: as.integer(), as.numeric(), as.character(), ...
A <- c(1.2, 2.3 , 1.454 , 4.6 , 5.2 ,10.1 , 12.88 ,13.4)
as.integer(A) # returns 1 2 1 4 5 10 12 13 (decimals are truncated, not rounded)
B <- c("1", "2", "3", "4", "5", "6", "7", "8")
as.numeric(B) # returns 1 2 3 4 5 6 7 8
• Many functions have custom coercion functions built in.
R Indexing
Vector position indexing - navigating
• Vector values are indexed from left to right.
# length = 5
# index 1 2 3 4 5
A <- c( 1 , 0 , 2 , 6 , 8 )
[Figure: vector A with index numbers 1 2 3 4 5 above elements 1 0 2 6 8]
R Indexing
Vector position indexing - navigating
[..]
• Vector values are indexed and individual values can be retrieved using [ ].
• Indexing in R starts with 1 and increments by 1 for the length of the vector.
• A negative index will exclude the value at that position.
• A blank index will return all values.
# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[4] # returns value 6.
A[8] # returns value 15.
A[-8] # returns all values except the one at position 8.
A[ ] # returns all values.
R Indexing
Vector position indexing
[..]
• Vector values are indexed and individual values can be retrieved using [ ].
• Index numbers greater than the vector length return NA.
• A sequence or a vector can be used to get values from a vector.
• A blank index will return all values.
# length = 10
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[11] # returns NA, length(A) = 10
A[1:3] # returns 1, 0 and 2
A[c(1,2,3)] # returns 1, 0 and 2
A[] # returns all values
A[] <- 0 # replaces all values with 0
R Indexing
Vector logical indexing []
• Returns vector value(s) based on whether an expression is true or false.
Index operators
! not
== equivalent to
!= not equivalent to
> greater than
>= greater than or equal to
< less than
<= less than or equal to
& AND
| OR
Do not use = to test equality; a single = is assignment.
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[A == 6] # returns value 6
A[A != 6] # returns all values which are not 6
# Note that A[-6] returns all the values except for
# the element in position 6!
R Sub-setting
Sub-setting returns a group of values.
• Returns a group of vector value(s) based on whether an expression is true or false. Values are returned in their original order.
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[A > 11] # returns 15 and 12
A[A >= 11] # returns 11, 15 and 12
a <- A[A >= 11] # object "a" contains 11, 15 and 12

R Sub-setting
Sub-setting with Boolean operators
• Indexing expressions can include Boolean operators: & (AND), | (OR).
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[A > 11 & A < 15] # returns 12
A[A > 8 | A < 2 ] # returns 1, 0, 0, 11, 15, 12
B <- A[A > 8 | A < 2 ] ; B # new vector subset of A

R Sub-setting - replacement
Replacement
Assigning new values to a vector.
# index 1 2 3 4 5 6 7 8 9 10
A <- c( 1 , 0 , 2 , 6 , 8 , 0 , 11 , 15, 7, 12 )
A[3] <- 0 # element 3 of A is now 0
A[A < 8] <- 0 # values less than 8 are now 0
A[A > 8] <- 0 # values greater than 8 are now 0
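Logical replacement is often used to recode placeholder values as missing. The sketch below assumes a hypothetical instrument that writes -999 when no measurement is available; the vector conc and the -999 flag are made up for illustration.

```r
conc <- c(4.2, -999, 7.9, 0.3, -999, 5.5)  # hypothetical data, -999 = missing flag

conc[conc == -999] <- NA                   # replace the flag with NA
conc                                       # 4.2 NA 7.9 0.3 NA 5.5
mean(conc, na.rm = TRUE)                   # mean of the valid values only
```

Recoding to NA rather than 0 keeps the placeholder from biasing summary statistics.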
R Combining vectors
Combining vectors
• Vectors can be appended to one another using the c()
function.
A <- c( 1 , 0 , 2 , 6 , 8 , 0 )
B <- c(2,3,4)
D <- c(A, B)
length(D)
print(D)
R Creating vectors 2
Creating vectors using sequences and repeat
: - used to create an integer sequence
seq() - sequence function
rep() - repeat function
• Vectors can be created from an integer sequence, e.g. 2:6
• Or a more complex sequence using the seq() function.
• Or using the repeat function rep().
A <- 2:6 # an integer sequence
A
A <- seq(2, 3, by=0.5) # numeric sequence
A
A <- rep(2, times = 4) # repeat 2
A
A <- rep(1:3, times = 4) # repeat sequence
A
A <- rep(1:3, each = 4) # repeat each element of sequence
A
R Vectors
Named vectors
• The values (elements) contained in a vector can also be named, either at creation or afterwards using the names() function and a character vector.
• The vector now has the attribute "names".
• Names can be used for sub-setting.
• The vector indexing still exists and can be used in addition to the element names.
• The character vector used to add the element names has to have the same number of elements as the vector being named.
V <- c("a" = 1, "b" = 2, "c"=3) # names created with vector
print(V) # a b c
# 1 2 3
names(V) <- c("X", "y","z") # change or add names afterwards.
print(V) # X y z
# 1 2 3
attributes(V) # $names
# [1] "X" "y" "z"
V["z"] # output = 3
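A brief sketch of sub-setting by name and by position on the same named vector. The vector V here is a fresh illustrative object.

```r
V <- c(a = 1, b = 2, c = 3)

V["b"]             # element named "b" (name is kept in the result)
V[c("a", "c")]     # several elements by name
V[2]               # position indexing still works
V[["b"]]           # double brackets return the bare value 2, without the name
unname(V)          # the values with the names attribute removed
```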
R Vector Properties
Augmented vectors
• Vectors containing arbitrary metadata in the form of attributes.
• Attributes are used to create:
Factors - built on integer vectors.
Dates and date-times - built on numeric vectors
Data frames and Tibbles – built on lists.

End Wk4
Vector Exercises
Vector creation and basic operations
R functions: print(), typeof(), c(), min(), max(), mean(), median(), and sort().
Math operators: +, -, *, /
1. a. Create two numeric vectors named vector1 and vector2:
vector1 should contain the following numbers: 2, 4, 6, 8, 10
vector2 should contain the following numbers: 1, 3, 5, 7, 9
b. Calculate and print the sum of the two vectors, element-wise.
c. Calculate and print the difference between the two vectors,
element-wise.
d. Calculate and print the product of the two vectors, element-wise.
e. Calculate and print the quotient of the two vectors, element-wise.
f. Find and print the minimum value in vector1.
g. Find and print the maximum value in vector2.
h. Compute and print the mean of each vector.
i. Compute and print the median of each vector.
j. Create a new vector named vector3 that contains the elements of both vector1
and vector2.
k. Sort vector3 in ascending order and print the result.
Vector Coercion
R functions: print(), typeof(), length(), c(), min(), max(), mean(), median(), and sort().
Math operators: +, -, *, /
2: a. Create a vector named mixed_vector1 containing the following elements: 1, "two", 3.0, "four", TRUE
b. Determine mixed_vector1's type using typeof() and print the contents of mixed_vector1 to the console.
3: Create a vector named mixed_vector2 containing the following elements: 1.0, 2L, 3.5, 4L, TRUE. What data type is it?
4: a. Combine mixed_vector1 and mixed_vector2 into a new vector "mixed_vector3". What data type does it contain?
b. How many elements does "mixed_vector3" contain (use length())?
Vector creation using seq() and rep()
5. Create a numeric vector from a sequence using the seq() function.
6. Create a numeric vector from an integer sequence using the rep() function.
Vector Exercises
Vector Recycling
7. Create two numeric vectors named vectorA and vectorB:
1. vectorA should contain the following numbers: 1, 2, 3
2. vectorB should contain the following numbers: 4, 5
a. Perform element-wise addition of vectorA and vectorB, save the result in a new vector
named result_addition, and print the result.
b. Perform element-wise subtraction of vectorA and vectorB, save the result in a new
vector named result_subtraction, and print the result.
c. Perform element-wise multiplication of vectorA and vectorB, save the result in a new
vector named result_multiplication, and print the result.
d. Perform element-wise division of vectorA and vectorB, save the result in a new vector
named result_division, and print the result.
Vector Exercises
Vector indexing
8: a. Create a numeric vector named numbers containing the following numbers: 10, 20, 30,
40, 50
b. Print the second element of the numbers vector.
c. Print the first and third elements of the numbers vector.
d. Print all elements of the numbers vector except the fourth one.
9: a. Create a numeric vector named values containing the following numbers: 5, 15, 25, 35,
b. Remove the third element from the values vector and print the modified vector.
c. Remove the first and fifth elements from the values vector and print the modified vector.
Vector Exercises
Vector indexing
10: a. Create a numeric vector named data containing the following numbers: 13, 24, 36, 48, 61
The modulo operator %% returns the remainder of a division. For example, A %% 2 == 0 returns TRUE for
even numbers in vector A, so A[A %% 2 == 0] returns the even numbers.
b. Create a logical vector named even that contains TRUE for even numbers and FALSE for odd
numbers in the data vector.
c. Using the even logical vector, print only the even numbers from the data vector.
d. Create a new vector named filtered_data that contains only the numbers greater than 30 from the
data vector and print it.
R Lists
Lists
• R objects which can contain data of different types.
• E.g. numbers, strings, vectors and another list inside it.
• A list can also contain a matrix, array or a function as its elements.
Lists are created using list().
# Example 1.
A <- c( 1 , 0 , 2 , 6 , 8 , 0)
B <- c( 2 , 0 , 0 , 6)
AB <- list(A,B)
class(AB)
str(AB)
Bigger <- list(AB, c(1,0,3,4,5))
str(Bigger)
[Figure: a list containing vector A and vector B]
R Lists -named elements
Lists – naming list elements on creation or using names()
• Like atomic vectors, list elements can be named using a names vector.
Named lists are created using:
list()
names() – for adding names and retrieving them
Accessed using "$"
# Example 2.
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3) # named list
print(stuff)
stuff <- list(st1,st2,st3)
names(stuff) <- c("doubles", "colors", "integers") # adding names to a list
print(stuff)
R Lists -named elements
Lists – getting list names
• Sometimes you won’t know the list element names.
• You can find them using names() or str().
# Example 3. Finding list names
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3) # named list
names(stuff)
str(stuff)
R Lists –Indexing
Lists – Indexing
• Like atomic vectors, list elements can be accessed by indexing.
• Use double and single square brackets.
List indexing: [[..]]
[[1]] first element of list
[[1]][1] first value of list element 1
[[3]][1] first value of list element 3
# Example 4 indexing lists
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3)
stuff <- list("doubles" =st1, "colors"=st2, "integers" =st3)
stuff[[1]] # doubles
stuff[[2]] # colors
stuff[[3]] # integers
stuff[[3]][1] # Integer 2
R Lists –Indexing
Lists – Indexing
• What if a list contains a list? Use [[...]] [..]
List indexing
[[1]] first element of list
[[1]][1] first element of list element 1 (returned as a list)
[[1]][[1]][1] first value of the first element of list element 1
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
junk <- list(stuff, c("car", "plane")) # list containing a list
junk[[1]] # stuff list
junk[[2]] # character vector
junk[[1]][1] # list of length 1 holding the st1 vector
junk[[1]][[1]][1] # 21 from st1 from stuff list
junk[[2]][[1]] # "car" from character vector
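The difference between single and double brackets on a list is easy to miss, so here is a small sketch using the same kind of nested list. Single brackets return a smaller list; double brackets extract the element itself.

```r
st1   <- c(21, 32, 11)
stuff <- list(st1, c("red", "yellow"))
junk  <- list(stuff, c("car", "plane"))   # list containing a list

class(junk[[1]][1])      # "list": single bracket keeps the list wrapper
class(junk[[1]][[1]])    # "numeric": double bracket extracts st1 itself
junk[[1]][[1]][1]        # 21
junk[[2]][1]             # "car"
```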
R Lists –Indexing
Lists – Indexing
• What if a list contains a list? Use [[...]] [..]
[Figure: list A containing an atomic vector and a nested list, annotated with A[[1]], A[[1]][3], A[[3]][3], A[[3]][[1]][3] and A[[3]][[3]][3]]
R Lists –Indexing
Lists – indexing using names
• It’s much easier to use names to access components.
Accessing lists containing named lists: ...$name$name[ ]
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
names(stuff) <- c("doubles", "colors", "integers")
junk <- list(stuff, c("car", "plane")) # list containing the named list
names(junk) <- c("stuff", "transport")
junk$stuff # stuff list
junk$stuff$colors
junk$stuff$colors[1] # "red"
str(junk)
R Lists – replacing list element
Lists – replacement
• Use assignment <- for replacement.
• Use NULL to remove a list element.
st1 <- c(21,32,11)
st2 <- c("red", "yellow")
st3 <- c(2L,4L, 5L, 6L)
stuff <- list(st1, st2, st3) # make list
names(stuff) <- c("doubles", "colors", "integers")
junk <- list(stuff, c("car", "plane")) # list containing the named list
names(junk) <- c("stuff", "transport")
junk$stuff$colors[1] <- "blue" # Replace red with blue
junk$stuff$colors <- NULL # Remove colors from list
junk$stuff$newcolors <- c("red", "yellow") # add vector to list
R Lists – unlisting
Convert list to atomic vector
Use unlist()
• Flattens list to an atomic vector.
• Elements are added to the vector in sequence.
• Note - automatic coercion will occur.
data1 <- list(c(11, 8, 19, 2, 46))
class(data1)
v <- unlist(data1)
class(v)
Data2 <- list(c(1, 2, 3, 2, 46), data1)
X <- unlist(Data2)
print(X)
Data3 <- list(data1, c("a", "b"))
Y <- unlist(Data3)
print(Y) # character vector
R Lists – concatenation
Lists – Merging
• Merging lists: use c() or append()
data1 <- list(c(11, 8, 19, 2, 46), c(3,4,5,6) )
data2 <- list(c("a", "b", "b", "c", "d"), c(2,2,2,2))
New <- c(data1, data2)
str(New)
Old <- append(data1, data2)
str(Old) # same result
R Matrix filling
Matrix – filling by row
# Elements are arranged sequentially by row.
data <- c(1:15)
M <- matrix(data, ncol = 3, byrow = TRUE)
print(M)
M - Filled by row:
x1 x2 x3
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
R Matrix filling
Matrix – filling by row
# What if the length of the data vector is not divisible by the number of columns?
data <- c(1:15)
M <- matrix(data, ncol = 6, byrow = TRUE) # R warns that the data length is not a multiple of the number of columns
print(M) # Recycling of vector.
M - recycling of data vector:
x1 x2 x3 x4 x5 x6
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 1 2 3
*** Note: if the vector filling the matrix cannot completely fill a row or column, R will recycle the vector to complete the row or column.
R Matrix filling
Matrix – filling by column
# Elements are arranged sequentially by column.
data <- c(1:15)
M <- matrix(data, ncol = 3, byrow = FALSE)
print(M)
# In this case either nrow or ncol can be used to create M
data <- c(1:15)
M <- matrix(data, nrow = 5, byrow = FALSE)
print(M)
M - Filled by column:
x1 x2 x3
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
R Matrix filling
Matrix – filling by column
# The data length is not divisible by the number of columns
data <- c(1:16)
M <- matrix(data, ncol = 3, byrow = FALSE)
print(M) # recycling of vector
M - Filled by column, with recycling of the data vector:
x1 x2 x3
1 7 13
2 8 14
3 9 15
4 10 16
5 11 1
6 12 2
R Empty matrix
Empty matrix
• An empty matrix is useful for making empty dataframes!
# A matrix with columns but zero rows
M <- matrix(ncol=3,nrow=0)
dim(M)
• Can also make a matrix filled with a single value etc.
M <- matrix(0, nrow = 5, ncol =3 )
print(M)
# or
a <- c(0)
M <- matrix(a, nrow = 5, ncol =3 ) # same result
[Figure: a matrix with columns x1 x2 x3 and zero rows; a 5 x 3 matrix filled with 0]
R Matrix naming
Matrix – Naming
• Dimension names can be supplied as a list of vectors.
Get and change matrix dimension names: colnames(), rownames()
# Define the column and row names.
row_names <- c("y1", "y2", "y3", "y4", "y5")
col_names <- c("x1", "x2", "x3")
P <- matrix(c(1:15), nrow = 5, byrow = TRUE, dimnames = list(row_names, col_names))
print(P)
# change names
colnames(P) <- c("C1","C2","C3") # new column names
R Matrix properties
Matrix - properties
dim() – dimensions
length() – total number of elements
object.size() – how many bytes in memory
summary() – column-wise summary statistics
A <- matrix(c(1:15), nrow = 3, ncol =5, byrow =TRUE )
# properties
dim(A) # dimensions in rows, columns
length(A) # rows x columns
object.size(A) # size
summary(A) # get summary stats of columns

R Accessing matrix elements
Subsetting Matrix
M[ row, col, drop = TRUE ]
- Use square brackets after matrix name.
- drop = TRUE will return a vector for single dimension elements. drop = FALSE will return a matrix.
head( , n) – show first n rows
tail( , n) – show last n rows
• Can subset matrices (2D) using methods similar to those used to index atomic vectors.
• Use the 'coordinates' of each element, using square brackets "[ ]" as before, separated by commas.
• E.g. for a 2D matrix: [row, col].
• For example: m[1, 1] - row 1, column 1 element.
• m[ , c(1,2)] - All rows, columns 1 & 2.
• m[1, ] - Row 1, all columns.
R Accessing matrix elements
Subsetting Matrix
M[ row, col, drop = TRUE ]
P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )
# Access rows 1 to 3.
print(P[1:3,]) # returns a matrix
# Access columns 1 to 2.
print(P[,1:2]) # returns a matrix
P[,1:2, drop = TRUE] # still a matrix: more than one column is selected
# Access values from rows 1 to 3 and column 1.
a <- P[1:3,1] # returns a vector
print(a)
a <- P[1:3,1, drop = FALSE] # returns a matrix
print(a)
# Show first two rows
head(P, 2)
# Show last two rows
tail(P, 2)
R Manipulating matrices
Matrix - adding and removing elements
rbind(a, b) – add row "b" onto "a"
cbind(a, b) – add column "b" onto "a"
M[-c(row),] - remove row
M[,-c(col)] - remove column
P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )
O <- matrix(c(1:3), nrow = 1, ncol =3)
N <- matrix(c(1:5), nrow = 5, ncol =1)
# Add a row
S <- rbind(P, O)
print(S)
# Add a column
T <- cbind(P, N)
print(T)
# Remove rows and columns.
P[-c(1),] # Remove row 1
P[,-c(1)] # Remove column 1
R Manipulating matrices
Matrix – replacing and re-dimensioning
dim() can be used to change matrix dimensions:
dim(x) <- c(row , column)
** Note: you can only redimension to dimensions with the same total length() as the matrix. R will throw an error otherwise; there’s no recycling in dimensioning.
P <- matrix(c(1:15), nrow = 5, ncol =3, byrow =TRUE )
# Replace single value
P[1,1] <- 20
print(P)
# Replace values using logical
P[P<10] <- 11
print(P)
dim(P) # 5 rows, 3 columns
dim(P) <- c(3,5) # New dimensions: 3 rows, 5 columns. Values are refilled column by column; this is not a transpose.
dim(P)
print(P)
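Changing dim() and transposing with t() are easy to confuse. The sketch below, on a small illustrative matrix, shows that dim<- keeps the underlying values in the same storage order and only changes how they are laid out, whereas t() swaps rows and columns.

```r
P <- matrix(1:6, nrow = 2, byrow = TRUE)   # rows: 1 2 3 / 4 5 6

Q <- P
dim(Q) <- c(3, 2)        # re-dimension: same values, read column by column
Q
t(P)                     # transpose: rows become columns

identical(Q, t(P))                    # FALSE: the two results differ
identical(as.vector(Q), as.vector(P)) # TRUE: dim<- did not reorder the data
```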
R Create matrix
Alternate way to create a matrix

• cbind() and rbind( ) can also be used to create a matrix.
# Matrix using rbind

u <- rbind(c(1:5), c(1:5))
print(u)
# Matrix using cbind

u <- cbind(c(1:5), c(1:5))
print(u)
R Matrix math
Element by Element - Addition and subtraction
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6),nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result) # Result is a matrix
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result) # Result is a matrix
matrix1 + matrix2:
8 -1 5
11 13 10
matrix1 - matrix2:
-2 -1 -1
7 -5 2
R Matrix math
Element by Element Multiplication and division
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6),nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
# Multiply the matrices
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result) # Result is a matrix
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result) # Result is a matrix
# Multiply all elements by scalar
matrix1 * 5
matrix1 * matrix2:
15 0 6
18 36 24
matrix1 / matrix2:
0.6 -Inf 0.6666667
4.5 0.4444444 1.5000000
-Inf = negative infinity.
Note: the number of significant digits printed is controlled by options(digits = 7).
R Matrix math
Matrix Multiplication and division
%*% - matrix multiplies two matrices, if they are conformable
t() - transpose
diag() – extract diagonal elements of square matrix
A = matrix(c(1:30),nrow=10, ncol=3)
B = matrix(c(1:15),nrow=3, ncol=5)
C = matrix(c(1:16),nrow=4, ncol=4) # square matrix
# matrix multiplication product of AB
A %*% B
# transpose of B
t(B)
# matrix product B'A'
t(B) %*% t(A)
# diagonal elements of square matrix
diag(C)
# Trace of square matrix
sum(diag(C))
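"Conformable" means the number of columns of the first matrix equals the number of rows of the second. The following sketch checks that condition and the dimensions of the product, and confirms the identity (AB)' = B'A' numerically.

```r
A <- matrix(1:30, nrow = 10, ncol = 3)
B <- matrix(1:15, nrow = 3,  ncol = 5)

ncol(A) == nrow(B)          # TRUE: A %*% B is defined
dim(A %*% B)                # 10 5: rows of A by columns of B

# Transpose of a product equals the product of transposes in reverse order
all.equal(t(A %*% B), t(B) %*% t(A))   # TRUE
```

Note that B %*% A is not defined here, because ncol(B) is 5 while nrow(A) is 10.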
R Matrix functions
apply()
apply(): apply a function across a row or down a column
apply(X, MARGIN, FUN, ..., simplify = TRUE)
MARGIN: 1 for rows, 2 for columns
FUN: function to apply
simplify: TRUE / FALSE. TRUE: results in an atomic vector. FALSE: returns a list.
A = matrix(c(1:30),nrow=10, ncol=3)
apply(A, MARGIN = 1, sum) # row sums
apply(A, MARGIN = 1, mean) # row means
apply(A, MARGIN = 2, mean) # column means
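For sums and means specifically, base R also provides rowSums(), rowMeans(), colSums() and colMeans(). The sketch below checks that they agree with apply(), and shows what apply() returns when the function gives more than one value per column.

```r
A <- matrix(1:30, nrow = 10, ncol = 3)

all.equal(apply(A, 1, sum),  rowSums(A))    # TRUE
all.equal(apply(A, 2, mean), colMeans(A))   # TRUE

# A function returning two values per column gives a 2 x 3 matrix
r <- apply(A, 2, range)
dim(r)                                      # 2 3
```

apply() remains the general tool when no dedicated row or column function exists.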
R Dataframe
R dataframes
Dataframes
• Most common R data structure.
• A 2D table-like structure.
• Columns are atomic vectors.
• Columns must be named.
• Row names must be unique.
• Columns must contain the same number of data items.
- Essentially a list of vectors.
- Shares many of the properties of matrices and of lists.
Example (columns are variables; x, y and z are the column names):
x y z
1 "a" 22
2 "a" 33
3 "a" 44
4 "a" 55
7 "a" 7
NA "a" 88
88 NA 99
9 "a" NA
5 "a" 7
R Create a dataframe
Dataframe function
Defaults:
data.frame( ..., row.names = NULL, check.rows = FALSE,
check.names = TRUE, fix.empty.names = TRUE, stringsAsFactors = FALSE)
... : vectors separated by ","; must be the same length.
row.names: if you want to name rows, a character vector goes here.
check.rows: logical.
check.names: check column names are valid.
fix.empty.names: create column names if not supplied.
stringsAsFactors: should character vectors be converted to factors? Factors are characters associated with integer values. Except for advanced cases set to FALSE.
R dataframes
Creating Dataframes – from atomic vectors
Dataframes:
• If column names are not supplied R will create them.
• Access columns using "$name", the same as with named lists.
• Get names: names() or str() or colnames().
st1 <- c(21,32,11)
st2 <- c("red", "yellow" , "orange")
st3 <- c(2L,4L, 5L)
stuff <- data.frame(st1, st2, st3) # make dataframe
class(stuff)
names(stuff)
# supply names on creation
stuff <- data.frame("double" =st1, "char" = st2, "ints" =st3)
print(stuff)
# change names
names(stuff) <- c("X", "Y", "Z") # new column names
str(attributes(stuff))
R dataframe properties
Dataframe Properties
Properties:
names() – column names
dim() – number of rows, number of columns
ncol() – number of columns
nrow() – number of rows
length() – number of columns (a dataframe is a list of columns)
str() – dataframe structure
summary() – column summary stats
class(airquality) # built-in New York air-quality data
names(airquality) # column names
dim(airquality)
ncol(airquality) # number of columns
nrow(airquality) # number of rows
length(airquality) # number of columns
str(airquality)
summary(airquality)
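Because a dataframe is stored as a list of columns, length() on a dataframe counts columns, not cells. A minimal sketch with a small made-up dataframe and the built-in airquality data:

```r
df <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

length(df)               # 2: number of columns
ncol(df)                 # 2
nrow(df)                 # 4
nrow(df) * ncol(df)      # 8: total number of cells

dim(airquality)                          # 153 6
length(airquality) == ncol(airquality)   # TRUE
```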
R dataframes – coercion
Coercion
as.data.frame()
Vector: one-column data frame.
List: one column for each element, provided elements are the same length.
Matrix: a data frame with the same number of columns and rows as the matrix.
R dataframes
Creating Dataframes – coercion from lists and matrices
• List vectors must be the same length.
• Unlike matrices, dataframes will not recycle a vector unless its length divides the number of rows exactly.
• Coercion of lists produces crazy column names!
st1 <- c(21,32,11)
st2 <- c("red", "yellow" , "orange")
st3 <- c(2L,4L, 5L)
stuff <- list(st1, st2, st3)
df <- as.data.frame(stuff) # or just data.frame(stuff)
print(df)
names(df) <- c("st1", "st2", "st3")
print(df)
# Matrix example
mat <- matrix(c(1:40), nrow = 10, byrow = TRUE)
df <- as.data.frame(mat)
str(df)
R dataframes – create dataframe with zero rows
Empty dataframes
• Empty data frames are useful in programming.
• Create by coercion of a zero row matrix.
• Create by initializing empty vectors.
# Zero row matrix method
M <- matrix(ncol=3,nrow=0) # 3 columns zero rows
df1 <- data.frame(M)
colnames(df1) <- c("A", "B", "C")
# Empty vector method
df2 <- data.frame(A=double(), B=integer(), C=character()) # simpler method
Accessing dataframe elements – subsetting
df[ row, col, drop = TRUE ]
- Use square brackets after dataframe name.
- drop = TRUE will return a vector for single dimension elements. drop = FALSE will return a dataframe.
head( , n) – show first n rows
tail( , n) – show last n rows
• Like atomic vectors, we can use positive integers, negative integers, logical vectors, nothing, zero, and names to access dataframe elements.
• Similar to matrices: [row, column] indexing.
df[, 1] - all rows, column 1.
Best practice is to use column names.
df[, c('x', 'y')] - all rows, columns x and y.
• df[, 'x'] - This will return a vector, similar to matrices.
• df[, 'x', drop = FALSE] - This will return a dataframe.
• df['x'] - list style, column x.
Accessing dataframe subsetting
x <- c(1:5); y <- c(1:5); z <- c(1:5)
df <- data.frame(x, y, z)
a <- df[, 2]; class(a) # y column, returns a vector
b <- df[, 2, drop = FALSE]; class(b) # y column, returns a dataframe
c <- df[1, ]; class(c) # row 1, all columns; returns a dataframe because there are multiple columns
d <- df[c(1:4), ] # rows 1 to 4
e <- df[, 'x']; class(e) # single column returns a vector; add drop = FALSE to return a dataframe
v <- df[, c('x', 'y')]; class(v) # multiple columns return a dataframe (a matrix would return a matrix)
f <- df['x']; class(f) # list-style name returns a dataframe
L <- df[['x']]; class(L) # double square brackets return a vector
h <- df$x; class(h) # column name using $ returns a vector; $ is shorthand for [[ ]]
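The examples above use positive integers and names. The remaining index types listed earlier (negative integers, logical vectors and zero) behave as sketched below. The dataframe and the object names no_z, evens, cell and nothing are invented for this illustration.

```r
# Small hypothetical dataframe
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)

no_z    <- df[, -3]               # negative integer: drop column 3
evens   <- df[df$x %% 2 == 0, ]   # logical vector: rows where x is even
cell    <- df[2, "y"]             # row 2 of column y, returned as a vector (7)
nothing <- df[0, ]                # zero: no rows, but all columns are kept
```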
Subset based on column logical
Accessing dataframe subsetting
• Subsetting columns with a logical
df[ df$name logical , ] – returns dataframe
which(logical) – returns a vector of row indices
Air <- airquality # air quality dataset
str(Air)
Air[1, ] # Row one.
# E.g. get all rows with Ozone greater than 10
A <- Air[Air$Ozone > 10, ] ; print(A) # Note that the logical is where the row index normally is. Rows where Ozone is NA come back as all-NA rows.
B <- which(Air$Ozone > 10); print(B) # row indices where Ozone is greater than 10 (NAs are dropped)
C <- Air[B, ]; print(C) # Like A, but without the all-NA rows
D <- Air[B, 'Temp']; print(D) # vector of Temp with Ozone > 10
E <- Air[B, 'Temp', drop = FALSE] ; print(E) # dataframe of Temp with Ozone > 10
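Missing values change the result of logical subsetting, which matters for airquality because Ozone contains NAs. The sketch below compares three approaches; subset() is a base R function not used in the slides and is shown here only as an alternative.

```r
Air <- airquality

A <- Air[Air$Ozone > 10, ]          # an NA in the logical gives an all-NA row
C <- Air[which(Air$Ozone > 10), ]   # which() keeps only the TRUE positions
S <- subset(Air, Ozone > 10)        # subset() also drops rows where the test is NA

# The extra rows in A are exactly the rows where Ozone is missing
nrow(A) - nrow(C) == sum(is.na(Air$Ozone))   # TRUE
```

When NAs are present, which() or subset() is usually the safer choice for row selection.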
R Combining dataframes
Combining data frames
Appending columns - cbind()
• Add a column to a dataframe using a vector or a dataframe with cbind().
• Column names of dataframes need to be unique.
• To use cbind(), the dataframe and the column(s) to add must have the same number of rows.
Air <- airquality
New_column <- c(1:nrow(Air)) # Vector has to be the same length as a dataframe column
newAir <- cbind(Air,New_column)
str(newAir)
Next_columns <- data.frame("A" = c(1:nrow(Air)), "B" = c(1:nrow(Air)))
df <- cbind(newAir, Next_columns) # bind once only; binding twice would duplicate columns A and B
str(df)
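A single column can also be added by assignment with $, and cbind() stops with an error when the row counts are incompatible. The sketch below is an illustration only; the column name Temp_C is invented here, and the conversion relies on the airquality Temp column being in degrees Fahrenheit.

```r
Air <- airquality

# Add a derived column by assignment (Temp_C is a hypothetical name)
Air$Temp_C <- (Air$Temp - 32) * 5 / 9   # Fahrenheit to Celsius

# cbind() with a vector whose length does not fit the row count is an error
bad <- try(cbind(Air, c(1, 2)), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```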
Appending rows - rbind()
• Add rows to a dataframe using a vector or a dataframe with rbind().
• Vector data will be coerced to the dataframe's column data types, so be careful.
• Dataframes to be appended must have the same column names.
Air <- airquality

New_row <- c(1:ncol(Air)) # Vector must be the same length as dataframe number of columns
newAir <- rbind(Air,New_row)
tail(newAir) # Note data is coerced to different data types.
Next_row <- as.data.frame(list(c(1L, 1L), c(1L, 1L), c(1, 1), c(1L, 1L), c(1L, 1L), c(1L, 1L)))
names(Next_row) <- c(names(newAir)) # column names have to match
df <- rbind(newAir,Next_row)
tail(df) #
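Appending a one-row dataframe with matching names and types is a way to keep the column types under control. In this sketch the values in new_obs are made up, and the object names are chosen here for illustration.

```r
Air <- airquality

# One new observation with the same column names and types (made-up values)
new_obs <- data.frame(Ozone = 40L, Solar.R = 200L, Wind = 7.4,
                      Temp = 70L, Month = 10L, Day = 1L)
Air_plus <- rbind(Air, new_obs)
sapply(Air_plus, class)   # column types are unchanged

# A dataframe with different columns cannot be appended
bad <- try(rbind(Air, data.frame(a = 1)), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```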
Merging two dataframes
• By default the two dataframes are matched on the column name(s) they share; they do not need to have the same number of rows.
Air <- airquality
Air1 <- airquality[,c(6,2,3)] # column subset

head(Air1)
Air2 <- airquality[,c(6,4,5)] # column subset
head(Air2)
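airz <- merge(Air1, Air2) # merges on the shared column Day; each Day occurs in several months, so every matching pair of rows is returned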
head(airz)
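Naming the key columns explicitly with the by argument makes the match unambiguous. The sketch below uses Month and Day together as the key, which identifies each airquality row uniquely; the object names left, right, both, right_small, inner and outer are invented for this illustration.

```r
Air <- airquality
left  <- Air[, c("Month", "Day", "Ozone")]
right <- Air[, c("Month", "Day", "Temp")]

# Match on both key columns: one output row per input row
both <- merge(left, right, by = c("Month", "Day"))
nrow(both) == nrow(Air)   # TRUE

# The inputs may differ in length; all.x = TRUE keeps unmatched left rows (Temp is NA)
right_small <- right[1:10, ]
inner <- merge(left, right_small, by = c("Month", "Day"))
outer <- merge(left, right_small, by = c("Month", "Day"), all.x = TRUE)
```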
Removing columns and rows

Removing columns and rows dataframes
• Remove columns or rows by subsetting
Air <- airquality
Air1 <- Air[,-c(6)] # Remove column 6

head(Air1)
length(Air$Ozone)
Air2 <- Air[-c(1), ] # Remove first row
length(Air2$Ozone)
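Columns can also be removed by name rather than by position, which follows the earlier advice to prefer column names. A minimal sketch, with object names chosen here for illustration:

```r
Air <- airquality

no_day <- Air[, !(names(Air) %in% c("Day"))]   # drop a column by name
Air$Month <- NULL                              # or remove it by assigning NULL
complete <- Air[!is.na(Air$Ozone), ]           # remove rows where Ozone is missing
```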
Importing and exporting dataframe text files
read.table() write.table()
Air <- airquality

class(Air)
head(Air, 6)
write.table(Air, "./air.csv", sep=",") # write dataframe to the working directory as CSV.
dataset <- read.table("./air.csv", sep=",", header=TRUE) # read csv file
write.table(Air, "./air.txt", sep="\t") # write dataframe to the working directory as tab delimited.
dataset <- read.table("./air.txt", sep="\t", header=TRUE) # read tab delimited file
# see help(read.table)
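By default write.table() also writes the row names as an extra first column. If that column is not wanted, row.names = FALSE leaves it out. The sketch below checks a write and read round trip; it uses tempfile() so that nothing is written to the working directory, which is a choice made for this illustration.

```r
Air <- airquality
path <- tempfile(fileext = ".csv")   # temporary file for this example

write.table(Air, path, sep = ",", row.names = FALSE)   # no row-name column
back <- read.table(path, sep = ",", header = TRUE)

all.equal(Air, back)   # TRUE: same values and column types
```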
Importing and exporting text files
write.table(dataframe, filepath, separator or delimiter) # column names automatically written to first row.
read.table(filepath, separator or delimiter, header = TRUE, skip = 1) # skip = n skips the first n rows before reading
header = TRUE – Read the first row in as column names.
header = FALSE – Create column names automatically, but need to skip the header row if it exists.
dataset <- read.table("./air.txt", sep="\t", header=FALSE, skip = 1) # read tab delimited file, skip first row
# add your own column names
dataset <- read.table("./air.txt", sep="\t", header=FALSE, skip = 1, col.names = c("A", "B", "C", "D", "E", "F", "G")) # read tab delimited file, skip header row and supply column names (7 names: the row-name column plus 6 data columns)
head(dataset, 6)
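The automatic column names created with header = FALSE are V1, V2 and so on. The sketch below shows this and the col.names alternative; it writes to a temporary file without row names, so there are six columns here rather than seven. The names in my_names are invented for illustration.

```r
path <- tempfile(fileext = ".txt")   # temporary file for this example
write.table(airquality, path, sep = "\t", row.names = FALSE)

# Skip the header row and let R create the names
auto <- read.table(path, sep = "\t", header = FALSE, skip = 1)
names(auto)   # "V1" "V2" "V3" "V4" "V5" "V6"

# Skip the header row and supply hypothetical names instead
my_names <- c("o3", "solar", "wind", "temp", "month", "day")
named <- read.table(path, sep = "\t", header = FALSE, skip = 1, col.names = my_names)
```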
Read missing data strings like -9999 and convert to NA
Air$Ozone[is.na(Air$Ozone)] <- -9999 # change NA to -9999
write.table(Air, "./air.txt", sep="\t") # write dataframe to the working directory as tab delimited.
head(Air) # Ozone NA is now -9999
dataset <- read.table("./air.txt", sep="\t", header=TRUE, na.strings = c("-9999")) # read tab delimited file, -9999 as NA
head(dataset) # -9999s are now NA
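na.strings accepts more than one code, which helps when a file mixes several missing-data markers. The file contents below are made up for this sketch (site and pm25 are hypothetical column names).

```r
path <- tempfile(fileext = ".txt")   # temporary file for this example
writeLines(c("site\tpm25", "A\t12.5", "B\t-9999", "C\tmissing"), path)

dat <- read.table(path, sep = "\t", header = TRUE,
                  na.strings = c("-9999", "missing"))
is.na(dat$pm25)   # FALSE TRUE TRUE
class(dat$pm25)   # "numeric": both codes became NA, so the column stays numeric
```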
Read data in and force to a specific data type
dataset <- read.table("./air.csv", sep=",", header=TRUE, colClasses = c("Ozone" = "double", "Solar.R" = "double", "Wind" = "double", "Temp" = "double", "Month" = "integer", "Day" = "integer"))
# Use "NULL" (no spaces inside the quotes) to not read a column from file!
dataset <- read.table("./air.csv", sep=",", header=TRUE, colClasses = c("Ozone" = "NULL"))
head(dataset, 6)
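The effect of colClasses can be checked after reading. This sketch writes airquality to a temporary file first (an assumption made so the example is self-contained), then skips Ozone and forces Temp to double; columns not named in colClasses are converted automatically.

```r
path <- tempfile(fileext = ".csv")   # temporary file for this example
write.table(airquality, path, sep = ",", row.names = FALSE)

typed <- read.table(path, sep = ",", header = TRUE,
                    colClasses = c(Ozone = "NULL", Temp = "double"))
names(typed)        # Ozone is not read
class(typed$Temp)   # "numeric" instead of the default "integer"
```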
Importing and exporting excel files
Use tidyverse package readxl
read_excel("path")
install.packages("tidyverse")
install.packages("readxl")
library(readxl)
See https://readxl.tidyverse.org
