Tepmony Sim
November 2018
1 Introduction to R
4 Programming Basics
?sqrt
?paste
?sum
This can also be done with the help() function, help(functionname):
help(sqrt)
help(paste)
help(sum)
Many questions have already been asked and answered on the internet. So, if
you phrase your problem well, you can usually find the answer you need. See Stack Exchange
Tepmony Sim Statistics With R ITC 7 / 54
Installing Package
install.packages("ggplot2")
library("ggplot2")
Integer
Example: 2L, -1L, 4L, etc.
Complex
Example: 2-3i, 1+0i, sqrt(3)+1i, etc.
Logical
Two values: TRUE (or T) and FALSE (or F). The missing value NA is also of type logical by default.
Character
Example: "A", "Sovann", "2", "Gender", etc.
Data Structures
what do you expect to see? If you type c(3, TRUE), what will the result
be?
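The question above is about coercion. As a small sketch (the variable names v1 and v2 are only illustrative), mixing types inside c() converts every element to the most flexible type present:

```r
# Mixing types in c() coerces every element to a single common type.
v1 <- c(3, TRUE)        # the logical TRUE becomes the number 1
v1
class(v1)               # "numeric"

v2 <- c(3, TRUE, "a")   # one string turns every element into character
v2
class(v2)               # "character"
```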
The Operator “:” for Sequences
(y <- 1:100)
z <- 1:100
t <- c(1:100)
If you enter y, z and t, what will you see? Note that R has no scalar
type: a single value is simply a vector with one component.
2
[1] 2
Vectors
Other forms of vectors in R:
seq(from = 1.5, to = 4.2, by = 0.1)
seq(1.5, 4.2, 0.1)
rep("A", times = 10)
rep(0, 10)
Subsetting
x[argument] extracts some components (a subset) of the vector x.
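x <- c(1, 3, 5, 7, 8)
x
x[3]           # third component
x[2:4]         # components 2 through 4
x[-2]          # all components except the second
x[c(1, 3, 4)]  # components 1, 3 and 4
z <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
z
x[z]           # components where z is TRUE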
Vectorization
Logical Operators
Examples
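x <- c(1, 3, 5, 7, 8)
x
x > 3             # greater than
x < 3             # less than
x == 3            # equal to
x != 3            # not equal to
x == 3 & x != 3   # elementwise AND: always FALSE here
x == 3 | x != 3   # elementwise OR: always TRUE here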
Coercion
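sum(x > 3)          # TRUE counts as 1, so this counts the components above 3
as.numeric(x > 3)   # logical values coerced to 0 and 1
which(x > 3)        # positions of the components above 3
x[which(x > 3)]
max(x)
which(x == max(x))  # position at which x takes its largest value
which.max(x)
min(x)
which(x == min(x))
which.min(x)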
x <- c(1, 3, 5, 7, 8, 9)
y <- 1:100
x + 2
x + rep(2, 6)
x > 3
x > rep(3, 6)
x + y
length(x)
length(y)
length(y)/length(x)
(x + y) - y
y <- 1:60
x + y
length(y)/length(x)
rep(x, 10) + y
all(x + y == rep(x, 10) + y)
identical(x + y, rep(x, 10) + y)  # same objects or not?
any(x + y != rep(x, 10) + y)
?all.equal
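The comparisons above rely on recycling: the shorter vector is repeated until it matches the longer one. The sketch below restates that with the same x and y, and adds a standard floating-point comparison to suggest why all.equal() is worth looking up. The names recycled and explicit are only illustrative.

```r
x <- c(1, 3, 5, 7, 8, 9)   # length 6
y <- 1:60                  # length 60, a multiple of 6

# Recycling: x is repeated 10 times to match the length of y.
recycled <- x + y
explicit <- rep(x, 10) + y
all(recycled == explicit)          # TRUE: equal element by element
identical(recycled, explicit)      # TRUE: same type and values

# all.equal() compares numbers up to a small tolerance, which matters
# for floating-point results.
0.1 + 0.2 == 0.3                   # FALSE because of rounding error
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
```

When the longer length is not a multiple of the shorter one, as with y <- 1:100 earlier, R still recycles but issues a warning.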
Addition
X + Y
Subtraction
X - Y
Scalar Multiplication
(-3) * X
Matrices
Entrywise Multiplication
X * Y
Entrywise Division
X / Y
Matrix Multiplication
X %*% Y
Matrix Transpose
t(X)
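To make the difference between the entrywise and matrix products concrete, here is a minimal sketch. The matrices X and Y below are hypothetical 2 x 2 examples, not the ones used on the original slides.

```r
# Hypothetical 2 x 2 matrices, filled column by column.
X <- matrix(1:4, nrow = 2, ncol = 2)
Y <- matrix(5:8, nrow = 2, ncol = 2)

X * Y      # entrywise product: each X[i, j] times Y[i, j]
X %*% Y    # matrix product: rows of X combined with columns of Y
t(X)       # transpose: rows and columns swapped
```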
Notice that all the elements of the list are of different types.
Unlike a matrix, a data frame is not required to have the same data
type for every element.
It is a list of vectors.
Each of its vectors must contain a single data type, but different
vectors can store different data types.
Unlike a general list, the elements of a data frame must all be vectors
of the same length.
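These properties can be checked directly. The sketch below builds a small data frame from made-up values; the name example_df and its columns are hypothetical.

```r
# Hypothetical data: three columns of different types, all of length 3.
example_df <- data.frame(
  name   = c("A", "B", "C"),       # character
  age    = c(21L, 35L, 28L),       # integer
  member = c(TRUE, FALSE, TRUE),   # logical
  stringsAsFactors = FALSE         # keep strings as character in older R versions
)

is.list(example_df)         # TRUE: a data frame is a list of columns
sapply(example_df, class)   # one data type per column
dim(example_df)             # 3 rows and 3 columns
```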
Data Frames
We can also import data into R from various file types, as well as use data
stored in packages.
Example: Importing Data from Other Sources
The example data above can also be found here as a .csv file. To read this
data into R, we use the read_csv() function from the readr package. Note
that R has a built-in function read.csv() that operates very similarly. The
function read_csv() has a number of advantages over its counterpart
read.csv(). For large datasets, read_csv() is much faster than
read.csv(). In addition, it uses the tibble package to read the data
as a tibble.
library(readr)
example_data_from_csv <- read_csv("./Dropbox/Applied Statistics/Data/example-data.csv")
A tibble is simply a data frame that prints more sensibly. Notice
in the output above that we are given additional information such as
the dimensions and the variable types.
The as_tibble() function (spelled as.tibble() in older versions of the
tibble package) can be used to coerce a regular data frame to a tibble.
library(tibble)
example_data <- as_tibble(example_data)  # as.tibble() is the older, deprecated spelling
example_data
library("ggplot2")
head(mpg, n = 10)
The function head() displays the first n observations of the data frame.
The head() function was more useful before tibbles. Notice that mpg is
already a tibble, so the output of head() reports only 10
observations. This applies to head(mpg, n = 10), not to mpg
itself. Also note that tibbles print a limited number of rows and columns
by default. The last line of the printed output indicates which rows and
columns were omitted.
mpg
The function str() will display the "structure" of the data frame. It will
display the number of observations and variables, list the variables, give
the type of each variable, and show some elements of each variable. This
information can also be found in the "Environment" window in RStudio.
str(mpg)
names(mpg)
mpg$year
We can use the dim(), nrow() and ncol() functions to obtain information
about the dimension of the data frame.
dim(mpg)
nrow(mpg) # number of observations (the sample size)
ncol(mpg) # number of variables
Subsetting data frames can work much like subsetting matrices using
square brackets, [,]. Here, we find fuel efficient vehicles earning over 35
miles per gallon and only display manufacturer, model and year.
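The bracket syntax takes a row condition before the comma and a vector of column names after it. The sketch below uses the built-in mtcars data set as a stand-in so that it runs without extra packages; the threshold of 30 and the chosen columns are assumptions made for illustration. The analogous call for mpg is shown in a comment.

```r
# mtcars ships with base R; it is used here only as a stand-in for mpg.
# With ggplot2 loaded, the analogous call for the example above would be
#   mpg[mpg$hwy > 35, c("manufacturer", "model", "year")]
efficient <- mtcars[mtcars$mpg > 30, c("mpg", "cyl", "wt")]
efficient
nrow(efficient)   # number of cars that pass the filter
```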
Lastly, the same result can be obtained with the filter() and select()
functions from the dplyr package, which also provides the %>% operator from
the magrittr package.
library("dplyr")
mpg %>% filter(hwy > 35) %>% select(manufacturer, model, year)
Example:
women
# save a single object to file
saveRDS(women, "women.rds")
# restore it under a different name
women2 <- readRDS("women.rds")
women2
identical(women, women2)
Save Multiple Objects to a File
The function save() can be used to save one or more R objects to a
specified file (in .RData or .rda format). The objects can be read
back from the file using the function load(). We use the following syntax:
# Save one object in RData format
save(data1, file = "data.RData")
# Save multiple objects
save(data1, data2, file = "data.RData")
# To load the data again
load("data.RData")
Note: if you save your data with save(), it cannot be restored under a
different name. The original object names are used automatically.
Example:
data1 <- c(1, 2, 3)
data2 <- c(2, 3)
save(data1, data2, file = "data.RData")
load("data.RData")
Saving the Entire Workspace
It is a good idea to save your workspace image when your work sessions
are long. This can be done at any time using the function save.image().
We use the following syntax:
save.image()
That stores your workspace to a file named .RData by default. This will
ensure you do not lose all your work in the event of system reboot, for
instance.
When you close R/RStudio, it asks if you want to save your workspace. If
you say yes, the next time you start R that workspace will be loaded. That
saved file will be named .RData as well.
It is also possible to specify the file name for saving your workspace:
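A minimal sketch of this, using the file argument of save.image(). The file name "my_work_space.RData" is a hypothetical choice.

```r
# "my_work_space.RData" is a hypothetical file name.
save.image(file = "my_work_space.RData")

# Restore the saved objects in a later session.
load("my_work_space.RData")
```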
if (...) {
some R code
} else {
more R code
}
For example:
x <- 1
y <- 3
if (x > y) {
  z <- x * y
  print("x is larger than y")
} else {
  z <- x + 5 * y
  print("x is less than or equal to y")
}
z
Control Flow
R also has a special function ifelse() which is very useful. It returns one of
two specified values based on a conditional statement.
ifelse(4 > 3, 1, 0)
The real power of ifelse() comes from its ability to be applied to vectors.
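As a small illustration, ifelse() tests each element of a vector separately and returns a vector of the same length. The input vector and the labels below are made up for this sketch.

```r
fib <- c(1, 1, 2, 3, 5, 8, 13, 21)   # hypothetical input vector

# Each element is tested on its own; the result has the same length as fib.
size <- ifelse(fib > 6, "big", "small")
size
# "small" "small" "small" "small" "small" "big" "big" "big"
```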
x = 11:15
for (i in 1:5) {
x[i] = x[i] * 2
}
x
Note that this kind of for loop is common in many programming languages,
but not in R. In R we would prefer not to use a loop; instead we would
simply use a vectorized operation.
x <- 11:15
x <- x * 2
x
Now, look at a number of ways that we could run this function to perform
the operation 10^2, resulting in 100. Now consider a much more succinct
way of writing:
power_of_num(10)
power_of_num(10, 2)
power_of_num(num = 10, power = 2)
power_of_num(power = 2, num = 10)
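The definition of the function does not appear in this part of the slides. The sketch below is one plausible definition, assumed here so that the calls above can be run: it takes a number and a power, with the power defaulting to 2.

```r
# Hypothetical definition: raise num to the given power (default 2).
power_of_num <- function(num, power = 2) {
  num ^ power
}

power_of_num(10)                    # power falls back to its default of 2
power_of_num(num = 10, power = 2)   # arguments matched by name
power_of_num(power = 2, num = 10)   # named arguments may come in any order
```

With this definition, all of the calls return 100, because unnamed arguments are matched by position and named ones by name.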
It will also have the ability to return the biased estimate (based on
maximum likelihood), which we will call \hat{\sigma}^2:
\[ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]
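One way such a function might look is sketched below. The name get_var, the argument biased, and the data are assumptions made for illustration. With biased = FALSE it uses the usual n - 1 denominator, and with biased = TRUE it uses n, as in the formula above.

```r
# Hypothetical helper: sample variance with an optional biased (ML) version.
get_var <- function(x, biased = FALSE) {
  n <- length(x)
  denom <- if (biased) n else n - 1   # n for the ML estimate, n - 1 otherwise
  sum((x - mean(x))^2) / denom
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data
get_var(x)                  # unbiased estimate, agrees with var(x)
get_var(x, biased = TRUE)   # biased estimate with denominator n
```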