R software (version 3.6.0): https://cran.r-project.org/bin/windows/base/
R Studio: https://www.rstudio.com/products/rstudio/download/
Scope
The Basics
(Arithmetic in R, Assignment & Variables, Data Type Exploration)
Data Frames
(Data frame with structure, accessing and subsetting data frames, PV of cashflows data working)
Factors
(Create a factor, Factor levels, Strings as factor, Bucketing a numeric variable into a factor)
Lists
(Create a list, Named lists, Removing from a list, Split it, Split-Apply-Combine)
The Basics
Arithmetic in R
# Addition
3+5
## [1] 8
# Subtraction
6-4
## [1] 2
# Multiplication
3*4
## [1] 12
# Division
4/2
## [1] 2
# Exponentiation
2^4
## [1] 16
# Modulo (returns the remainder of dividing the number on the left by the number on the right)
7 %% 3
## [1] 1
# Assignment and variables
Use <- to assign a value to a variable.
# Assign 100 to my_money
my_money <- 100
# Print post_jan_cash
post_jan_cash
## [1] 210
# Print post_jan_cash_10
post_jan_cash_10
## [1] 220
# Multipliers
jan_mult <- 1 + 4 / 100
feb_mult <- 1 + 5 / 100
# Print total_cash
total_cash
## [1] 218.4
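The lines that create total_cash are missing from the slide. A minimal sketch of how the multipliers might be applied, assuming a hypothetical starting_cash of 200 (this value is not given in the original; it is chosen because 200 compounded at 4% and then 5% reproduces the printed 218.4):

```r
# Assumption: a starting balance of 200 (hypothetical, not stated in the slides)
starting_cash <- 200

# Monthly multipliers for returns of 4% and 5%
jan_mult <- 1 + 4 / 100
feb_mult <- 1 + 5 / 100

# Compound the starting balance over both months
total_cash <- starting_cash * jan_mult * feb_mult
total_cash
## [1] 218.4
```

Storing each multiplier in its own variable keeps the compounding step readable and makes it easy to change one month's return.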
# Data Type Exploration
Numerics are decimal numbers like 4.5. A special type of numeric is an integer, which is a numeric without a decimal piece. Integers must be specified with an L suffix, like 4L.
Logicals are the boolean values TRUE and FALSE. Capital letters are important here; true and false are not valid.
# Print my_answer
my_answer
## [1] TRUE
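The slide prints my_answer without showing how it was created. The sketch below illustrates the data types described above; the comparison used to build my_answer is an assumption made only for illustration.

```r
# Numeric (decimal) versus integer (note the L suffix)
class(4.5)
## [1] "numeric"
class(4L)
## [1] "integer"

# Character values are written in quotes
class("hello")
## [1] "character"

# Logical: a comparison returns TRUE or FALSE
# (hypothetical definition of my_answer)
my_answer <- 4L == 4
my_answer
## [1] TRUE
```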
# What’s that data type?
A way to find what data type a variable is: class(my_var)
a <- TRUE
class(a)
## [1] "logical"
b <- 5.5
class(b)
## [1] "numeric"
Vectors and Matrices
# A logical vector
logic <- c(TRUE, FALSE, TRUE)
# Coerce it
A vector can only be composed of one data type. This means that you cannot have both a numeric and a character in the same vector. If you attempt to do this, the lower ranking type will be coerced into the higher ranking type.
For example: c(1.5, "hello") results in c("1.5", "hello") where the numeric 1.5 has been coerced into the character data type.
Logicals are coerced a bit differently depending on what the highest data type is. c(TRUE, 1.5) will return c(1, 1.5) where TRUE is coerced to the numeric 1 (FALSE would be converted to a 0).
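The coercion rules above can be checked directly. A short sketch (the variable names mixed_chr and mixed_num are hypothetical):

```r
# Numeric and character together: everything becomes character
mixed_chr <- c(1.5, "hello")
class(mixed_chr)
## [1] "character"

# Logical and numeric together: TRUE becomes 1, FALSE becomes 0
mixed_num <- c(TRUE, 1.5, FALSE)
mixed_num
## [1] 1.0 1.5 0.0
class(mixed_num)
## [1] "numeric"
```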
Weighted average
The weighted average allows you to calculate your portfolio return over a time period. Consider the following
example:
Assume you have 20% of your cash in Microsoft stock, and 80% of your cash in Sony stock. If, in January,
Microsoft earned 5% and Sony earned 7%, what was your total portfolio return?
# Portfolio return
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight
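The formula uses four variables that the slide does not define. Plugging in the numbers from the example above (returns in percent, weights as fractions) might look like this:

```r
# Returns (in percent) and portfolio weights from the example
micr_ret <- 5
sony_ret <- 7
micr_weight <- 0.2
sony_weight <- 0.8

# Portfolio return: each return scaled by its weight, then summed
portf_ret <- micr_ret * micr_weight + sony_ret * sony_weight
portf_ret
## [1] 6.6
```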
R does arithmetic with vectors! Take advantage of this fact to calculate the portfolio return more efficiently.
# Weights, returns, and company names
ret <- c(7, 9)
weight <- c(.2, .8)
companies <- c("Microsoft", "Sony")
# Assign company names to your vectors
names(ret) <- companies
names(weight) <- companies
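# Multiply each return by its weight
ret_X_weight <- ret * weight
# Print ret_X_weight
ret_X_weight
## Microsoft Sony
## 1.4 7.2
# Sum the weighted returns to get the portfolio return
portf_ret <- sum(ret_X_weight)
# Print portf_ret
portf_ret
## [1] 8.6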
Vector Subsetting
What if you only wanted the first month of returns from the vector of 12 months of returns? Subset the vector with square brackets; for example, ret[1] returns only January.
# Define ret
ret <- c(5, 2, 3, 7, 8, 3, 5, 9, 1, 4, 6, 3)
names(ret) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
ret
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 5 2 3 7 8 3 5 9 1 4 6 3
# First 6 months of returns
ret[1:6]
## Jan Feb Mar Apr May Jun
## 5 2 3 7 8 3
# All months except the first
ret[-1]
## Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 2 3 7 8 3 5 9 1 4 6 3
# Create a Matrix
Matrices are similar to vectors, except they are in 2 dimensions!
# A vector of 9 numbers
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# 3x3 matrix
my_matrix <- matrix(data = my_vector, nrow = 3, ncol = 3)
# Print my_matrix
my_matrix
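By default matrix() fills column by column, which is easy to miss when the output is not shown. A small sketch comparing the default with byrow = TRUE (the names by_col and by_row are hypothetical):

```r
my_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Default: values fill the matrix column by column
by_col <- matrix(data = my_vector, nrow = 3, ncol = 3)
by_col
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

# byrow = TRUE fills row by row instead
by_row <- matrix(data = my_vector, nrow = 3, ncol = 3, byrow = TRUE)
by_row[1, ]
## [1] 1 2 3
```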
# Define vectors
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03, 112.12, 113.95, 113.30, 115.19, 115.19, 115.82, 115.97, 116.64,
116.95, 117.06, 116.29, 116.52, 117.26, 116.76, 116.73, 115.82)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79, 165.36, 166.52, 165.50, 168.29, 168.51, 168.02, 166.73, 166.68,
167.60, 167.33, 167.06, 166.71, 167.14, 166.19, 166.60, 165.99)
micr <- c(59.20, 59.25, 60.22, 59.95, 61.37, 61.01, 61.97, 62.17, 62.98, 62.68, 62.58, 62.30, 63.62, 63.54, 63.54, 63.55,
63.24, 63.28, 62.99, 62.90, 62.14)
# cbind the vectors together
cbind_stocks <- cbind(apple, ibm, micr)
# Print cbind_stocks
cbind_stocks
# rbind the vectors together
rbind_stocks <- rbind(apple, ibm, micr)
# Print rbind_stocks
rbind_stocks
Correlation is a measure of association between two things, here, stock prices, and is represented by a number from -1 to 1.
• 1 represents perfect positive correlation,
• -1 represents perfect negative correlation, and
• 0 means that there is no linear relationship between the movements of the two stocks.
The cor() function will calculate the correlation between two vectors, or will create a correlation matrix when given
a matrix.
# stock matrix
stocks <- cbind(apple, micr, ibm)
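A minimal sketch of both uses of cor(), using only the first five prices of each series so the example stays short (the shortened vectors and the names apple_ibm_cor and cor_matrix are assumptions for illustration):

```r
# First five prices of each series from the vectors defined earlier
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm   <- c(159.82, 160.02, 159.84, 160.35, 164.79)
micr  <- c(59.20, 59.25, 60.22, 59.95, 61.37)

# Two vectors: cor() returns a single number between -1 and 1
apple_ibm_cor <- cor(apple, ibm)
apple_ibm_cor

# A matrix: cor() returns a correlation matrix,
# with one row and one column per stock and 1s on the diagonal
stocks <- cbind(apple, micr, ibm)
cor_matrix <- cor(stocks)
cor_matrix
```

The entry cor_matrix["apple", "ibm"] holds the same value as cor(apple, ibm).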
• To select the first row and first column of stocks from the last example: stocks[1,1]
• To select the entire first row, leave the col empty: stocks[1, ]
• To select the first two rows: stocks[1:2, ] or stocks[c(1,2), ]
• To select an entire column, leave the row empty: stocks[, 1] or stocks[, "apple"]
# Third row
stocks[3, ]
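The subsetting rules listed above can be tried on a small matrix. This sketch uses only the first three prices of each series to keep the output short:

```r
# Small stock matrix: first three prices of each series
stocks <- cbind(apple = c(109.49, 109.90, 109.11),
                micr  = c(59.20, 59.25, 60.22),
                ibm   = c(159.82, 160.02, 159.84))

stocks[1, 1]                    # single element: row 1, column 1
stocks[3, ]                     # entire third row
stocks[, "apple"]               # entire column, selected by name
stocks[1:2, c("apple", "ibm")]  # first two rows of two columns
```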
# Variables
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
# Data frame
cash <- data.frame(company, cash_flow, year)
# Print cash
cash
Data Frames
Making head()s and tail()s of your data with some str()ucture
# Call str()
str(cash)
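The heading mentions head() and tail(), but only str() is shown. A short sketch of all three on the cash data frame (the values of n are illustrative):

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)

head(cash, n = 3)   # first three rows
tail(cash, n = 2)   # last two rows
str(cash)           # column types and a preview of the values
```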
Naming your columns / rows
Change your column names with the colnames() function and row names with the rownames() function.
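A small sketch of renaming columns and rows. The two-row data frame and the new names are hypothetical and chosen only for illustration:

```r
# A two-row data frame (hypothetical, for illustration)
cash <- data.frame(company = c("A", "B"),
                   cash_flow = c(1000, 1500),
                   year = c(1, 1))

# Rename all columns at once
colnames(cash) <- c("Company", "CashFlow", "Year")

# Row names must be unique
rownames(cash) <- c("first", "second")

# Names can then be used for subsetting
cash["second", "CashFlow"]
## [1] 1500
```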
What if you are only interested in the cash flows from company A? For more flexibility, try
subset(cash, company == "A")
• The first argument is the name of your data frame, cash.
• You shouldn’t put company in quotes!
• The == is the equality operator. It tests to find where two things are equal, and returns a logical vector.
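The points above can be seen by comparing subset() with the equivalent bracket form. A minimal sketch (is_A, cash_A, and cash_A_brackets are hypothetical names):

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)

# subset(): refer to the column by its bare name
cash_A <- subset(cash, company == "A")

# Equivalent bracket form: == builds a logical vector that selects rows
is_A <- cash$company == "A"
cash_A_brackets <- cash[is_A, ]

# Conditions can be combined with &
subset(cash, company == "A" & year > 1)
```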
# Restore cash
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)
# Drop columns added in earlier steps, if present
cash$quarter_cash <- NULL
cash$double_year <- NULL
# Company B information
cash_B <- subset(cash, company == "B")
cash_B
Create a factor
Create a factor by using the factor() function
# credit_rating character vector
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")
# Create a factor from credit_rating
credit_factor <- factor(credit_rating)
# Print credit_factor
credit_factor
Factors
Factor summary
Present a table of the counts of each bond credit rating by using the summary() function.
# Count of each credit rating
summary(credit_factor)
# Levels of credit_factor
levels(credit_factor)
# Define AAA_rank.
AAA_rank <- c(31, 48, 100, 53, 85, 73, 62, 74, 42, 38, 97, 61, 48, 86, 44, 9, 43, 18, 62, 38, 23, 37, 54, 80, 78, 93, 47, 100,
22, 22, 18, 26, 81, 17, 98, 4, 83, 5, 6, 52, 29, 44, 50, 2, 25, 19, 15, 42, 30, 27)
# Print AAA_factor
AAA_factor
# Plot AAA_factor
plot(AAA_factor)
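The slide omits the call that builds AAA_factor. One way to bucket a numeric variable into a factor is cut(); the sketch below uses the first ten values of AAA_rank, and the break points (four equal-width buckets from 0 to 100) are an assumption, not taken from the original.

```r
# First ten values of AAA_rank
AAA_rank <- c(31, 48, 100, 53, 85, 73, 62, 74, 42, 38)

# Assumption: four equal-width buckets, each open on the left and closed on the right
AAA_factor <- cut(x = AAA_rank, breaks = c(0, 25, 50, 75, 100))

# The bucket labels become the factor levels
levels(AAA_factor)

# Number of bonds in each bucket
summary(AAA_factor)
```

A value equal to a break point falls into the bucket on its left, so 100 lands in (75,100].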
Create an ordered factor
To order your factor, there are two options.
# Plot credit_factor_ordered
plot(credit_factor_ordered)
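The slide does not list the two options. The sketch below assumes they are factor() with ordered = TRUE and the ordered() helper, both of which exist in base R; the worst-to-best ordering of the ratings is also an assumption.

```r
credit_rating <- c("BB", "AAA", "AA", "CCC", "AA", "AAA", "B", "BB")

# Assumed ranking of the ratings, from worst to best
rating_levels <- c("CCC", "B", "BB", "AA", "AAA")

# Option 1: factor() with ordered = TRUE
credit_factor_ordered <- factor(credit_rating, ordered = TRUE,
                                levels = rating_levels)

# Option 2: the ordered() helper
credit_factor_ordered2 <- ordered(credit_rating, levels = rating_levels)

# Ordered factors support comparisons: is BB ranked below AAA?
credit_factor_ordered[1] < credit_factor_ordered[2]
## [1] TRUE
```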
Subsetting a factor
Removing AAA from credit_factor doesn’t remove the AAA level. To remove the AAA level entirely, add drop = TRUE.
# Define credit_factor
credit_factor <- factor(c("AAA", "AA", "A", "BBB", "AA", "BBB", "A"), ordered = TRUE, levels = c("BBB", "A", "AA",
"AAA"))
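# Remove the first element (AAA) but keep the AAA level
keep_level <- credit_factor[-1]
# Plot keep_level
plot(keep_level)
# Remove the first element (AAA) and drop the unused AAA level
drop_level <- credit_factor[-1, drop = TRUE]
# Plot drop_level
plot(drop_level)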
stringsAsFactors
In R 3.6.0 (and every version before 4.0.0), the default behavior when creating data frames is to convert all character columns into factors. You can turn off this behavior by adding stringsAsFactors = FALSE.
# Variables
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")
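A minimal sketch of the argument in use (the data frame name bonds is hypothetical). Owner names stay as plain characters, and only the column that is really categorical is converted:

```r
credit_rating <- c("AAA", "A", "BB")
bond_owners <- c("Dan", "Tom", "Joe")

# Keep both columns as character vectors
bonds <- data.frame(credit_rating, bond_owners, stringsAsFactors = FALSE)
class(bonds$bond_owners)
## [1] "character"

# Convert only the categorical column to a factor
bonds$credit_rating <- factor(bonds$credit_rating)
class(bonds$credit_rating)
## [1] "factor"
```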
# Create a list
portfolio <- list(name, apple, ibm, cor_matrix)
# Print portfolio
portfolio
Adding to a list
Add new elements to an existing list by using existingList$newElement or c(existingList, newElement).
# Print portfolio
portfolio
# Define portfolio
portfolio_name <- "Apple and IBM"
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)
microsoft <- c(150.0, 152.0, 154.0, 154.5)
portfolio <- list(portfolio_name = portfolio_name, apple = apple, ibm = ibm, microsoft = microsoft, correlation =
correlation)
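The slide does not show how correlation is computed. A self-contained sketch of creating a named list, adding to it, and removing from it, assuming the correlation element is cor(apple, ibm):

```r
apple <- c(109.49, 109.90, 109.11, 109.95, 111.03)
ibm <- c(159.82, 160.02, 159.84, 160.35, 164.79)

# A named list can hold objects of different types and lengths
portfolio <- list(portfolio_name = "Apple and IBM", apple = apple, ibm = ibm)

# Add an element with $ (assumption: correlation is cor(apple, ibm))
portfolio$correlation <- cor(apple, ibm)

# Remove an element by assigning NULL to it
portfolio$ibm <- NULL

names(portfolio)
length(portfolio)
## [1] 3
```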
Split it
Split a data frame by group using split(), and get your original data frame back by using unsplit().
# Drop the present_value column from cash, if present
cash$present_value <- NULL
str(split_cash)
Lists
# Unsplit split_cash to get the original data back
original_cash <- unsplit(split_cash, grouping)
# Print original_cash
original_cash
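The definitions of grouping and split_cash are not shown on the slide. A minimal round trip, assuming the grouping is the company column:

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)

# Assumption: group the rows by company
grouping <- cash$company

# split() returns a list with one data frame per group
split_cash <- split(cash, grouping)
names(split_cash)
## [1] "A" "B"

# unsplit() puts the rows back in their original order
original_cash <- unsplit(split_cash, grouping)
original_cash
```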
Split-Apply-Combine
A common data science problem is to split your data frame by a grouping, apply some transformation to each group, and then recombine those pieces back into one data frame. This is such a common class of problems in R that it has been given the name split-apply-combine.
# Print split_cash
split_cash
# Print cash_no_A
cash_no_A
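The three steps can be sketched with split(), lapply(), and unsplit(). The transformation applied here (each cash flow's share of its company's total) and the names share and cash_with_share are hypothetical, chosen only to illustrate the pattern:

```r
company <- c("A", "A", "A", "B", "B", "B", "B")
cash_flow <- c(1000, 4000, 550, 1500, 1100, 750, 6000)
year <- c(1, 3, 4, 1, 2, 4, 5)
cash <- data.frame(company, cash_flow, year)
grouping <- cash$company

# Split: one data frame per company
split_cash <- split(cash, grouping)

# Apply: add each cash flow's share of its company's total
split_cash <- lapply(split_cash, function(df) {
  df$share <- df$cash_flow / sum(df$cash_flow)
  df
})

# Combine: back into a single data frame in the original row order
cash_with_share <- unsplit(split_cash, grouping)
cash_with_share
```

Because the share is computed inside each group, the shares of every company sum to 1.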
Attributes
Return a list of attributes about the object you pass in by using attributes(). Access a specific attribute by using attr().
# attributes of my_matrix
attributes(my_matrix)
# attributes of my_factor
attributes(my_factor)
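A short sketch of both functions. The slide does not define my_factor, so the factor below is a hypothetical example:

```r
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)

# A matrix is a vector with a dim attribute
attributes(my_matrix)
attr(my_matrix, "dim")
## [1] 3 3

# Hypothetical factor for illustration
my_factor <- factor(c("A", "B", "A"))

# A factor stores its levels and its class as attributes
attr(my_factor, "levels")
## [1] "A" "B"
attr(my_factor, "class")
## [1] "factor"
```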