Bird
7/14/2021
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
Let’s get to some coding though.
This will generally follow along with the textbook so you don’t need to have it yourself.
Once we get to more advanced things, the textbook won’t be able to keep up anyway so you still won’t need
it.
#options(width=70)
ruv <- runif(n=20,min=0,max=1)
round(ruv,4)
## [1] 0.7777 0.5198 0.0846 0.2466 0.5978 0.5376 0.0485 0.7557 0.1694 0.4653
## [11] 0.7779 0.3463 0.9613 0.3301 0.4665 0.2034 0.6380 0.8020 0.5239 0.1976
Here, we assign to the vector ruv a set of random draws from a uniform distribution, generated by the runif function.
We sample 20 values, each with a minimum possible value of 0 and a maximum possible value of 1.
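Because runif() draws new values every time it runs, your numbers will differ from the ones printed above. A minimal sketch of making the draw repeatable, assuming an arbitrary seed of 42 (the seed and the name ruv_again are illustrative, not from the slides):

```r
set.seed(42)                        # arbitrary seed, chosen only for illustration
ruv <- runif(n = 20, min = 0, max = 1)
length(ruv)                         # 20 values were drawn
all(ruv >= 0 & ruv <= 1)            # every value lies between min and max

set.seed(42)                        # resetting the seed repeats the same draws
ruv_again <- runif(n = 20, min = 0, max = 1)
identical(ruv, ruv_again)
```

Setting a seed at the top of a homework file is one way to keep your output the same each time the document is knit.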
R as a simple calculator can be useful from time to time:
(8 * 3) + 12/40 - (7^3) + sqrt(9)
## [1] -315.7
In R Markdown, I only entered my code. The echo=TRUE option keeps the original code in the PDF alongside its output.
For homework, you can code directly in R Markdown using code blocks. Keeping echo=TRUE will let me see
your code before your output prints as it normally would in R or R Studio.
Text formatting
*italic* or _italic_
**bold** __bold__
superscript^2^ and subscript~2~
Headings
Lists
• Bulleted list item 1
• Item 2
– Item 2a
– Item 2b
1. Numbered list item 1
2. Item 2. The numbers are incremented automatically in the output.
Tables
Vectors
x <- 3 # <- is the assignment operator; it stores the value 3 under the name x.
y <- c(2,3,4) # c for concatenate, which combines values into a vector.
z <- c(10,20,30,40)
x+y
## [1] 5 6 7
x+z
## [1] 13 23 33 43
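y+z
## Warning in y + z: longer object length is not a multiple of shorter object
## length
## [1] 12 23 34 42
z+c(1,2,3,4,5)
## Warning in z + c(1, 2, 3, 4, 5): longer object length is not a multiple of
## shorter object length
## [1] 11 22 33 44 15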
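When two vectors of different lengths are added, R recycles the shorter one until it matches the longer one. A small sketch with made-up vectors a and b (these names and values are illustrative, not from the slides):

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)
a + b    # b is reused as 10, 20, 10, 20
## [1] 11 22 13 24
```

Here the longer length is a multiple of the shorter one, so R recycles silently. When it is not a multiple, R still recycles but also issues a warning.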
Booleans
x <- c(FALSE, TRUE, FALSE)
y <- c(FALSE, TRUE, TRUE)
class(x)
## [1] "logical"
class(x + y) # arithmetic treats FALSE as 0 and TRUE as 1
## [1] "integer"
You can also convert a variable from one type to another (as long as the conversion makes sense) with the following functions:
as.logical()
as.numeric()
as.double()
as.complex() #any physicists here?
as.character()
as.list()
This is especially useful with data when you want to convert entire variables at once.
Perhaps the data came to you in an unclean state.
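As a rough illustration of cleaning with these functions, here is a sketch using made-up vectors (raw, flags, and bad are hypothetical names, not from the slides):

```r
raw <- c("1", "2", "3.5")      # numbers that arrived stored as text
as.numeric(raw)
## [1] 1.0 2.0 3.5

flags <- c("TRUE", "FALSE")    # logical values that arrived stored as text
as.logical(flags)
## [1]  TRUE FALSE

# A conversion that does not make sense yields NA.
# Without suppressWarnings(), R would also print a warning here.
bad <- suppressWarnings(as.numeric("ten"))
is.na(bad)
## [1] TRUE
```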
Other Basic Functionality
print(1)
## [1] 1
print("You need to make sure YOU knOW wh4T YOU ARE PRintinG")
## [1] "You need to make sure YOU knOW wh4T YOU ARE PRintinG"
print() reproduces its input exactly, including capitalization and spacing.
You can also print a function to see how many parameters it takes and other information:
print(exp)
If you’re interested in intensive mathematical computation, other programming languages are better suited to it.
Random tidbits
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== equal
!= not equal
& and
| or
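The operators above work element by element on vectors. A minimal sketch with a made-up vector v (illustrative only):

```r
v <- c(2, 5, 8)
v > 4            # is each element greater than 4?
## [1] FALSE  TRUE  TRUE
v == 5           # is each element equal to 5?
## [1] FALSE  TRUE FALSE
v != 5           # is each element different from 5?
## [1]  TRUE FALSE  TRUE
v > 1 & v < 6    # both conditions must hold
## [1]  TRUE  TRUE FALSE
v < 3 | v > 7    # at least one condition must hold
## [1]  TRUE FALSE  TRUE
```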
Matrices (and Arrays)
Later, we will move beyond base R structures to things like tibbles.
For now, the basics:
A 2-D array is a matrix in R.
M <- matrix(data=1:24,nrow=4, byrow=TRUE)
This is saying fill a matrix with the numbers 1 through 24, 4 rows, 6 columns, and order it by row not column.
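To see what byrow changes, it may help to compare the first row under both settings. A short sketch (M_col is an illustrative name, not from the slides):

```r
M <- matrix(data = 1:24, nrow = 4, byrow = TRUE)
dim(M)        # 4 rows, 6 columns
## [1] 4 6
M[1, ]        # filled by row: the first row holds 1 through 6
## [1] 1 2 3 4 5 6

M_col <- matrix(data = 1:24, nrow = 4, byrow = FALSE)
M_col[1, ]    # filled by column: the first row steps by 4
## [1]  1  5  9 13 17 21
```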
M2 <- matrix(data=c(1,2,3,4,5,6),nrow=3, byrow=FALSE)
is.array(M2)
## [1] TRUE
is.matrix(M2)
## [1] TRUE
Now let’s practice making an actual matrix from scratch.
Given data in any format, you can arrange it to fit the code you want to write, i.e. filling by row or by column.
City1 has temperatures on three days of 80,70,75
City2 has temperatures on three days of 55,56,45
City3 has temperatures on three days of 20,22,31
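The code that builds temp.data is not shown in this copy. A minimal sketch of one way to build it, assuming one row per city and one column per day (temp.by.col is an illustrative name):

```r
# Fill by row: each line of numbers is one city's three days
temp.data <- matrix(c(80, 70, 75,
                      55, 56, 45,
                      20, 22, 31),
                    nrow = 3, byrow = TRUE)

# The same matrix filled by column: each group of three is one day across the cities
temp.by.col <- matrix(c(80, 55, 20,
                        70, 56, 22,
                        75, 45, 31),
                      nrow = 3, byrow = FALSE)

identical(temp.data, temp.by.col)   # both orderings give the same matrix
```

Either ordering works as long as byrow matches the way the numbers are listed.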
dim(temp.data)
## [1] 3 3
temp.data[2,] #2nd row, all columns
## [1] 55 56 45
temp.data[2,3] #2nd row, 3rd column
## [1] 45
temp.data[1, ,drop=FALSE] #drop=FALSE will keep this a matrix instead of defaulting into a vector.
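The effect of drop=FALSE can be checked directly. A small sketch on a made-up matrix m (illustrative, not from the slides):

```r
m <- matrix(1:6, nrow = 2)

is.matrix(m[1, ])                  # a single row collapses to a plain vector
## [1] FALSE
is.matrix(m[1, , drop = FALSE])    # drop = FALSE keeps the matrix structure
## [1] TRUE
dim(m[1, , drop = FALSE])          # still 2-D: 1 row, 3 columns
## [1] 1 3
```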
Data Frames
NumVec <- c(1,2,3,4)
CharVec <- c("a","b","c","d")
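LogVec <- c(TRUE,TRUE,TRUE,FALSE) #logical values are unquoted; "TRUE" in quotes would be a character string
df <- data.frame(NumVec,CharVec,LogVec)
as_tibble(df) #printed as a tibble so each column's type is shown
## # A tibble: 4 x 3
## NumVec CharVec LogVec
## <dbl> <chr> <lgl>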
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
## 4 4 d FALSE
In this course, we will usually skip the base R introductory material in favor of Tidyverse options like tibbles.
Some data sets, such as iris, ship with R and can be loaded directly:
as_tibble(iris)
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 140 more rows
#or store it
iris_df <- as_tibble(iris)
Tidyverse %>% Pipes
You can read the rest of the chapter if you’d like, but most of your data analysis will be done in Tidyverse.
Let’s check out the basics that will get you on your way.
Let’s see some example code:
iris_df %>%
group_by(Species) %>%
summarize(m = mean(Sepal.Length)) %>%
ungroup()
## # A tibble: 3 x 2
## Species m
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
First, %>% is called a pipe. It is a Tidyverse shortcut that passes the result on its left to the function on its right, so the steps read in the order they happen.
In the above code, group_by() splits the rows into groups, one for each level of the chosen variable (here, Species).
Then, we %>% to summarize(), a function that computes summary statistics per group, in this case the mean of a different variable (Sepal.Length).
Last, ungroup() removes any grouping that remains, so that later steps operate on the whole tibble (a specialized data frame) rather than group by group.
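One way to see what the pipe does is to compare it with the nested call it replaces. A minimal sketch, loading dplyr and tibble directly rather than the whole tidyverse (piped and nested are illustrative names):

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# Piped: read top to bottom, in the order the steps happen
piped <- iris_df %>%
  group_by(Species) %>%
  summarize(m = mean(Sepal.Length))

# Nested: the same steps, read from the inside out
nested <- summarize(group_by(iris_df, Species), m = mean(Sepal.Length))

all.equal(piped, nested)   # the two forms give the same result
```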
Mutate
iris_df <- as_tibble(iris)
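The statement that creates iris_df_v2 does not appear in this copy. Judging from the pl2 and four_sl columns in the printed result, it was probably close to the following sketch, which assumes pl2 is Petal.Length squared and four_sl is four times Sepal.Length:

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# mutate() adds new columns computed from existing ones
iris_df_v2 <- iris_df %>%
  mutate(pl2 = Petal.Length^2,         # assumed definition, inferred from the output
         four_sl = 4 * Sepal.Length)   # assumed definition, inferred from the output
```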
iris_df_v2
## # A tibble: 150 x 7
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species pl2 four_sl
## <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
## 1 5.1 3.5 1.4 0.2 setosa 1.96 20.4
## 2 4.9 3 1.4 0.2 setosa 1.96 19.6
## 3 4.7 3.2 1.3 0.2 setosa 1.69 18.8
## 4 4.6 3.1 1.5 0.2 setosa 2.25 18.4
## 5 5 3.6 1.4 0.2 setosa 1.96 20
## 6 5.4 3.9 1.7 0.4 setosa 2.89 21.6
## 7 4.6 3.4 1.4 0.3 setosa 1.96 18.4
## 8 5 3.4 1.5 0.2 setosa 2.25 20
## 9 4.4 2.9 1.4 0.2 setosa 1.96 17.6
## 10 4.9 3.1 1.5 0.1 setosa 2.25 19.6
## # ... with 140 more rows
Summarize
iris_df <- as_tibble(iris)
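The call that produced the avg.sl result is missing from this copy. A likely reconstruction, assuming avg.sl is the mean of Sepal.Length (by analogy with the sd.sl example that follows); avg is an illustrative name:

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# summarize() collapses all rows into a single row of statistics
avg <- iris_df %>%
  summarize(avg.sl = mean(Sepal.Length))
avg
```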
## # A tibble: 1 x 1
## avg.sl
## <dbl>
## 1 5.84
iris_df %>%
summarize(sd.sl = sd(Sepal.Length))
## # A tibble: 1 x 1
## sd.sl
## <dbl>
## 1 0.828
Filter
Choose specific ROWS
iris_df <- as_tibble(iris)
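The filter() call itself is not shown in this copy. Given the object's name and its 50 rows, a reasonable sketch is the following, assuming the condition tests the Species column:

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# filter() keeps only the rows where the condition is TRUE
iris_df_setosa_only <- iris_df %>%
  filter(Species == "setosa")
```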
iris_df_setosa_only
## # A tibble: 50 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 40 more rows
Select
Choose specific COLUMNS
iris_df <- as_tibble(iris)
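The call that produced the two-column result is missing from this copy. Since only Sepal.Length and Sepal.Width appear in it, a plausible sketch is (sepal_only is an illustrative name):

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# select() keeps only the named columns
sepal_only <- iris_df %>%
  select(Sepal.Length, Sepal.Width)

glimpse(sepal_only)   # compact view: one line per column
```

Prefixing a column name with a minus sign does the opposite and drops that column.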
## Rows: 150
## Columns: 2
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.~
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.~
iris_df %>% select(-Sepal.Length, -Sepal.Width) %>% glimpse()
## Rows: 150
## Columns: 3
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.~
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.~
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~
Arrange
Use this to sort the rows of your data by the values of a variable
iris_df <- as_tibble(iris)
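The arrange() call is not shown in this copy. The printed Sepal.Length values run from smallest upward, so it was presumably an ascending sort on that column, as in this sketch (sorted is an illustrative name):

```r
library(dplyr)
library(tibble)

iris_df <- as_tibble(iris)

# arrange() sorts rows in ascending order of the given column
sorted <- iris_df %>%
  arrange(Sepal.Length)

glimpse(sorted)
```

Wrapping the column in desc(), as in arrange(desc(Sepal.Length)), sorts from largest to smallest instead.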
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 4.3, 4.4, 4.4, 4.4, 4.5, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.~
## $ Sepal.Width <dbl> 3.0, 2.9, 3.0, 3.2, 2.3, 3.1, 3.4, 3.6, 3.2, 3.2, 3.2, 3.~
## $ Petal.Length <dbl> 1.1, 1.4, 1.3, 1.3, 1.3, 1.5, 1.4, 1.0, 1.4, 1.3, 1.6, 1.~
## $ Petal.Width <dbl> 0.1, 0.2, 0.2, 0.2, 0.3, 0.2, 0.3, 0.2, 0.2, 0.2, 0.2, 0.~
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~